What is a PDB file?
When it comes to computational chemistry, file formats are extremely important.
They let researchers store complicated molecular structures in files that can easily be transferred to other machines or shared with colleagues.
If you are a scientist or a student working in computational chemistry or structural biology, you have probably come across a format usually called a “PDB file”. But what is it?
In this blog post, we will dive into the details of what a PDB file is, where you can find it, how it is formatted, and finally how you can open and visualize it.
These are all things you should be familiar with to carry out your research in chemistry or biology successfully.
Where to find PDB files
A PDB file, or Protein Data Bank file, is a file format used to store information about the 3D structure of biological macromolecules such as proteins and nucleic acids.
The file format is widely used in computational chemistry because it is easy to read and understand, and it can be used with many different software programs.
You can recognize it by its .pdb extension, and it is considered the standard way to store and transfer information about proteins and nucleic acids.
In many cases, you will receive PDB files as output from other programs, so it pays to be familiar with them.
However, the primary source of PDB files is the Protein Data Bank (PDB), a public database where researchers can deposit their experimentally determined structures of biological macromolecules.
I will not go into the details of how the databank works. The key point is that this is where you can find and download the experimentally determined structures of interest, each identified by a unique four-character alphanumeric code (the PDB ID).
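As a minimal sketch of how a structure could be fetched programmatically, the snippet below builds a download URL from a PDB code. The URL pattern is an assumption about the RCSB file download service and is not taken from this post; the helper names `pdb_download_url` and `download_pdb` are hypothetical, so check the databank's own documentation before relying on them.

```python
import re
import urllib.request

# Assumed URL pattern of the RCSB file download service (not stated in this post).
RCSB_DOWNLOAD_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def pdb_download_url(pdb_id: str) -> str:
    """Return the assumed download URL for a four-character PDB code."""
    pdb_id = pdb_id.strip().upper()
    # A PDB code is four alphanumeric characters.
    if not re.fullmatch(r"[0-9A-Z]{4}", pdb_id):
        raise ValueError(f"not a four-character PDB code: {pdb_id!r}")
    return RCSB_DOWNLOAD_URL.format(pdb_id=pdb_id)


def download_pdb(pdb_id: str, destination: str) -> None:
    """Save the entry to a local file (needs network access)."""
    urllib.request.urlretrieve(pdb_download_url(pdb_id), destination)
```

For example, `download_pdb("1crn", "1crn.pdb")` would try to save the entry with code 1CRN to the current directory; the code is used here purely as an illustration.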
Protein Data Bank file format
The PDB file format contains a lot of structural information such as the name and coordinates of each atom, the corresponding residues, and much more. In some cases, PDB files may also include metadata such as the authors of the original research, and the experimental method used to obtain the structure.
When you open a PDB file (you can do it with any text editor) you will find that it is composed of different lines, where each line is referred to as a record. Different types of records are available and they contain different information about the system.
Three record types are especially important, as they carry the information about the atoms in your system:
- The ATOM record: atoms belonging to standard residues (amino acids and nucleotides)
- The HETATM record: atoms belonging to non-standard residues, e.g., ligands
- The TER record: signals the end of a chain of residues.
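To make the idea of record types concrete, here is a small sketch that counts the records in a PDB text by their name, which sits in the first six columns of each line. The example lines and their coordinates are made up for illustration, and `count_records` is a hypothetical helper, not part of any library.

```python
from collections import Counter

# Illustrative lines only: the atoms and coordinates are made up for this sketch.
EXAMPLE_PDB = """\
ATOM      1  N   GLY A   1      10.000  10.000  10.000  1.00 20.00           N
ATOM      2  CA  GLY A   1      11.000  10.500  10.200  1.00 20.00           C
TER       3      GLY A   1
HETATM    4  O   HOH A 101      15.000  12.000   9.000  1.00 30.00           O
END
"""


def count_records(pdb_text: str) -> Counter:
    """Count lines by record name (columns 1-6 of each line)."""
    return Counter(
        line[:6].strip() for line in pdb_text.splitlines() if line.strip()
    )


counts = count_records(EXAMPLE_PDB)
print(counts["ATOM"], counts["HETATM"], counts["TER"])
```

On a real file, the same count gives a quick overview of how many atoms belong to standard residues, how many to ligands or solvent, and how many chains are terminated.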
Let’s analyze more in-depth how atoms in a typical PDB file are formatted.
Formatting a PDB file
Here is a snippet of a file reporting a certain residue of a protein (Tyrosine 36).
[PDB snippet: the atom records of residue TYR 36]
Each line in the PDB file represents a single atom in the molecule, and the entries on each line provide information about the atom’s properties.
You may have noticed that the file looks well organized. This is no accident: each entry must sit exactly in its specified column range, otherwise the PDB file will not be read correctly by molecular visualization software.
If you cannot visualize your PDB file correctly, always double-check that the file is properly formatted.
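One way to avoid misaligned columns is to let a format string do the padding instead of typing spaces by hand. The sketch below is a hypothetical helper, `format_atom_line`, that writes one ATOM line with the standard fixed widths; it is a simplification that assumes a one-letter element symbol and no alternate location, insertion code, segment identifier, or charge.

```python
def format_atom_line(serial, name, res_name, chain, res_seq, x, y, z,
                     occupancy=1.00, b_factor=0.00, element=""):
    """Build one fixed-width ATOM line (hypothetical helper for illustration)."""
    # Simplification: names shorter than four characters start in column 14,
    # which is the convention for atoms of one-letter elements.
    padded_name = f" {name:<3}" if len(name) < 4 else name
    return (
        f"{'ATOM':<6}{serial:>5} {padded_name:<4} {res_name:>3} {chain:1}"
        f"{res_seq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}"
        f"{occupancy:6.2f}{b_factor:6.2f}{' ' * 10}{element:>2}"
    )


# Values of the backbone nitrogen of tyrosine 36 discussed in this post.
line = format_atom_line(21, "N", "TYR", "A", 36, 50.550, 51.010, 47.480,
                        occupancy=1.00, b_factor=25.00, element="N")
print(line)
print(len(line))  # the element symbol ends in column 78
```

Because every width is written into the format string, a coordinate such as 50.55 is always padded to eight characters with three decimals, which is what visualization programs expect.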
Let’s break down the entry in the first line:
    ATOM     21  N   TYR A  36      50.550  51.010  47.480  1.00 25.00           N
Columns | Data |
---|---|
1-6 | Record name, which is ATOM for atoms belonging to standard residues. |
7-11 | Atom serial number, which is a unique identifier for each atom in the molecule (21). |
13-16 | Atom name, which identifies the atom within its residue. In this case, it is N, the backbone nitrogen. |
17 | Alternate location indicator, missing in this case |
18-20 | Residue name, which identifies the amino acid residue to which the atom belongs. In this case, it is TYR for tyrosine. |
22 | Chain identifier, which identifies the chain to which the atom belongs. In this case, it is chain A. |
23-26 | Residue sequence number, which identifies the position of the amino acid residue in the chain. In this case, it is residue number 36. |
31-38 | x coordinate of the atom in the three-dimensional space (50.550 Å). |
39-46 | y coordinate of the atom in the three-dimensional space (51.010 Å). |
47-54 | z coordinate of the atom in the three-dimensional space (47.480 Å). |
55-60 | Occupancy, which indicates the fraction of molecules in the crystal in which the atom occupies this position. In this case, it is 1.00. |
61-66 | Temperature factor or B-factor, which indicates the mobility of the atom. In this example, it is 25.00. |
73-76 | Segment identifier, missing in this case. |
77-78 | Element (N). |
79-80 | Charge, missing in this case. |
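The column ranges in the table translate directly into string slices. The sketch below is one possible reader for a single atom line, not the way any particular program does it; `parse_atom_line` is a hypothetical helper, and the example line is rebuilt field by field from the values listed in the table. It assumes that the numeric fields are present, which may not hold for every file.

```python
def parse_atom_line(line: str) -> dict:
    """Split one ATOM/HETATM line into fields using fixed column ranges."""
    line = line.ljust(80)  # pad so that slicing a short line is safe
    return {
        "record": line[0:6].strip(),      # columns 1-6
        "serial": int(line[6:11]),        # columns 7-11
        "name": line[12:16].strip(),      # columns 13-16
        "alt_loc": line[16].strip(),      # column 17
        "res_name": line[17:20].strip(),  # columns 18-20
        "chain": line[21].strip(),        # column 22
        "res_seq": int(line[22:26]),      # columns 23-26
        "x": float(line[30:38]),          # columns 31-38
        "y": float(line[38:46]),          # columns 39-46
        "z": float(line[46:54]),          # columns 47-54
        "occupancy": float(line[54:60]),  # columns 55-60
        "b_factor": float(line[60:66]),   # columns 61-66
        "segment": line[72:76].strip(),   # columns 73-76
        "element": line[76:78].strip(),   # columns 77-78
        "charge": line[78:80].strip(),    # columns 79-80
    }


# The first line of the example, rebuilt from the table one field at a time.
example = "".join([
    "ATOM  ", "   21", " ", " N  ", " ", "TYR", " ", "A", "  36", "    ",
    "  50.550", "  51.010", "  47.480", "  1.00", " 25.00", " " * 10, " N",
])

atom = parse_atom_line(example)
print(atom["res_name"], atom["res_seq"], atom["x"], atom["y"], atom["z"])
```

Slicing by position rather than splitting on whitespace matters here: empty fields such as the alternate location indicator would otherwise shift every value that follows.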
Similar rules apply to the other atom-related record types (HETATM, TER). You can find more info on formatting and common errors here.
Open a PDB file
What if you want to open a PDB file? There are two ways to go about it. Let’s dive into both of them.
Open PDB with a text editor
The first option is to open the file with a good old text editor (less, nano, vi, …). This can be useful for researchers who want to manipulate the data in the file or extract specific pieces of information.
However, PDB files can be quite large and complex, and it is easy to break the fixed-column formatting if you don’t know what you are doing. That’s why it can be difficult to work with them in a text editor.
In addition, a simple text editor does not give you a visual representation of the structure, making it difficult to get an idea of what exactly is going on in the protein.
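When the goal is to extract information rather than to edit by hand, a few lines of script are often safer than a text editor. As a sketch, the hypothetical helper `residues_in_chain` below lists the residues of one chain in order of appearance; the example lines and coordinates are made up for illustration.

```python
# Illustrative lines only: the atoms and coordinates are made up for this sketch.
EXAMPLE_LINES = [
    "ATOM      1  N   GLY A   1      10.000  10.000  10.000  1.00 20.00           N",
    "ATOM      2  CA  GLY A   1      11.000  10.500  10.200  1.00 20.00           C",
    "ATOM      3  N   ALA A   2      12.000  11.000  10.400  1.00 20.00           N",
    "ATOM      4  N   SER B   1      20.000  20.000  20.000  1.00 20.00           N",
]


def residues_in_chain(lines, chain_id):
    """Return (residue number, residue name) pairs of one chain, without repeats."""
    residues = []
    for line in lines:
        # Only ATOM records of the requested chain (chain ID is column 22).
        if not line.startswith("ATOM") or line[21] != chain_id:
            continue
        residue = (int(line[22:26]), line[17:20].strip())
        if residue not in residues:
            residues.append(residue)
    return residues


print(residues_in_chain(EXAMPLE_LINES, "A"))
```

With a real file, the list of lines could come from `open("structure.pdb").read().splitlines()`, where the file name is a placeholder.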
Why settle for text files when you can see your molecule in its entirety?
Open PDB with molecular visualization software
The most common use of a PDB file is for molecular visualization. Scientists can use specialized software to generate 3D models of the molecule based on the information contained in the PDB file.
That’s why, unless you have very specific needs, it is probably better to use the second option: opening the PDB file with a molecular visualization program.
You can achieve this using a variety of software packages, such as PyMOL (following the procedure we discussed here), VMD, or Chimera. These programs turn the PDB text file into a visual representation, so that you can explore the structure of the macromolecule in 3D, manipulate it, and analyze its properties.
These tools can help you gain insights into the structural and functional features of the macromolecule and also create visual representations to help you share your research with the rest of the scientific community.
PyMOL, for instance, offers a wide range of tools that allow you to do pretty much anything you want with your molecule.
If you want to read further, here are some useful articles showing how you can play with your PDB file using PyMOL: