GROMACS File Formats: Understanding topology, itp, and gro files
GROMACS, a widely used molecular dynamics simulation software, relies on specific file formats to communicate with the user and define the properties of molecules.
Understanding these file formats is crucial for setting up and analyzing simulations accurately.
In this blog post, we will explore three essential file formats in GROMACS and provide practical tips for working with them.
gro File Format
The gro file format is a plain text file storing spatial coordinates and velocities (if available) of atoms during a molecular dynamics simulation. It follows a specific format that is crucial to understand if you plan to work with GROMACS.
A typical file starts with two lines like this:
|
|
The first line is a free-format title. When the file is created with the gmx trjconv command, GROMACS generates this line automatically, and it then also contains the time and step of the simulation. The t= entry specifies the time in picoseconds (ps), while the step= entry specifies the step number.
The second line specifies the number of atoms in the system as a single integer. This value must match the number of atom lines that follow, and many calculations and analysis tasks rely on it.
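As a rough illustration of the two header lines, here is a minimal Python sketch that pulls the time, step, and atom count out of them. The sample title and the atom count are made up for the example, and the function name parse_gro_header is my own, not part of GROMACS.

```python
import re

def parse_gro_header(lines):
    """Return (title, time_ps, step, n_atoms) from the first two lines of a gro file."""
    title = lines[0].rstrip("\n")
    n_atoms = int(lines[1])  # int() tolerates surrounding whitespace
    # The t= and step= entries are optional, so fall back to None when absent.
    time_match = re.search(r"\bt=\s*([-+]?\d+(?:\.\d+)?)", title)
    step_match = re.search(r"\bstep=\s*(\d+)", title)
    time_ps = float(time_match.group(1)) if time_match else None
    step = int(step_match.group(1)) if step_match else None
    return title, time_ps, step, n_atoms

# Hypothetical header, written the way a trjconv-generated title usually looks.
sample = ["Protein in water t= 100.00000 step= 50000\n", "    7\n"]
title, time_ps, step, n_atoms = parse_gro_header(sample)
print(time_ps, step, n_atoms)
```

A title without t= or step= simply yields None for those two values, which is what you would expect for a hand-written gro file.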
The rest of the file works much like a pdb file: each line corresponds to an atom in the system and contains several fixed-width columns with different information.
Here is an example of how you could find the simplest amino acid (Glycine) written in a gro file:
|
|
Let’s take the first row and break it down to see what each component means:
|
|
- Residue number (5 positions, integer): Specifies the number of the residue (243 in our example) to which the atom belongs, indicating the sequential order of the residue in the molecule.
- Residue name (5 positions, characters): The name of the residue to which the atom belongs. It is a string of up to 5 characters that represents the type of residue, GLY in our example.
- Atom name (5 positions, characters): The name of the atom. It is a string of up to 5 characters that represents the type of atom, such as CA for the alpha carbon, N for nitrogen, and so on.
- Atom number (5 positions, integer): The sequential number of the atom in the system. Because the field has only 5 positions, the numbering wraps around in systems with more than 99,999 atoms.
- Position (in nm, x y z in 3 columns, every 8 positions with 3 decimal places): The x, y, and z coordinates of the atom in nanometers (nm). Each of the three columns has 8 positions and 3 decimal places, giving a resolution of 0.001 nm.
- Velocity (in nm/ps, x y z in 3 columns, every 8 positions with 4 decimal places): The x, y, and z components of the velocity of the atom in nanometers per picosecond (nm/ps), which is the same as kilometers per second (km/s). Each of the three columns has 8 positions and 4 decimal places. If velocities are not available, these columns can be omitted from the file.
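Because the columns are fixed-width, an atom line is best read by slicing rather than by splitting on whitespace. The following Python sketch shows one way to do it, assuming the column widths listed above; the atom name, atom number, and coordinates in the sample line are invented for illustration, and parse_gro_atom is a hypothetical helper name.

```python
def parse_gro_atom(line):
    """Split one gro atom line into its fixed-width fields."""
    line = line.rstrip("\n")
    atom = {
        "resnr": int(line[0:5]),
        "resname": line[5:10].strip(),
        "name": line[10:15].strip(),
        "nr": int(line[15:20]),
        # Three position columns of 8 characters each, starting at column 20.
        "pos": tuple(float(line[20 + 8 * i:28 + 8 * i]) for i in range(3)),
        "vel": None,
    }
    # Velocities are optional: three more 8-character columns after the positions.
    if len(line) >= 68:
        atom["vel"] = tuple(float(line[44 + 8 * i:52 + 8 * i]) for i in range(3))
    return atom

# Build an illustrative line with the same layout (values are made up).
sample = "%5d%-5s%5s%5d%8.3f%8.3f%8.3f" % (243, "GLY", "N", 1, 1.234, 2.345, 3.456)
atom = parse_gro_atom(sample)
print(atom["resnr"], atom["resname"], atom["name"], atom["pos"])
```

Slicing also keeps working when two fields touch each other, for example a 5-digit residue number directly followed by a 5-character residue name, which a whitespace split would merge into one token.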
The last line of a gro file contains information about the size of the simulation box. For a rectangular box, the line contains three numbers, which represent the size of the box in nanometers (nm) in the x, y, and z directions, respectively. A triclinic box is described by nine numbers instead.
For example, a line that looks like this:
  10.37454  10.37454  15.63914
specifies a simulation box with a length of 10.37454 nm in the x and y directions and a height of 15.63914 nm in the z-direction.
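Reading the box line is straightforward because its values are separated by whitespace. Here is a small Python sketch using the numbers from the example; parse_gro_box is a hypothetical helper, and the volume calculation assumes a rectangular box.

```python
def parse_gro_box(line):
    """Return the box values (in nm) from the last line of a gro file."""
    values = [float(v) for v in line.split()]
    # Three values describe a rectangular box, nine a triclinic one.
    if len(values) not in (3, 9):
        raise ValueError("expected 3 or 9 box values, got %d" % len(values))
    return values

box = parse_gro_box("  10.37454  10.37454  15.63914")
# For a rectangular box the volume is simply the product of the three edges.
volume_nm3 = box[0] * box[1] * box[2]
print(box, volume_nm3)
```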
GROMACS Topology file (top)
If you already have some experience with molecular simulation, you have surely heard about the topology of a system. But what is that exactly?
You can think of the topology file as the molecular equivalent of a resume. It contains all the important information about the system you’re studying.
Explained in more rigorous terms, the topology file is where you define the parameters for how the atoms in your molecule interact with each other. That includes bonded interactions and non-bonded interactions, but also constraints or exclusions.
So, you can see that it is an essential component of any molecular simulation as it defines the interactions between atoms, which ultimately dictate the motion of the system under study.
In GROMACS the topology file is a simple text file characterized by the top extension (generally topol.top), which can be created with the gmx pdb2gmx command.
The next logical question that arises is where these parameters come from.
If you have carefully gone through my blog, the answer should be quite straightforward: they come from the force field. That’s why, after you launch the pdb2gmx command, you are required to select one so that GROMACS can retrieve the corresponding parameters that will be used in your simulation.
Topology file formatting
Now let’s have a look at what a typical topol.top file may look like. To inspect the contents of the file you can simply open it with a plain text editor (vi, nano, …). A sample topology file is shown below.
|
|
The file generally starts with several lines preceded by a semicolon (;), which are general comments.
After the comments, you’ll see the line that calls the parameters within the force field you selected (amber99sb). This line indicates that all subsequent parameters are derived from this force field.
; Include forcefield parameters
#include "amber99sb.ff/forcefield.itp"
The next important line is [ moleculetype ], which defines the name of the molecule and its exclusions. In the given example, the molecule is named Protein and has nrexcl 3, which excludes non-bonded interactions between atoms that are no more than 3 bonds apart.
[ moleculetype ]
; Name            nrexcl
Protein             3
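To make the meaning of nrexcl more concrete, here is a small Python sketch that lists which atom pairs fall within a given number of bonds. It is only an illustration of the idea on a made-up five-atom chain, not how GROMACS implements exclusions internally; excluded_pairs is a hypothetical function name.

```python
from collections import deque

def excluded_pairs(bonds, nrexcl):
    """Return the atom pairs separated by at most nrexcl bonds."""
    neighbours = {}
    for a, b in bonds:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    pairs = set()
    for start in neighbours:
        # Breadth-first search that stops expanding at depth nrexcl.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            if dist[cur] == nrexcl:
                continue
            for nxt in neighbours[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
                    pairs.add((min(start, nxt), max(start, nxt)))
    return pairs

# Hypothetical linear chain 1-2-3-4-5.
bonds = [(1, 2), (2, 3), (3, 4), (4, 5)]
pairs = excluded_pairs(bonds, 3)
print(sorted(pairs))
```

In this chain, atoms 1 and 5 are four bonds apart, so with nrexcl 3 they are the only pair that still interacts through the regular non-bonded terms.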
The [ atoms ] section lists all of the atoms in the protein, with the information presented in columns. Each row corresponds to a different atom in the protein, with details such as the atom number, type, residue number, residue name, atom name, and charge. The example shows a Glycine residue.
|
|
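A quick sanity check you can run on an [ atoms ] section is to sum the charges of a residue or molecule. The Python sketch below assumes the usual column order (atom number, type, residue number, residue name, atom name, charge group, charge, mass); the rows are illustrative values for a glycine residue and are not copied from the topology discussed here.

```python
def parse_atoms_row(row):
    """Parse one row of an [ atoms ] section, ignoring any trailing ; comment."""
    fields = row.split(";", 1)[0].split()
    return {
        "nr": int(fields[0]),
        "type": fields[1],
        "resnr": int(fields[2]),
        "residue": fields[3],
        "atom": fields[4],
        "cgnr": int(fields[5]),
        "charge": float(fields[6]),
        "mass": float(fields[7]) if len(fields) > 7 else None,
    }

# Illustrative rows only (assumed values for a glycine residue).
rows = [
    "1   N   243  GLY    N   1  -0.4157  14.01",
    "2   H   243  GLY    H   2   0.2719  1.008",
    "3  CT   243  GLY   CA   3  -0.0252  12.01",
    "4  H1   243  GLY  HA1   4   0.0698  1.008",
    "5  H1   243  GLY  HA2   5   0.0698  1.008",
    "6   C   243  GLY    C   6   0.5973  12.01",
    "7   O   243  GLY    O   7  -0.5679  16.00 ; last atom",
]
atoms = [parse_atoms_row(r) for r in rows]
total_charge = sum(a["charge"] for a in atoms)
print(round(total_charge, 4))
```

A total that is far from an integer usually points to a missing atom or a wrong parameter set.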
Following that, other sections such as [ bonds ], [ pairs ], [ angles ], and [ dihedrals ] specify the parameters of the remaining interactions.
The remaining sections of topol.top define other useful or necessary topologies. For example, the posre.itp file defines a force constant used to keep atoms in place during the equilibration phase.
; Include Position restraint file
#ifdef POSRES
#include "posre.itp"
#endif
Finally, the [ system ] directive gives the name of the system that will be written to output files during the simulation, while the [ molecules ] directive lists all of the molecules in the system.
|
|
It’s crucial to ensure that the molecules listed in the [ molecules ] directive appear in exactly the same order, and in the same numbers, as in the coordinate file (i.e., the gro file).
For instance, if your gro file contains a protein (Protein), followed by a ligand (LIG), and a cholesterol membrane (CHL) composed of X molecules, then the [ molecules ] directive should be as follows:
[ molecules ]
; Compound        #mols
Protein             1
LIG                 1
CHL                 X
Even a slight mismatch in the order or number of molecules between the [ molecules ] directive and the gro file will make grompp stop with an error or warn that atom names do not match.
Also, make sure that the names listed match the [ moleculetype ] names, otherwise GROMACS will report that the molecule type cannot be found.
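A check like the last one is easy to automate before you even run GROMACS. The Python sketch below reads a topology given as a string and reports every entry of [ molecules ] that has no matching [ moleculetype ]. It is deliberately simplified: it does not follow #include lines, and the sample topology, the count of 128, and the function names are all made up for the example.

```python
def read_sections(text):
    """Return a list of (section_name, rows) pairs from topology text."""
    sections = []
    current = None
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop ; comments
        if not line or line.startswith("#"):  # skip blanks and preprocessor lines
            continue
        if line.startswith("[") and line.endswith("]"):
            current = (line[1:-1].strip(), [])
            sections.append(current)
        elif current is not None:
            current[1].append(line.split())
    return sections

def undefined_molecules(text):
    """List [ molecules ] names that have no matching [ moleculetype ]."""
    sections = read_sections(text)
    defined = {rows[0][0] for name, rows in sections
               if name == "moleculetype" and rows}
    listed = [row[0] for name, rows in sections
              if name == "molecules" for row in rows]
    return [mol for mol in listed if mol not in defined]

# Hypothetical topology in which the CHL moleculetype was forgotten.
sample_top = """
[ moleculetype ]
; Name   nrexcl
Protein  3

[ moleculetype ]
LIG      3

[ system ]
Protein in membrane

[ molecules ]
Protein  1
LIG      1
CHL      128
"""
missing = undefined_molecules(sample_top)
print(missing)
```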
GROMACS itp file
If you looked closely at the previous files, you may be wondering: why exactly are the itp files passed via the include statement?
In molecular dynamics simulations, complex systems often involve a large number of molecules with various properties and interactions.
Specifying all the necessary parameters for such systems in a single file can quickly become difficult to manage. For this reason, it’s considered more practical to use the include mechanism to add parameters and molecule types using itp files.
Therefore, an itp (which stands for Include Topology) file is simply another text file that contains molecular topology information, such as bond lengths, bond angles, dihedral angles, and force constants, for a specific molecule or group of molecules. These files can then be included in the main topology file using the #include directive.
Suppose you have a complex system with a protein, ligands, and various lipids that make up your membrane. It becomes apparent that if you attempt to incorporate all of the parameters we previously showed into a single file, it will rapidly become disorganized.
So you may encounter a topology where the parameters for different components are grouped into different itp files:
|
|
You can immediately see that the use of the include statements is useful for making the topology compact, rather than writing out all parameters explicitly. As a result, you will get a much cleaner topology file.
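When a topology is split this way, it helps to see at a glance which files it pulls in. Here is a minimal Python sketch that lists the #include targets of a topology given as text. The file names in the sample are hypothetical, and list_includes is my own helper, not a GROMACS tool.

```python
import re

# Matches lines such as: #include "ligand.itp"
INCLUDE_RE = re.compile(r'^\s*#include\s+"([^"]+)"')

def list_includes(text):
    """Return the file names referenced by #include lines, in order."""
    found = []
    for line in text.splitlines():
        match = INCLUDE_RE.match(line)
        if match:
            found.append(match.group(1))
    return found

# Hypothetical main topology for a protein, a ligand, and a lipid membrane.
sample_top = """
; Include forcefield parameters
#include "amber99sb.ff/forcefield.itp"
#include "protein.itp"
#include "ligand.itp"
; #include "old_ligand.itp"
#include "lipids.itp"
"""
includes = list_includes(sample_top)
print(includes)
```

Lines that start with a semicolon are comments, so the commented-out include is not reported.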
Other files in GROMACS
In conclusion, understanding the topology, itp, and gro files is crucial for setting up and running molecular dynamics simulations using GROMACS. However, there are other file formats you need to be comfortable with for various purposes.
I discuss them in more detail in a series of separate posts:
- ndx index file: used to group atoms so that you can use them for various analyses
- mdp parameters file: to set up the parameters of your simulation
- xvg file: GROMACS gives you this format as output of many analyses
- pdb file format
- xtc and trr files: files where the trajectories are stored