How to manage a trajectory file in GROMACS
The gmx trjconv command
In the intricate world of GROMACS simulations, the proper management of trajectory files is particularly important.
These files serve as the backbone of molecular dynamics investigations, containing the dynamic behavior of atoms and molecules over time.
In this article, I will start with a brief overview of what a trajectory file is and which formats are available in GROMACS.
Then, I will focus on two fundamental commands to handle them:
- The gmx trjconv command
- The gmx trjcat command
Trajectory files in GROMACS: xtc and trr file formats
Let’s start by explaining what trajectory files are and which formats you will encounter in GROMACS.
By now, you should know that molecular simulations are obtained by solving the equations of motion for your system.
At each time step, the equations are solved and you will receive new coordinates, velocities, and forces for each atom. This information is conveniently stored in trajectory files.
Therefore, a trajectory file is just a collection of snapshots (frames) capturing the coordinates of the atoms at regular intervals during your simulation, usually every few thousand time steps rather than every single one.
When it comes to GROMACS simulations, trajectories are stored in two different formats, each serving specific purposes. Understanding these formats is crucial for efficient data management and subsequent analysis.
The two file formats I am referring to are:
- The xtc file: this is probably the most commonly used one. Here the coordinates are stored with a reduced-precision compression algorithm, which makes the files much smaller and therefore more portable.
- The trr file: here the files include coordinates, velocities, forces, and energies at full precision. This higher level of detail makes them larger, and they are only required for some very specific analyses.
You will receive these files as outputs from your simulations (namely from the gmx mdrun command), and you can customize them as you wish by setting the appropriate output parameters in the mdp file.
Furthermore, if you want to visualize them, you should read this article where I explained how to do that using VMD.
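As a rough illustration of the mdp settings mentioned above, the sketch below writes a few output-control lines to a file. The file name output_control.mdp and the step counts are placeholders chosen for this example; in practice these lines sit inside your full mdp file.

```shell
# Hypothetical mdp fragment: file name and step counts are placeholders.
cat > output_control.mdp <<'EOF'
; full-precision trr output (0 switches that quantity off)
nstxout                 = 0
nstvout                 = 0
nstfout                 = 0
; compressed xtc output: one frame every 5000 steps
nstxout-compressed      = 5000
compressed-x-grps       = System
EOF
```

With a 2 fs time step, 5000 steps correspond to one xtc frame every 10 ps.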
Now let’s see how you can play with these files.
The gmx trjconv command: Different use cases
The first command we are going to see is gmx trjconv, which will most likely be one of your best friends during your GROMACS journey.
There are countless ways to use this command. Here I list the ones I use most often.
Reduce the size of an xtc trajectory file
The first application we are going to see for this command is reducing the size of an xtc trajectory file.
Sometimes you may generate a trajectory file that is too heavy (on the order of hundreds of GB).
You can imagine that handling files of this size is not really practical. From time to time you may want to transfer the trajectories from one workstation to another, or to your local laptop.
In such cases, it is useful to “lighten” the trajectory so that it takes up less disk space and is easier to transfer.
What we can do is create an additional xtc file with a lower number of frames.
The trjconv call for this is built from the following pieces:
- Call the program with gmx
- Select the trjconv command
- Select the -f flag and provide the starting trajectory (system.xtc)
- Choose the -s flag and enter the .tpr file
- The -dt flag allows us to reduce the number of frames in the output. In our case, we will write one frame every 10 ps in our output file.
- Call the -o flag and decide how you want to name the output file (system_reduced.xtc)
- You may also want to include an index file you previously created (index.ndx) via the -n flag. In this way, you can thin out the trajectory and, at the same time, keep only the part of the system you are interested in. If you do not pass an index file, GROMACS will still offer you its default groups to choose from.
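Putting the flags above together, a minimal sketch could look like the following. The function name reduce_trajectory and the tpr name system.tpr are assumptions made for this example; the other file names come from the description above.

```shell
# Sketch: write one frame every dt_ps picoseconds to a new trajectory.
reduce_trajectory() {
    traj_in=$1    # input trajectory, e.g. system.xtc
    tpr=$2        # run input file, e.g. system.tpr (assumed name)
    dt_ps=$3      # spacing between written frames, in ps
    traj_out=$4   # output trajectory, e.g. system_reduced.xtc
    gmx trjconv -f "$traj_in" -s "$tpr" -dt "$dt_ps" -o "$traj_out"
}

# Example call (trjconv will prompt for the group to write):
# reduce_trajectory system.xtc system.tpr 10 system_reduced.xtc
```

To restrict the output to a custom group, add -n index.ndx to the gmx trjconv line inside the function.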
Convert trr into xtc
Trajectory files in GROMACS come in two different formats:
- A “lighter” format named xtc that stores the coordinates of our system in reduced precision
- A larger format named trr that stores the full-precision positions, velocities, and forces during the simulation
You can specify the format you prefer through the mdp file of your simulation.
However, you can also use the gmx trjconv module to convert a trajectory file from one format to another.
For instance, if we are interested in converting a trr file into an xtc file, the command is assembled as follows:
- Call the program with gmx
- Select the trjconv command
- Select the -f flag and provide the starting trajectory in the trr format (system.trr)
- Choose the -s flag and enter the .tpr file
- Call the -o flag and decide how you want to name the resulting trajectory file in the xtc format (system.xtc)
- You can also use the -dt flag to reduce the number of frames in the output as already explained.
- Also in this case, you may be interested in including an index file you previously created (index.ndx) via the -n flag.
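As a sketch of the conversion described above, the flags can be wrapped in a small function. The name trr_to_xtc and the tpr file name are placeholders for illustration.

```shell
# Sketch: convert a trr trajectory into an xtc trajectory.
# trjconv chooses the output format from the extension of the -o file.
trr_to_xtc() {
    trr_in=$1    # e.g. system.trr
    tpr=$2       # e.g. system.tpr (assumed name)
    xtc_out=$3   # e.g. system.xtc
    gmx trjconv -f "$trr_in" -s "$tpr" -o "$xtc_out"
}

# Example call:
# trr_to_xtc system.trr system.tpr system.xtc
```

Keep in mind that the xtc format only holds coordinates, so any velocities and forces stored in the trr file are not carried over.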
Select a part of an xtc trajectory file
We can use the gmx trjconv command to cut the trajectory and obtain a new one covering only the interval between two selected times.
The command is made up of these pieces:
- Call the program with gmx
- Select the trjconv command
- Select the -f flag and provide the starting trajectory in the preferred format (system.xtc)
- Choose the -s flag and enter the .tpr file (system.tpr)
- The -b flag sets the time of the first frame of the new trajectory (0 ps)
- The -e flag sets the time of the last frame (100 ps)
- Call the -o flag and decide the name of the output file in the gro format (system.gro)
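The pieces listed above could be combined as in this sketch. The function name cut_trajectory is a placeholder; the file names and times are the ones from the description.

```shell
# Sketch: keep only the frames between begin_ps and end_ps (both in ps).
cut_trajectory() {
    traj_in=$1    # e.g. system.xtc
    tpr=$2        # e.g. system.tpr
    begin_ps=$3   # time of the first frame to keep
    end_ps=$4     # time of the last frame to keep
    out=$5        # e.g. system.gro
    gmx trjconv -f "$traj_in" -s "$tpr" -b "$begin_ps" -e "$end_ps" -o "$out"
}

# Example call:
# cut_trajectory system.xtc system.tpr 0 100 system.gro
```

Since trjconv picks the output format from the file extension, a .gro name gives a multi-frame text file, while a .xtc name keeps the result in the compressed trajectory format.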
How to extract a frame from a trajectory
You can also use the gmx trjconv command to extract a frame from your trajectory and create the corresponding gro file with the structure of the system at that specific time.
The main idea is the one we just saw in the previous example. We just need to add the -dump flag and specify a time value. By doing so, we get the single frame that is closest to the time we selected, and we can save it as a gro file.
Here is how to extract the frame closest to time $t = 10$ ps:
- Call the program with gmx
- Select the trjconv command
- Select the -f flag and provide the starting trajectory in the preferred format (system.xtc)
- Choose the -s flag and enter the .tpr file
- The -dump flag is followed by the time value (10 ps)
- Call the -o flag and decide the name of the output file in the gro format (system.gro)
- Also in this case, you may be interested in including an index file you previously created (index.ndx) via the -n flag. In this way, you can extract the structure of just a specific part of the system.
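A minimal sketch of the frame extraction described above, with extract_frame and system.tpr as placeholder names:

```shell
# Sketch: dump the frame closest to time_ps (in ps) into a structure file.
extract_frame() {
    traj_in=$1   # e.g. system.xtc
    tpr=$2       # e.g. system.tpr (assumed name)
    time_ps=$3   # target time in ps
    out=$4       # e.g. system.gro
    gmx trjconv -f "$traj_in" -s "$tpr" -dump "$time_ps" -o "$out"
}

# Example call for the frame closest to 10 ps:
# extract_frame system.xtc system.tpr 10 system.gro
```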
This approach might be a little slow when you want to extract a frame from a long trajectory, as GROMACS will need to scan through all the frames before getting to the one you need. Here is a little bonus trick.
If you want to extract the frame at 100 ns, you can use the -b and -e flags that I showed you in the previous example and set both of them to that time (100000 ps, since GROMACS expects times in ps by default).
GROMACS will start scanning the trajectory from the time you specified with the -b flag and will stop immediately, since that time is the same as the one specified with the -e option. This will give you the frame you need in a matter of seconds.
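The trick can be sketched as below. The function name, the tpr name, and the output name are placeholders, and the sketch assumes that a frame was actually written at the requested time.

```shell
# Sketch: extract the frame at a time given in ns by setting -b and -e
# to the same value. trjconv reads times in ps by default, so convert first.
extract_frame_at_ns() {
    traj_in=$1   # e.g. system.xtc
    tpr=$2       # e.g. system.tpr (assumed name)
    time_ns=$3   # integer time in ns
    out=$4       # e.g. frame_100ns.gro (hypothetical name)
    time_ps=$((time_ns * 1000))
    gmx trjconv -f "$traj_in" -s "$tpr" -b "$time_ps" -e "$time_ps" -o "$out"
}

# Example call for the frame at 100 ns (that is, 100000 ps):
# extract_frame_at_ns system.xtc system.tpr 100 frame_100ns.gro
```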
How to center a molecule/protein in the simulation box
During an MD simulation you can encounter one of the following “problems” when you load the resulting system in your favourite molecular visualization software (e.g., PyMOL):
- The molecule/protein is not centered in the simulation box
- The molecule/protein you are simulating is broken into different pieces
- You see strangely elongated bonds when visualizing the structure of your resulting simulation
You shouldn’t worry about this. It is completely normal and is just a result of the implementation of Periodic Boundary Conditions (PBC). Reading the article where we discussed PBC should clarify the situation.
If you want to switch everything back to normal to have a proper visualization of your system you can use the gmx trjconv command.
Through this module, you can center a specific part of your system, such as a molecule or a protein, in your simulation box.
The command combines the following pieces:
- Call the program with gmx
- Select the trjconv command
- Select the -f flag and provide the starting structure in the gro format (system.gro)
- Choose the -s flag and enter the .tpr file
- The -pbc mol option puts the center of mass of every molecule back in the box, and -center centers the group you select in the box
- The -ur compact option puts all atoms at the closest distance from the center of the box.
- As always, the -o flag is used to name the output file (centered.gro)
- You will need a special index file if you want to center a “non-standard” part of your system. You can specify it via the -n flag.
Finally, you will be asked to:
1. Select the group that you want to center in the box.
2. Select the group that you want in the output file
To center a protein in the simulation box, and output a gro file containing the entire system you have to select:
- “Protein” as group 1
- “System” as group 2
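As a sketch of the centering step and the two group selections described above (center_in_box and system.tpr are placeholder names):

```shell
# Sketch: center a group in the box and make molecules whole again.
center_in_box() {
    conf_in=$1    # e.g. system.gro
    tpr=$2        # e.g. system.tpr (assumed name)
    conf_out=$3   # e.g. centered.gro
    gmx trjconv -f "$conf_in" -s "$tpr" -pbc mol -center -ur compact -o "$conf_out"
}

# Interactive use: answer the two prompts by hand.
# center_in_box system.gro system.tpr centered.gro
#
# Non-interactive use: feed the two answers on standard input,
# first the group to center, then the group to write out.
# printf 'Protein\nSystem\n' | center_in_box system.gro system.tpr centered.gro
```

Feeding the answers through a pipe is handy in scripts, where nobody is there to type the group names.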
How to convert gro file into pdb with GROMACS
Sometimes you may want to convert a gro file into a pdb file. GROMACS allows you to do that using the gmx trjconv module, with a command built as follows:
- Call the program with gmx
- Select the trjconv command
- Choose the -s flag and enter the tpr file
- Select the -f flag and provide the gro file you want to convert (system.gro)
- Call the -o flag and decide the name of the output file in the pdb format (system.pdb)
- The -pbc flag specifies the Periodic Boundary Conditions (PBC) treatment. Through the whole option we make all broken molecules whole.
- The -conect flag is needed to write the CONECT records in the output pdb file
- Also in this case, you may be interested in including an index file you previously created (index.ndx) via the -n flag. In this way, you can extract the structure of just a specific part of the system.
GROMACS will ask you to provide the group you want in your pdb file. You can select the overall system or any part of the system you desire. For instance, if you simulated a protein in water but you only want the pdb file of the protein without the solvent you can select Group 1 "Protein".
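The conversion described above could be sketched like this, with gro_to_pdb and system.tpr as placeholder names:

```shell
# Sketch: convert a gro structure into a pdb file with whole molecules
# and CONECT records.
gro_to_pdb() {
    gro_in=$1    # e.g. system.gro
    tpr=$2       # e.g. system.tpr (assumed name)
    pdb_out=$3   # e.g. system.pdb
    gmx trjconv -s "$tpr" -f "$gro_in" -o "$pdb_out" -pbc whole -conect
}

# Example call (trjconv will prompt for the group to write):
# gro_to_pdb system.gro system.tpr system.pdb
```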
The trjcat command: How to merge two trajectory files
The second command we are going to see is gmx trjcat. Through this command, we can join two or more different trajectory files.
This module has far fewer applications than the previous one, but it may still be useful in a few cases.
The command is put together like this:
- Call the program with gmx
- Select the trjcat command
- Select the -f flag and provide two trajectories in the preferred format (traj_1.xtc, traj_2.xtc).
- Call the -o flag and decide the name of the output file in the xtc format (final_traj.xtc)
- The -settime option allows you to interactively select the starting time for all the trajectories you want to concatenate.
- By default, GROMACS will overwrite frames having the same timestamps. If you want to retain all of them you can use the -cat option.
If you have multiple xtc files in your directory you can simply use *xtc instead of explicitly passing all the trajectories. GROMACS will automatically order all the files by their time values and then proceed to join them.
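A minimal sketch of the merge described above. The function name merge_trajectories is a placeholder; the output name comes first so that any number of input trajectories can follow.

```shell
# Sketch: join several trajectories into one file.
merge_trajectories() {
    traj_out=$1   # e.g. final_traj.xtc
    shift         # the remaining arguments are the input trajectories
    gmx trjcat -f "$@" -o "$traj_out"
}

# Example call with explicit files:
# merge_trajectories final_traj.xtc traj_1.xtc traj_2.xtc
#
# Example call with a glob (the shell expands it before gmx runs):
# merge_trajectories final_traj.xtc traj_*.xtc
```

One caveat when using a bare *xtc pattern: if the output file already exists in the same directory from an earlier run, the pattern will match it too, so a narrower pattern or a separate output directory is safer.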