GROMACS Tutorial: Coarse-Grained Simulations with Martini 3 Force Field
In today’s blog post, we’re going to explore an interesting technique called coarse-grained (CG) simulation and we are going to see how to run them using GROMACS.
Specifically, we’ll focus on the Martini force field, which is rapidly gaining popularity to perform CG simulations of biomolecular systems.
In this article, we won’t dive into the intricate details of the coarse-graining technique itself, as that warrants a discussion of its own. Instead, the primary focus will be on guiding you through the process of running a CG simulation using GROMACS.
Now, before we dive in, it’s important to note that this tutorial assumes a basic familiarity with GROMACS. If you’re new to GROMACS, I encourage you to explore some of the previous blog posts to familiarize yourself with the software’s functionalities.
. Once you have a grasp of the basics, you’ll be ready to start.
Coarse-Grained Simulations
Standard MD simulations, while incredibly valuable, can be computationally intensive and time-consuming, often limiting our ability to study long-timescale processes in biology.
To overcome these limitations, coarse-grained (CG) simulations have emerged as a powerful solution.
The basic principle behind CG simulations is the mapping of multiple atoms to a single representative entity, often referred to as a bead. By doing so, we have different advantages over atomistic simulations. Here I will only mention the two that are more intuitive:
-
Reduced Atom Count: In CG simulations, multiple atoms are mapped to a single representative entity, resulting in a significantly reduced number of atoms in the system. This reduction in atom count allows for more efficient computations and faster simulations.
-
Larger Timesteps: The simplification of the system in CG simulations also enables the use of larger timesteps compared to atomistic simulations. With fewer degrees of freedom to consider, CG simulations can effectively explore longer timescales, potentially up to 20 fs or even more. This larger timestep further contributes to the computational speedup of CG simulations.
The reduction of degrees of freedom significantly accelerates the simulation while still capturing the essential dynamics of the system. This efficiency enables the investigation of biologically relevant phenomena occurring over longer timescales that would otherwise be challenging to simulate using atomistic models.
CG simulations workflow
The general workflow is not particularly different from a standard MD simulation. The only thing we need to change is the initial step since the protein will need to be converted from an atomistic structure to a coarse-grained one but the rest will be pretty much similar to a standard MD simulation except for the fact that we will have to twist some of the parameters in the mdp file.
Convert the structure to CG with Martinize
First of all, we can download an atomistic structure from the Protein Data Bank. As an example, we are going to use the famous lysozyme (PDB ID: 1AKI)
|
|
The first thing to do as always is to make sure to clean the protein from other molecules. You can use the grep
command
|
|
To convert the clean atomistic structure of the protein to CG and generate the corresponding topology file we can use the Martinize python script.
You can download Martinize by following the procedure reported here. Or you can simply run these two:
|
|
|
|
To check whether the installation was successful you can try:
|
|
To convert an atomistic structure into a coarse-grained (CG) representation using the martinize2
tool, we need to provide various flags and parameters. Let’s break down the command and explain each flag in detail:
|
|
- The command begins with
martinize2
, which is the executable for themartinize2
tool. - The
-f
flag is used to specify the input atomistic structure file in PDB format. In this case, it is1aki_clean.pdb
. - Provide the secondary structure via the
-dssp
flag (More info below). - The
-x
flag is used to specify the output file name for the converted CG structure. Here, the name1aki_cg.pdb
is chosen for the coarse-grained structure file. - Select the
-o
flag and name the topology (topol.top
). The topology file will contain all the information about the system - Select the Martini 3 force field with
-ff martini3001
- The
-scfix
flag applies side-chain corrections. - The
-cys
flag is set toauto
, which allows the program to automatically detect disulfide bonds in the protein and create appropriate constraints. - The
-p backbone
option defines position restraints on the backbone. - The
-elastic
flag activates the elastic network model. Elastic networks are used to introduce harmonic constraints between pairs of atoms within a certain cutoff distance, simulating the connectivity and flexibility of the protein. - The
-ef
,-el
, and-eu
flags control the parameters for the elastic network.-ef 700.0
sets the force constant for the elastic network springs to 700 • kJ/mol/nm$^2$,-el 0.5
defines the lower cutoff for interactions, and-eu 0.9
specifies the upper cutoff. These parameters determine the strength and range of the harmonic constraints in the elastic network.
Note that this also requires as input the secondary structure of the protein. A convenient way to pass it to Martinize is to download the dssp tool and give the executable via the -dssp
flag.
More info on the tool is here. You can download it via conda.
|
|
You will need to give the full path to dssp to the martinize2
. To get it you can type one of the following two commands:
|
|
It should return a path. In my case was something like this
|
|
This will be the path to use for the script
This will generate three files:
- the CG structure (
1aki_cg.pdb
) - The corresponding topology (
topol.top
) - The parameters file for the protein (
molecule_0.itp
). Here you will also find the command you used to generate the parameters.
By comparing the initial all-atom structure (1aki_clean.pdb
) and the obtained CG structure (1aki_cg.pdb
), you will easily notice a difference.
In the all-atom structure, each atom is individually represented, retaining the level of atomistic detail. In contrast, the coarse-grained structure simplifies the representation by grouping atoms into beads, reducing the complexity of the system and enabling faster simulations.
The image showcases the comparison between the all-atom structure and the coarse-grained structure of a serine residue in the protein. In this example, the serine residue is mapped onto two beads,
System preparation with insane
After obtaining our coarse-grained (CG) structure, we need to prepare the system for the simulation. This process typically involves three essential steps: solvation, ion addition for system neutrality, and optionally embedding the protein in a membrane environment.
To streamline this process, one convenient tool we can utilize is the insane
tool to build CG systems. It offers a range of functionalities, including generating a solvated system and embedding the protein in a membrane if desired. You can obtain the insane.py
script from here.
Once you have acquired the script, you can utilize it to solvate the system with water and add the necessary ions for neutralization. Let’s examine the following Python command:
|
|
Let’s break down the options and parameters of this command:
-
-f 1aki_cg.pdb
: This option specifies the input CG structure file in PDB format, which in this case is1aki_cg.pdb
. -
-p topol.top
: The-p
flag is used to provide the topology file (topol.top
). -
-o system.gro
: The-o
flag sets the name of the output file for the solvated system in the gro format (system.gro
) -
-d 7
: builds the periodic box such that the distance between two periodic images is greater than 7 nm. -
-sol W
: The-sol
flag specifies the solvent type to add to the system. Here,W
represents water molecules. By specifying-sol W
, we indicate that we want to solvate the system with water. -
-salt 0.15
: The-salt
flag is used to add salt ions to the system. The value0.15
indicates the desired salt concentration in moles per liter (M).
By executing this Python command, the insane.py
script will generate a solvated system, with the specified number of water molecules and added ions to neutralize the system. The resulting structure will be saved as monomer_6cm4_memb_cg.gro
and can be further used for subsequent steps in the simulation.
Modify the topology and itp files
The itp
files for the Martini 3 force field can be downloaded here. I generally store them in a directory named martini_itp
. Inside here you will find the general parameters of the force field, as well as the ones for the water and the ions.
|
|
Now you can also move the previously generated itp
file for the protein inside this folder.
|
|
Lastly, we need to modify the topology file (topol.top
) to include all the parameters we need for our simulation. This is what the file should be like:
|
|
Now we are ready to go.
Simulation
Once we have prepared the solvated system, we can proceed with the simulation, which follows a similar protocol to a standard molecular dynamics (MD) simulation. As always, the key steps include minimization, equilibration, and the production run.
The only differences you will find are the mdp
files for the runs, and the difference in performance with respect to an atomic simulation.
Minimization
Let’s start by creating a directory dedicated to the minimization phase and navigating into it:
|
|
Next, you will need to download the mdp
file for the minimization run and copy that into this directory.
This file contains the specifications for the minimization run. For example, let’s assume the file is named minim.mdp
. To generate the binary input file (tpr
) needed for the minimization run, execute the following command:
|
|
With the em.tpr
file generated, we can proceed to run the minimization simulation using the following command:
|
|
Equilibration
After the minimization step, we can proceed with the equilibration phase of our coarse-grained (CG) simulation.
Due to the increased stability of CG models, a more rough equilibration is typically sufficient. In this case, we will focus on a 50 ns NPT (constant number of particles, pressure, and temperature) simulation.
Now we can create a directory for the NPT equilibration and navigate into it.
|
|
Find the mdp file for the NPT run here. Next, we can generate the binary input file (tpr
) for the NPT equilibration run using the following command:
|
|
With the npt.tpr
file generated, we can now run the NPT equilibration simulation. To execute the simulation in the background, use the following command:
|
|
Production run
Finally, we will run a 1 μs production run following the usual procedure. Create a directory and move into it.
|
|
Download the mdp file for the production run here, then you can use it to generate the md.tpr
file.
|
|
Run the simulation in background:
|
|
Assessing CG Simulation Quality
Coarse-grained (CG) simulations offer the advantage of significantly reducing the computational time required for molecular dynamics simulations.
However, it is crucial to exercise caution and ensure the reliability of the results obtained from CG simulations. Comparing the fluctuations of the CG system with those of the same system at the all-atom level can help validate the simulation and assess its quality.
To perform this comparison, you can utilize the tools available in GROMACS.
The RMSD analysis allows you to quantify the deviation between the CG and all-atom structures by calculating the average distance between corresponding atoms in the two systems. If the RMSD value is within an acceptable range, it indicates that the CG model accurately captures the behavior of the all-atom system.
The RMSF analysis measures the fluctuation of individual atoms in the CG system. By comparing the RMSF profiles of the CG and all-atom systems, you can identify any significant discrepancies in the dynamics and fluctuations of the two representations. Large differences in the RMSF values may indicate potential inaccuracies or limitations in the CG model.
If the fluctuations are too high you may have to increase the value of the elastic network provided to the martinize
tool with -ef
, and vice versa if the fluctuations are too low you may want to decrease the strength of the elastic network.
We generally compute the RMSD backbone to backbone but, since we mapped the original atomistic structure to CG GROMACS will not be able to recognize the backbone by default. For this reason, you will have to create a specific index file (index.ndx
):
|
|
and select only the backbone bead (BB
).
|
|
Now you can use this group of atoms to compute the RMSD and RMSF.