Skip to content

02 - Protein preparation from RCSB

Protein Data Bank (PDB) files are not immediately usable for MD simulations. Thus, we have to perform several steps to clean and prepare our PDB files.

Logging

First, we specify some environmental variables for metalflare. This is mostly to enable logging.

export METALFLARE_LOG=True
export METALFLARE_STDOUT=True
export METALFLARE_LOG_LEVEL=10

Download PDB

First, we download the protein from RCSB. For an example, we will be using 1JC0.

wget https://files.rcsb.org/download/$PDB_ID.pdb -O $SAVE_DIR/0-$PDB_ID.pdb

We receive the following structure.

Select protein

We will only be working with one protein and need to select chain A. We use the metalflare-select-atoms script which just drives pdb.select.run_select_atoms().

metalflare-select-atoms $SAVE_DIR/0-$PDB_ID.pdb $SAVE_DIR/1-$PDB_ID-chain-A.pdb --select_str "chainID A"

We keep all crystallographic waters so the CRO-coordinating water's position is maintained.

Minify PDB lines

Keep only ATOM and HETATM lines.

metalflare-filter-pdb $SAVE_DIR/1-$PDB_ID-chain-A.pdb  --output $SAVE_DIR/2-$PDB_ID-filtered.pdb

Centering

ATOM    471  CG  LEU A  64     172.089   9.780  36.151  1.00 35.35      A    C
ATOM    472  CD1 LEU A  64     172.939   9.531  34.924  1.00 28.06      A    C
ATOM    473  CD2 LEU A  64     170.620   9.387  35.892  1.00 33.01      A    C
HETATM  474  N1  CRO A  65     173.570   8.493  40.293  1.00 24.05      A    N
HETATM  475  CA1 CRO A  65     174.025   7.483  41.259  1.00 25.10      A    C
HETATM  476  CB1 CRO A  65     175.255   6.742  40.591  1.00 31.02      A    C
metalflare-center $SAVE_DIR/2-$PDB_ID-filtered.pdb --output $SAVE_DIR/3-$PDB_ID-centered.pdb
ATOM    471  CG  LEU A  64      -1.838  -0.100  -6.847  1.00 35.35      A    C
ATOM    472  CD1 LEU A  64      -0.988  -0.349  -8.074  1.00 28.06      A    C
ATOM    473  CD2 LEU A  64      -3.307  -0.493  -7.106  1.00 33.01      A    C
HETATM  474  N1  CRO A  65      -0.357  -1.387  -2.705  1.00 24.05      A    N
HETATM  475  CA1 CRO A  65       0.098  -2.397  -1.739  1.00 25.10      A    C
HETATM  476  CB1 CRO A  65       1.328  -3.138  -2.407  1.00 31.02      A    C

Rotate structure

Will attempt to rotate the structure in order to minimize the number of water molecules we will add later.

metalflare-minimize-box $SAVE_DIR/3-$PDB_ID-centered.pdb --output $SAVE_DIR/4-$PDB_ID-rotated.pdb

For our 1JC0 example, the box volume decreased from 88 063 to 71 986 Å3 after this script, which decreases the number of water molecules by at least 500.

Unify residue IDs

ATOM    471  CG  LEU A  64     172.089   9.780  36.151  1.00 35.35      A    C
ATOM    472  CD1 LEU A  64     172.939   9.531  34.924  1.00 28.06      A    C
ATOM    473  CD2 LEU A  64     170.620   9.387  35.892  1.00 33.01      A    C
HETATM  474  N1  CRO A  66     173.570   8.493  40.293  1.00 24.05      A    N
HETATM  475  CA1 CRO A  66     174.025   7.483  41.259  1.00 25.10      A    C
HETATM  476  CB1 CRO A  66     175.255   6.742  40.591  1.00 31.02      A    C
metalflare-unify-resids $SAVE_DIR/4-$PDB_ID-rotated.pdb --output $SAVE_DIR/5-$PDB_ID-residues.pdb
ATOM    471  CG  LEU A  64     172.089   9.780  36.151  1.00 35.35      A    C
ATOM    472  CD1 LEU A  64     172.939   9.531  34.924  1.00 28.06      A    C
ATOM    473  CD2 LEU A  64     170.620   9.387  35.892  1.00 33.01      A    C
HETATM  474  N1  CRO A  65     173.570   8.493  40.293  1.00 24.05      A    N
HETATM  475  CA1 CRO A  65     174.025   7.483  41.259  1.00 25.10      A    C
HETATM  476  CB1 CRO A  65     175.255   6.742  40.591  1.00 31.02      A    C

Residue states

Methionine

Methionine (MET) residues are often artificially changed to selenomethionine (MSE) to ensure proper crystallization by multi-wavelength anomalous dispersion. We almost always want to model with the wild-type MET, so we replace any MSE with MET residues and Se atoms to S.

metalflare-rename-resname $SAVE_DIR/5-$PDB_ID-residues.pdb MSE MET --output $SAVE_DIR/5-$PDB_ID-residues.pdb

Cysteine

metalflare-rename-resname $SAVE_DIR/5-$PDB_ID-residues.pdb CYS CYM --include 145 202 --output $SAVE_DIR/5-$PDB_ID-residues.pdb

Protonation and steric clashes

PDB2PQR predicts protonation states of histidine (HIS), aspartic acid (ASP), glutamic acid (GLU), lysine (LYS).

pdb2pqr --log-level INFO --ff=AMBER --keep-chain --ffout=AMBER $SAVE_DIR/5-$PDB_ID-residues.pdb $SAVE_DIR/6-$PDB_ID-pdb2pqr.pdb

Sometimes PDB2PQR cannot process some atoms, so we need to add them back.

metalflare-merge-pdbs $SAVE_DIR/6-$PDB_ID-pdb2pqr.pdb $SAVE_DIR/5-$PDB_ID-residues.pdb --output $SAVE_DIR/6-$PDB_ID-pdb2pqr.pdb

Warning

PDB2PQR cannot process non-standard residues (e.g., the GFP chromophore) and thus cannot add hydrogens to them. These are often added later using a program like tleap.

Unify water residues

metalflare-rename-resname $SAVE_DIR/6-$PDB_ID-pdb2pqr.pdb HOH WAT --output $SAVE_DIR/7-$PDB_ID-resnames.pdb
metalflare-rename-resname $SAVE_DIR/7-$PDB_ID-resnames.pdb TIP WAT --output $SAVE_DIR/7-$PDB_ID-resnames.pdb
metalflare-rename-resname $SAVE_DIR/7-$PDB_ID-resnames.pdb TIP3 WAT --output $SAVE_DIR/7-$PDB_ID-resnames.pdb
metalflare-unify-waters $SAVE_DIR/7-$PDB_ID-resnames.pdb --output $SAVE_DIR/7-$PDB_ID-resnames.pdb

pdb4amber

pdb4amber -i $SAVE_DIR/7-$PDB_ID-resnames.pdb > $SAVE_DIR/8-$PDB_ID-pdb4amber.pdb 2> pdb4amber.err