Authors: Brian Weitzner (brian.weitzner@jhu.edu), Monica Berrondo (mberron1@jhu.edu), Krishna Kilambi (kkpraneeth@jhu.edu), Robin Thottungal (raugust1@jhu.edu), Sidhartha Chaudhury (sidc@jhu.edu), Chu Wang (chuwang@gmail.com), Jeffrey Gray (jgray@jhu.edu)
Last edited 7/17/2017 by Nick Marze. Corresponding PI Jeffrey Gray (jgray@jhu.edu).
An introductory tutorial of protein-protein docking can be found here.
rosetta/main/source/src/apps/public/docking/docking_protocol.cc
rosetta/main/source/src/protocols/docking/DockingProtocol.cc
rosetta/main/tests/integration/tests/docking*
(docking_full_protocol, docking_local_refine, docking_local_refine_min, docking_low_res, docking_distance_constraint, docking_site_constraint).
To run docking, type the following on the command line:
[path to executable]/docking_protocol.[platform|linux/mac][compile|gcc/ixx]release -database [path to database] @options
Note: these demos will only generate one decoy. To generate a large number of decoys you will need to add -nstruct N (where N is the number of decoys to build) to the list of flags.
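As a rough illustration of how the flags on this page combine, the following Python sketch assembles a command line for a local perturbation run. It uses only flags documented here (-database, -s, -nstruct, -ex1, -ex2aro, -dock_pert, -partners). The helper name `docking_command`, the paths, and the file name `complex_prepacked.pdb` are hypothetical placeholders, not part of Rosetta.

```python
import shlex


def docking_command(executable, database, start_pdb,
                    nstruct=1, partners=None, dock_pert=(3, 8)):
    """Build an argument list for docking_protocol (hypothetical helper).

    Only flags described on this page are used. -ex1 and -ex2aro are
    always added, as the page recommends for every docking run.
    """
    args = [executable,
            "-database", database,
            "-s", start_pdb,
            "-nstruct", str(nstruct),
            "-ex1", "-ex2aro"]
    if dock_pert is not None:
        trans, rot = dock_pert  # Gaussian std devs: translation, rotation
        args += ["-dock_pert", str(trans), str(rot)]
    if partners is not None:
        args += ["-partners", partners]  # e.g. "LH_A" for multichain docking
    return args


# Placeholder paths and file names; substitute your own.
cmd = docking_command("/path/to/docking_protocol.linuxgccrelease",
                      "/path/to/rosetta/main/database",
                      "complex_prepacked.pdb",
                      nstruct=1000, partners="LH_A")
print(" ".join(shlex.quote(a) for a in cmd))
```

The resulting list can be passed to `subprocess.run(cmd)`, or the flags can be written one per line into the options file referenced by @options.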
We recommend the following articles for further studies of RosettaDock methodology and applications:
Determine the structure of protein-protein complexes by using rigid body perturbations of the protein chains.
The following description has been adapted from Chaudhury, et al., 2011, PLoS One:
RosettaDock is a Monte Carlo (MC) based multi-scale docking algorithm that incorporates both a low-resolution, centroid-mode, coarse-grain stage and a high-resolution, all-atom refinement stage that optimizes both rigid-body orientation and side-chain conformation. The algorithm roughly follows the biophysical theory of an encounter complex formation followed by a transition to a bound state. Typically the algorithm starts from either a random initial orientation of the two partners (global docking), or an initial orientation that is randomly perturbed from a user-defined starting pose (local perturbation). From there, the partner proteins are represented coarsely, where side chains are replaced by a single unified pseudo-atom, or centroid. During this phase, a 500-step Monte Carlo search is made with adaptive rotation and translational steps adjusted dynamically to achieve an acceptance rate of 25%. The ScoreFunction used in this stage primarily consists of a ‘bump’ term, a contact term, and docking-specific statistical residue environment and residue-residue pair-wise potentials.
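The adaptive step-size idea in the low-resolution stage can be sketched in a few lines of Python. This is a toy one-dimensional illustration, not RosettaDock's implementation: the energy function, the temperature `kT`, the 10% adjustment factor, and the choice to re-tune the step every 50 moves are all assumptions made for the example. Only the 500 steps and the 25% target acceptance rate come from the text.

```python
import math
import random


def adaptive_mc(energy, x0, n_steps=500, target_accept=0.25,
                step=1.0, kT=1.0, window=50, seed=0):
    """Metropolis Monte Carlo with a step size tuned toward a target acceptance rate."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    accepted_in_window = 0
    total_accepted = 0
    for i in range(1, n_steps + 1):
        trial = x + rng.gauss(0.0, step)      # random perturbation
        e_trial = energy(trial)
        delta = e_trial - e
        # Metropolis criterion: always accept downhill, sometimes uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            x, e = trial, e_trial
            accepted_in_window += 1
            total_accepted += 1
            if e < best_e:                    # track the lowest energy visited
                best_x, best_e = x, e
        if i % window == 0:
            rate = accepted_in_window / window
            # Accepting too often means the steps are too timid: enlarge them.
            # Accepting too rarely means they are too bold: shrink them.
            step *= 1.1 if rate > target_accept else 0.9
            accepted_in_window = 0
    return best_x, best_e, step, total_accepted / n_steps


best_x, best_e, final_step, accept_rate = adaptive_mc(
    lambda x: (x - 3.0) ** 2, x0=10.0)
print(f"best x = {best_x:.2f}, best energy = {best_e:.3f}, "
      f"final step = {final_step:.2f}, acceptance = {accept_rate:.2f}")
```

As in the description above, the function returns the lowest-energy state visited during the search rather than the final state.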
Once the centroid-mode stage is complete, the lowest energy structure accessed during that stage is selected for high-resolution refinement. During high-resolution refinement, centroid pseudo-atoms are replaced with the side-chain atoms at their initial unbound conformations. Then 50 Monte Carlo+Minimization (MCM) steps are made in which:
Every eight steps, an additional combinatorial side-chain optimization is carried out using the full side-chain packing algorithm, followed by an additional Metropolis criterion check. To reduce the time devoted to the computationally expensive energy minimization for unproductive rigid-body moves, minimization is skipped if a rigid-body move results in a change in score of greater than +15. The all-atom score function used in this stage primarily consists of van der Waals attractive and repulsive terms, a solvation term, an explicit hydrogen bonding term, a statistical residue-residue pair-wise interaction term, an internal side-chain conformational energy term, and an electrostatic term.
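The control flow of the refinement loop can be sketched as follows. This is a schematic under stated assumptions, not the Rosetta code: the pose is reduced to a single number, `toy_score`, `toy_minimize`, and `toy_repack` are hypothetical stand-ins, and `kT` and `perturb_sd` are arbitrary. The per-step order (perturb, minimize, Metropolis check) is also an assumption. From the text, the sketch takes the 50 MCM steps, the full repack every eight steps followed by a Metropolis check, and the rule that minimization is skipped when a move raises the score by more than +15.

```python
import math
import random

SKIP_MIN_THRESHOLD = 15.0  # from the text: skip minimization above this score change
REPACK_INTERVAL = 8        # from the text: full repack every eight steps


def metropolis_accept(delta, kT, rng):
    """Accept downhill moves always, uphill moves with Boltzmann probability."""
    return delta <= 0 or rng.random() < math.exp(-delta / kT)


def mcm(score, minimize, repack, pose, n_steps=50,
        kT=0.8, perturb_sd=1.0, seed=0):
    rng = random.Random(seed)
    current, current_score = pose, score(pose)
    best, best_score = current, current_score
    n_minimized = 0
    for step in range(1, n_steps + 1):
        trial = current + rng.gauss(0.0, perturb_sd)   # rigid-body move (toy)
        # Only spend time minimizing moves that are not hopelessly bad.
        if score(trial) - current_score <= SKIP_MIN_THRESHOLD:
            trial = minimize(trial)
            n_minimized += 1
        trial_score = score(trial)
        if metropolis_accept(trial_score - current_score, kT, rng):
            current, current_score = trial, trial_score
        if step % REPACK_INTERVAL == 0:
            repacked = repack(current)                  # full repack (toy no-op)
            repacked_score = score(repacked)
            if metropolis_accept(repacked_score - current_score, kT, rng):
                current, current_score = repacked, repacked_score
        if current_score < best_score:
            best, best_score = current, current_score
    return best, best_score, n_minimized


def toy_score(x):
    return 10.0 * x * x


def toy_minimize(x, n_iter=20, rate=0.02):
    for _ in range(n_iter):
        x -= rate * 20.0 * x   # gradient descent on 10*x^2
    return x


def toy_repack(x):
    return x                   # placeholder: no side chains in this toy


best, best_score, n_minimized = mcm(toy_score, toy_minimize, toy_repack, pose=2.0)
print(f"best score = {best_score:.4f}; minimized {n_minimized} of 50 moves")
```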
For particular targets, a variety of RosettaDock sampling strategies are often used to improve the chance of achieving an accurate structure prediction. If no prior structural or biochemical information is known about the protein interaction of interest, global docking is used to randomize the initial docking poses. From there, score filters and clustering are used to identify clusters of acceptable low-energy structures for further docking and refinement. In most cases, there is some known information about the complex, either in the form of related protein complexes or in biochemical or bioinformatics data which identify probable regions of interaction on the protein partners. In these cases users manually arrange the starting docking pose to a configuration that is compatible with the information and carry out a local docking perturbation. Additionally, users can set distance-based filters that bias sampling towards those docking poses that are compatible with specified constraints.
The only required input file is a prepacked PDB file containing two proteins with different chain IDs. Starting structures must be prepacked because the side chains are only packed at the interface during docking. Running the docking prepack protocol ensures that the side chains outside of the docking interface have a low-energy conformation, which provides a better reference state for scoring. For more information on prepacking, see the Docking Prepack protocol documentation.
Note: The following flags should be given to every docking simulation: -ex1 -ex2aro.
If you are using a starting structure with more than two polypeptide chains, you should include the -partners flag. If this flag is omitted, docking will dock the first two polypeptide chains in the structure.
Note: An ensemble of input structures can be given using the following flags: -ensemble1 [partner_1_pdb_list] and/or -ensemble2 [partner_2_pdb_list]. This ensemble file should be generated using docking_prepack_protocol. It contains a list of prepacked PDBs followed by the probabilities of swapping them.
Flag | Description | Type |
---|---|---|
-partners [P1_P2] | Defines docking partners by chain ID for multichain docking. For example, "-partners LH_A" moves chain A around the dimer of chains L and H. | String |
Flag | Description | Type |
---|---|---|
-randomize1 | Randomize the orientation of the first docking partner. (Only works with 2 partner docking). (Global). | Boolean |
-randomize2 | Randomize the orientation of the second docking partner. (Only works with 2 partner docking). (Global). | Boolean |
-spin | Spin a second docking partner around axes from center of mass of the first partner to the second partner. (Global). | Boolean |
-dock_pert [T] [R] | To create a starting structure from the input structure, randomly perturb the input structure using a Gaussian for translation and rotation with standard deviations [T] and [R]. Recommended usage is "-dock_pert 3 8". (Global and Local). | RealVector |
-uniform_trans [R] | Uniform random repositioning of the second partner about the first partner within a sphere of the given radius, [R]. | Real |
-use_ellipsoidal_randomization true | Randomizes docking partners about ellipsoids rather than spheres. Recommended in all cases, but especially when docking partners are elongated or flattened. To be used in concert with -randomize1 and/or -randomize2. (Global) | Boolean |
Flag | Description | Type |
---|---|---|
-norepack1 | Do not repack the sidechains on the first docking partner. (Only works with 2 partner docking). | Boolean |
-norepack2 | Do not repack the sidechains on the second docking partner. | Boolean |
-sc_min | Perform extra side chain minimization steps during packing steps. | Boolean |
Default mode of docking. No additional flags necessary.
Flag | Description | Type |
---|---|---|
-low_res_protocol_only | Only run the low resolution part of the protocol (skips all high resolution steps and only outputs low resolution structure). | Boolean |
Flag | Description | Type |
---|---|---|
-docking_local_refine | Refine the docking position in high resolution only (skips all low resolution steps of the protocol). Uses small perturbations of the positions, no large moves. | Boolean |
Flag | Description | Type |
---|---|---|
-docking_local_refine | Refine the docking position in high resolution only (skips all low resolution steps of the protocol). Uses small perturbations of the positions, no large moves. | Boolean |
-dock_min | Does a single round of minimization in high resolution, skipping the mcm protocol. | Boolean |
Flag | Description | Type |
---|---|---|
-s [S] OR -silent [S] | Specify the file name of the starting structure, S. | String |
-native [S] | Specify the file name of the native structure, [S], against which RMSDs are calculated. If a native file is not passed in, all calculations are done using the starting structure as native. | String |
-nstruct [I] | Specify the number of decoys, [I], to generate. | Integer |
-database [P] | The path to the Rosetta database (e.g. /path/to/rosetta/main/database). | String |
-use_input_sc | Use accepted rotamers from the input structure between Monte Carlo+Minimization (MCM) cycles. Unlike the -unboundrot flag from Rosetta++, not all rotamers from the input structure are added to the rotamer library each time, but only those accepted at the end of each round; the remaining conformations are lost. | Boolean |
-ex1/-ex1aro -ex2/-ex2aro -ex3 -ex4 | Adding extra side-chain rotamers (highly recommended). The -ex1 and -ex2aro flags were used in our own tests, and therefore are recommended as default values. | Boolean/Integer |
-constraints:cst_file [S] | Specify the name of the constraint file, [S]. | String |
Flag | Description | Type |
---|---|---|
-dock_mcm_trans_magnitude [T] | The magnitude of the translational perturbation during MCM steps in docking in Angstroms. Defaults to 0.7 Å | Real |
-dock_mcm_rot_magnitude [R] | The magnitude of the rotational perturbation during MCM steps in docking in degrees. Defaults to 5.0° | Real |
-docking_centroid_outer_cycles [C] | Number of cycles to use in outer loop of docking low resolution protocol. Defaults to 10. | Integer |
-docking_centroid_inner_cycles [C] | Number of cycles to use in inner loop of docking low resolution protocol. Defaults to 50. | Integer |
The docking run can take two forms. Sometimes biochemical and genetic information can be used to localize the binding site to a small region on one or both partners. In this case, one performs a perturbation run, exploring only a small region of space around the suspected binding site. For predictions where there is no biological information about the interface, one usually performs a global search, exploring all the conformational space of both partners.
Global docking gives best results when there are fewer than 450 residues in the complex (see Daily, 2005).
The most commonly used option is -dock_pert 3 8, which creates starting structures by randomly perturbing the input structure with Gaussian-distributed translations (standard deviation 3 Angstroms) and rotations (standard deviation 8 degrees).
Note that one should always run the Docking Prepack application on the starting structure in order to remove high-energy rotamers that could cause erroneous decoy ranking.
There are two scores that one should inspect:
Always use the -ex1 and -ex2aro options to allow more fine grained rotamer selection.
For perturbation runs, generate at least 1,000 decoys. For global runs, generate between 10,000 and 100,000 (we know this requires a lot of CPU time and disk space).
If the interface score doesn't show up in the scorefile and you want to add it, you can use the flag -score:docking_interface_score 1
You can use the docking protocol to calculate interface RMSDs:
/path/to/docking_protocol.linuxgccrelease -l input_pdbs.list \
-native native.pdb \
-docking_local_refine \
-dock_min \
-out:file:score_only recalc_rmsd.sc \
-nstruct 1
Docking now supports AtomPairConstraint, AmbiguousConstraint and SiteConstraint. To use a constraint with docking, you only need to add the option -constraints:cst_file constraint_file. See the docking_distance_constraint and docking_site_constraint integration tests for examples. A SiteConstraint allows you to specify that a particular residue should be in contact with a particular chain. An example of a SiteConstraint is:
SiteConstraint CA 4A D FLAT_HARMONIC 0 1 5
This adds a FLAT_HARMONIC potential with the parameters 0 1 5 (recommended; see this page for more on constraint files) to the ScoreFunction, acting on the distance between the CA of residue 4 (PDB numbering) on chain A and the closest CA on chain D. The FLAT_HARMONIC function with these parameters is centered on 0, with a spring constant of 1 and a window of 5. That is, there is no penalty for distances within 5 of zero; beyond that, the penalty grows as d^2, where d is the distance in excess of 5.
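To make the shape of this potential concrete, here is a small Python sketch of a flat-bottomed harmonic function. The function name `flat_harmonic`, the parameter names `x0`, `sd`, and `tol`, and the way `sd` divides the excess distance are assumptions about how the three parameters are interpreted. With the recommended values 0 1 5, the sketch matches the behavior described above.

```python
def flat_harmonic(x, x0=0.0, sd=1.0, tol=5.0):
    """Zero within tol of x0; quadratic in the excess distance beyond tol.

    Parameter names and the sd scaling are assumptions for illustration.
    """
    excess = abs(x - x0) - tol
    if excess <= 0:
        return 0.0
    return (excess / sd) ** 2


for distance in (3.0, 5.0, 7.0, 8.0):
    print(distance, flat_harmonic(distance))
# 3.0 and 5.0 give 0.0 (inside the flat window);
# 7.0 gives 4.0 and 8.0 gives 9.0 (2 and 3 units past the window, squared).
```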
If you are looking to have SiteConstraints for any atom of a particular residue to any atom of the chain specified, check out the AmbiguousConstraints.
If you want to disallow certain residues, look into the SIGMOID function. As Rocco Moretti explains in a forum post: I might recommend trying a SIGMOID function (1/(1+exp(-slope*(x-x0))) - 0.5). This is set up to give a favorable value near zero (and for negative numbers) when the slope is positive, but you could try a negative slope to give a favorable value further away from zero. e.g.
SiteConstraint CA 16B A SIGMOID 5.0 -2.0
This would give a sigmoid constraint centered around 5, with values being unfavorable near zero and favorable beyond 5 (slope -2). You will likely need to play around with the slope and cutoff to get things the way you want. Also keep in mind that a sigmoid never quite reaches zero, so the constraint will always want to push things apart, but with a steep enough slope the effect will be negligible.
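The sigmoid formula quoted above can be evaluated directly to see how the sign of the slope flips the preference. This Python sketch uses the values from the example constraint line (x0 = 5.0, slope = -2.0); the function name `sigmoid_penalty` is hypothetical.

```python
import math


def sigmoid_penalty(x, x0=5.0, slope=-2.0):
    """Evaluate 1/(1+exp(-slope*(x-x0))) - 0.5, as quoted in the text."""
    return 1.0 / (1.0 + math.exp(-slope * (x - x0))) - 0.5


for distance in (0.0, 5.0, 10.0):
    print(distance, round(sigmoid_penalty(distance), 3))
# With a negative slope: about +0.5 near zero (unfavorable),
# 0.0 at the midpoint x0, and about -0.5 well beyond x0 (favorable).
```

A larger slope magnitude sharpens the transition around x0, which is the knob the quoted advice suggests tuning.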
One PDB file for each candidate docked model generated, and one scorefile for each run summarizing all generated models.
Sort the scorefile by score using the command-line sort function. For global docking simulations, one should generate at least 10,000 decoys, and ideally 100,000. Sort by score (pay attention to I_sc and total!) and cluster the top 200 decoys by pairwise RMSD. Since global docking in Rosetta 3 has not been thoroughly tested, we do not have scripts available to automate this process. We recommend using the scripts mentioned in our Rosetta++ tutorial. Some scripts may require some modification.
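As an alternative to the command-line sort, the selection step can be sketched in Python. The sketch assumes a whitespace-delimited scorefile in which every data line starts with `SCORE:` and the first such line holds the column names. The column names `total_score`, `I_sc`, and `description`, and the sample values, are made up for illustration. Check the header of your own scorefile before relying on them.

```python
def read_scorefile(lines):
    """Parse SCORE: lines into a list of dicts keyed by the header columns."""
    header = None
    rows = []
    for line in lines:
        if not line.startswith("SCORE:"):
            continue                      # skip SEQUENCE: and other lines
        fields = line.split()[1:]
        if header is None:
            header = fields               # first SCORE: line names the columns
            continue
        if fields == header:
            continue                      # tolerate repeated header lines
        rows.append(dict(zip(header, fields)))
    return rows


def top_decoys(rows, column="I_sc", n=200):
    """Return the n rows with the lowest (most favorable) value in column."""
    return sorted(rows, key=lambda row: float(row[column]))[:n]


# Made-up sample data in the assumed layout.
sample = """SEQUENCE:
SCORE: total_score I_sc description
SCORE: -210.5 -6.2 decoy_0001
SCORE: -215.1 -8.9 decoy_0002
SCORE: -190.0 -3.1 decoy_0003
""".splitlines()

rows = read_scorefile(sample)
best = top_decoys(rows, column="I_sc", n=2)
print([row["description"] for row in best])
```

For a real run, pass an open file object instead of `sample`, for example `with open("score.sc") as fh: rows = read_scorefile(fh)`, where `score.sc` is a placeholder name. The selected decoys would then go on to RMSD clustering, which this sketch does not cover.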