Documentation added 25 Mar 2021 by Vikram K. Mulligan, Flatiron Institute (vmulligan@flatironinstitute.org).
The trRosettaConstraintGenerator takes as input a multiple sequence alignment, converts this to a one-hot 3D input tensor, and runs this through the trRosetta neural network to generate predictions of the probability distributions of inter-residue distances and orientations. It then converts this information to a list of Rosetta constraints.
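To make the "one-hot 3D input tensor" concrete, the following is a minimal Python sketch of one-hot encoding an alignment. It is an illustration only, not Rosetta's internal code: the function name `one_hot_msa`, the symbol ordering in `ALPHABET`, and the choice to treat unknown symbols as gaps are all assumptions made for this example.

```python
# Illustrative sketch only; names and alphabet ordering are assumptions,
# not taken from the Rosetta source code.
ALPHABET = "ARNDCQEGHILKMFPSTWYV-"  # 20 canonical amino acids plus a gap symbol


def one_hot_msa(sequences):
    """Return a nested list shaped [n_sequences][n_positions][len(ALPHABET)]."""
    if not sequences:
        raise ValueError("the alignment is empty")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")

    index = {symbol: i for i, symbol in enumerate(ALPHABET)}
    gap = index["-"]
    tensor = []
    for seq in sequences:
        rows = []
        for symbol in seq.upper():
            row = [0] * len(ALPHABET)
            # Assumption for this sketch: unknown symbols (e.g. X) count as gaps.
            row[index.get(symbol, gap)] = 1
            rows.append(row)
        tensor.append(rows)
    return tensor


msa = ["MKV-A", "MRVLA", "MKIXA"]
tensor = one_hot_msa(msa)
print(len(tensor), len(tensor[0]), len(tensor[0][0]))  # prints: 3 5 21
```

The three axes are sequence, alignment position, and symbol, so each position in each sequence holds exactly one 1.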
These constraints can be useful for de novo structure prediction. For that purpose, however, it is not necessary to script this constraint generator directly: the trRosetta application and the trRosettaProtocol mover wrap it and provide a full structure prediction pipeline that takes an MSA as input and produces a pose as output. Direct access to the trRosettaConstraintGenerator is useful for more exotic tasks, such as using trRosetta distance and orientation constraints in design, docking, or loop modelling.
Although "omega" and "phi" commonly refer to the third and first mainchain dihedrals of an alpha-amino acid, and "theta" to the second mainchain dihedral of a beta-amino acid, these Greek letters take on different meanings in trRosetta-related protocols. Here, "omega" refers to the inter-residue dihedral angle defined by the CA and CB atoms of a first residue and the CB and CA atoms of a second residue. "Theta" refers to the inter-residue dihedral angle defined by the N, CA, and CB atoms of a first residue and the CB atom of a second residue. "Phi" refers to the inter-residue planar angle defined by the CA and CB atoms of a first residue and the CB atom of a second residue.
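The three inter-residue angles defined above can be sketched in a few lines of Python. This is a hedged illustration rather than Rosetta's implementation: the helper names, the dictionary-based residue representation, and the sign convention for dihedrals (positive for a clockwise rotation when looking down the central bond) are assumptions of this example, and the coordinates are toy values rather than real protein geometry.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def planar_angle(a, b, c):
    """Angle a-b-c in degrees, measured at vertex b."""
    u, v = _sub(a, b), _sub(c, b)
    cosine = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def dihedral(a, b, c, d):
    """Signed dihedral a-b-c-d in degrees (sign convention assumed, see text)."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = _dot(b2, _cross(n1, n2)) / math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Residues are plain dicts of atom name -> (x, y, z); a hypothetical layout.
def omega(res_i, res_j):
    return dihedral(res_i["CA"], res_i["CB"], res_j["CB"], res_j["CA"])


def theta(res_i, res_j):
    return dihedral(res_i["N"], res_i["CA"], res_i["CB"], res_j["CB"])


def phi(res_i, res_j):
    return planar_angle(res_i["CA"], res_i["CB"], res_j["CB"])


# Toy coordinates chosen so that every angle is a right angle.
res_i = {"N": (1.0, 1.0, 0.0), "CA": (1.0, 0.0, 0.0), "CB": (0.0, 0.0, 0.0)}
res_j = {"N": (0.0, 2.0, 1.0), "CA": (0.0, 1.0, 1.0), "CB": (0.0, 0.0, 1.0)}
print(omega(res_i, res_j), theta(res_i, res_j), phi(res_i, res_j))
```

By these definitions, omega is unchanged when the two residues are swapped, whereas theta and phi generally are not, so theta and phi are evaluated for each ordered pair of residues.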
The trRosettaConstraintGenerator requires that Rosetta be compiled with Tensorflow support. See the autogenerated description below for details on how to compile Rosetta and link Tensorflow.
Autogenerated Tag Syntax Documentation:
The trRosettaConstraintGenerator takes as input a file containing a multiple sequence alignment, feeds this to the trRosetta neural network, and uses the output to generate distance and angle constraints between pairs of residues as described in Yang et al. (2020) Improved protein structure prediction using predicted interresidue orientations. Proc. Natl. Acad. Sci. USA 117(3):1496-503. https://doi.org/10.1073/pnas.1914677117.
The trRosettaConstraintGenerator requires compilation with Tensorflow support. To compile with Tensorflow support:
1. Download the Tensorflow 1.15 precompiled libraries for your operating system from one of the following. (Note that GPU versions require CUDA drivers; see https://www.tensorflow.org/install/lang_c for more information.)
Linux/CPU: https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-cpu-linux-x86_64-1.15.0.tar.gz
Linux/GPU: https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-gpu-linux-x86_64-1.15.0.tar.gz
Windows/CPU: https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-cpu-windows-x86_64-1.15.0.zip
Windows/GPU: https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-gpu-windows-x86_64-1.15.0.zip
MacOS/CPU: https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-cpu-darwin-x86_64-1.15.0.tar.gz
MacOS/GPU: None available.
2. Unzip/untar the archive into a suitable directory (~/mydir/ is used here as an example), and add the following environment variables:
Linux, Windows:
LIBRARY_PATH=$LIBRARY_PATH:~/mydir/lib
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/mydir/lib
MacOS:
LIBRARY_PATH=$LIBRARY_PATH:~/mydir/lib
DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:~/mydir/lib
3. Edit your user.settings file (Rosetta/main/source/tools/build/user.settings), and uncomment (i.e. remove the octothorp from the start of) the following lines:
import os
'program_path' : os.environ['PATH'].split(':'),
'ENV' : os.environ,
4. Compile Rosetta, appending extras=tensorflow (for CPU-only) or extras=tensorflow_gpu (for GPU) to your scons command. For example:
./scons.py -j 8 mode=release extras=tensorflow bin
References and author information for the trRosettaConstraintGenerator constraint generator:
trRosetta Neural Network's citation(s): Yang J, Anishchenko I, Park H, Peng Z, Ovchinnikov S, and Baker D. (2020). Improved protein structure prediction using predicted interresidue orientations. Proc Natl Acad Sci USA 117(3):1496-503. doi: 10.1073/pnas.1914677117.
trRosettaConstraintGenerator ConstraintGenerator's author(s): Vikram K. Mulligan, Systems Biology, Center for Computational Biology, Flatiron Institute (vmulligan@flatironinstitute.org)
<trRosettaConstraintGenerator name="(&string;)" msa_file="(&string;)"
generate_distance_constraints="(true &bool;)"
generate_omega_constraints="(true &bool;)"
generate_theta_constraints="(true &bool;)"
generate_phi_constraints="(true &bool;)"
distance_constraint_prob_cutoff="(0.05 &real;)"
omega_constraint_prob_cutoff="(0.55 &real;)"
theta_constraint_prob_cutoff="(0.55 &real;)"
phi_constraint_prob_cutoff="(0.65 &real;)"
distance_constraint_weight="(1.0 &real;)"
omega_constraint_weight="(1.0 &real;)"
theta_constraint_weight="(1.0 &real;)"
phi_constraint_weight="(1.0 &real;)" />
At the time of this writing, it is recommended to leave all options set to defaults unless one has reason to customize the settings.
The following example roughly reproduces the protocol used by the trRosettaProtocol mover and trRosetta application. Note that (a) this is somewhat simplified, limiting its accuracy as a structure prediction protocol, and (b) it is not necessary to manually script the full structure prediction protocol, since the mover or the application can be used instead. This is only for demonstration purposes to show how the trRosettaConstraintGenerator can be scripted.
<ROSETTASCRIPTS>
    <SCOREFXNS>
        <ScoreFunction name="cen" weights="score0.wts" >
            <Reweight scoretype="atom_pair_constraint" weight="5.0" />
            <Reweight scoretype="angle_constraint" weight="1.0" />
            <Reweight scoretype="dihedral_constraint" weight="1.0" />
        </ScoreFunction>
        <ScoreFunction name="r15" weights="ref2015.wts" />
        <ScoreFunction name="r15_cst" weights="ref2015_cst.wts" />
    </SCOREFXNS>
    <SIMPLE_METRICS>
        <RMSDMetric name="measure_rmsd"
            use_native="true"
            super="true"
            custom_type="RMSD_after_centroid_phase_"
            rmsd_type="rmsd_protein_bb_heavy"
        />
        <RMSDMetric name="measure_rmsd2"
            use_native="true"
            super="true"
            custom_type="RMSD_after_fullatom_phase_"
            rmsd_type="rmsd_protein_bb_heavy"
        />
    </SIMPLE_METRICS>
    <CONSTRAINT_GENERATORS>
        <trRosettaConstraintGenerator name="gen_csts"
            msa_file="inputs/1r6j_msa.a3m"
        />
    </CONSTRAINT_GENERATORS>
    <MOVERS>
        <InitializeByBins name="randomize_bb"
            bin_params_file="ABBA"
        />
        <AddConstraints name="gen_csts_mover"
            constraint_generators="gen_csts"
        />
        <MinMover name="minimize"
            scorefxn="cen"
            tolerance="0.0000001"
            bb="true" chi="false" jump="0"
        />
        <ClearConstraintsMover name="remove_csts" />
        <SwitchResidueTypeSetMover name="make_fullatom" set="fa_standard"/>
        <FastRelax name="frlx" repeats="3" scorefxn="r15_cst" />
    </MOVERS>
    <PROTOCOLS>
        <!-- Centroid phase: randomize the backbone, add constraints, minimize. -->
        <Add mover="randomize_bb" />
        <Add mover="gen_csts_mover" />
        <Add mover="minimize" />
        <Add metrics="measure_rmsd" />
        <!-- Full-atom phase: clear the constraints, switch residue type set,
             re-add the constraints, then relax. -->
        <Add mover="remove_csts" />
        <Add mover="make_fullatom" />
        <Add mover="gen_csts_mover" />
        <Add mover="frlx" />
        <Add metrics="measure_rmsd2" />
    </PROTOCOLS>
    <OUTPUT scorefxn="r15" />
</ROSETTASCRIPTS>
The input multiple sequence alignment file can be generated with HHBlits or other software. See the trRosettaProtocol mover documentation for more details and an example of the .a3m file format.
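As a rough illustration of what an .a3m file contains, the Python sketch below reads A3M-formatted text into equal-length aligned sequences. It relies on the A3M convention that lowercase letters mark insertions relative to the query and can be dropped to recover the aligned columns. The function name `parse_a3m` and the toy alignment are hypothetical, and this is not the parser that Rosetta itself uses.

```python
import string

# Translation table that deletes lowercase letters, which mark insertions in A3M.
_DROP_LOWERCASE = str.maketrans("", "", string.ascii_lowercase)


def parse_a3m(text):
    """Return the aligned sequences (insertions removed) from A3M-formatted text."""
    sequences = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            sequences.append("")  # a header line starts a new record
        elif sequences:
            # Sequence data may span several lines; append to the current record.
            sequences[-1] += line.translate(_DROP_LOWERCASE)
    return sequences


# Toy alignment: the first record is the query; "v" in hit_1 is an insertion.
example = """>query
MKVLA
>hit_1
MKv-LA
>hit_2
M-ILA
"""

aligned = parse_a3m(example)
print(aligned)  # prints: ['MKVLA', 'MK-LA', 'M-ILA']
```

After insertions are removed, every sequence has the same length as the query, which is what allows the alignment to be treated as a rectangular array.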
Please see the trRosetta application documentation for information about the trRosetta code organization.