Documentation added 25 March 2021 by Vikram K. Mulligan, Flatiron Institute (vmulligan@flatironinstitute.org).
The trRosettaProtocol mover provides the same functionality as the trRosetta application, but in the form of a mover that can be used in RosettaScripts, in PyRosetta scripts, or in C++ code. Although most movers take a pose as input, manipulate it, and produce a pose as output, the trRosettaProtocol mover discards the input pose and builds a new one. The inputs are a sequence or FASTA file and a multiple sequence alignment; the latter is fed to the trRosetta neural network to generate inter-residue distance and orientation constraints that guide structure prediction. Each run of the trRosettaProtocol mover generates a new predicted structure. These structures tend to show only a small amount of variation, so relatively little sampling is needed. On the other hand, this means that the protocol is not ideal for large-scale conformational sampling (e.g. to evaluate whether the energy landscape has alternative minima).
The trRosettaProtocol mover requires that Rosetta be compiled with Tensorflow support. See the autogenerated description below for details on how to compile Rosetta and link Tensorflow.
Autogenerated Tag Syntax Documentation:
Implements the full trRosetta protocol, as described in Yang et al. (2020) Improved protein structure prediction using predicted interresidue orientations. Proc. Natl. Acad. Sci. USA 117(3):1496-503. https://doi.org/10.1073/pnas.1914677117. This mover takes as input a multiple sequence alignment, runs the trRosetta neural network, generates distance and angle constraints between pairs of residues, and carries out energy-minimization to produce a structure. Note that this mover deletes and replaces the input structure. If a native structure is provided, the mover tags the output structure with the RMSD to native.
The trRosettaProtocol mover requires compilation with Tensorflow support. To compile with Tensorflow support:
Download the Tensorflow 1.15 precompiled libraries for your operating system from one of the following. (Note that GPU versions require CUDA drivers; see https://www.tensorflow.org/install/lang_c for more information.)
Linux/CPU: https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-cpu-linux-x86_64-1.15.0.tar.gz
Linux/GPU: https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-gpu-linux-x86_64-1.15.0.tar.gz
Windows/CPU: https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-cpu-windows-x86_64-1.15.0.zip
Windows/GPU: https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-gpu-windows-x86_64-1.15.0.zip
MacOS/CPU: https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-cpu-darwin-x86_64-1.15.0.tar.gz
MacOS/GPU: None available.
Unzip/untar the archive into a suitable directory (~/mydir/ is used here as an example), and add the following environment variables:
Linux, Windows:
LIBRARY_PATH=$LIBRARY_PATH:~/mydir/lib
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/mydir/lib
MacOS:
LIBRARY_PATH=$LIBRARY_PATH:~/mydir/lib
DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:~/mydir/lib
Edit your user.settings file (Rosetta/main/source/tools/build/user.settings), and uncomment (i.e. remove the octothorpe, #, from the start of) the following lines:
import os
'program_path' : os.environ['PATH'].split(':'),
'ENV' : os.environ,
Compile Rosetta, appending extras=tensorflow (for CPU-only) or extras=tensorflow_gpu (for GPU) to your scons command. For example:
./scons.py -j 8 mode=release extras=tensorflow bin
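Before compiling, it can help to confirm that the directories listed in the environment variables above actually contain the Tensorflow libraries. The following is a minimal Python sketch of such a check, not part of Rosetta. The helper name find_tensorflow_libs is hypothetical, and the default file-name prefix "libtensorflow" is an assumption based on the Linux and MacOS archives; the Windows archive may name its files differently, in which case a different prefix can be passed.

```python
import os


def find_tensorflow_libs(path_value, prefix="libtensorflow"):
    """Return files whose names start with `prefix` in a path-list string.

    `path_value` is the value of a variable such as LIBRARY_PATH: directories
    joined by os.pathsep (':' on Linux and MacOS). Missing directories and
    empty entries are skipped. The prefix is an assumption, not a Rosetta rule.
    """
    hits = []
    for entry in path_value.split(os.pathsep):
        directory = os.path.expanduser(entry)  # expand a leading ~
        if not entry or not os.path.isdir(directory):
            continue
        for name in sorted(os.listdir(directory)):
            if name.startswith(prefix):
                hits.append(os.path.join(directory, name))
    return hits


if __name__ == "__main__":
    # Report what is visible through each variable named in the steps above.
    for var in ("LIBRARY_PATH", "LD_LIBRARY_PATH", "DYLD_LIBRARY_PATH"):
        found = find_tensorflow_libs(os.environ.get(var, ""))
        print(var, "->", found if found else "no matching files found")
```

An empty result for every variable suggests that the archive was unpacked somewhere other than the directory named in the variables, or that the variables were set in a different shell session.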
References and author information for the trRosettaProtocol mover:
trRosetta Neural Network's citation(s): Yang J, Anishchenko I, Park H, Peng Z, Ovchinnikov S, and Baker D. (2020). Improved protein structure prediction using predicted interresidue orientations. Proc Natl Acad Sci USA 117(3):1496-503. doi: 10.1073/pnas.1914677117.
FastRelax Mover's citation(s): *Tyka MD, *Keedy DA, André I, DiMaio F, Song Y, Richardson DC, Richardson JS, and Baker D. (2011). Alternate states of proteins revealed by detailed energy landscape mapping. J Mol Biol 405(2):607-18. doi: 10.1016/j.jmb.2010.11.008. (*Co-primary authors.)
Khatib F, Cooper S, Tyka MD, Xu K, Makedon I, Popovic Z, Baker D, and Players F. (2011). Algorithm discovery by protein folding game players. Proc Natl Acad Sci USA 108(47):18949-53. doi: 10.1073/pnas.1115898108.
Maguire JB, Haddox HK, Strickland D, Halabiya SF, Coventry B, Griffin JR, Pulavarti SVSRK, Cummins M, Thieker DF, Klavins E, Szyperski T, DiMaio F, Baker D, and Kuhlman B. (2021). Perturbing the energy landscape for improved packing during computational protein design. Proteins 89(4):436-449. doi: 10.1002/prot.26030.
trRosettaProtocol Mover's author(s): Vikram K. Mulligan, Systems Biology, Center for Computational Biology, Flatiron Institute [vmulligan@flatironinstitute.org]
RMSDMetric SimpleMetric's author(s): Jared Adolf-Bryfogle, Scripps Research Institute [jadolfbr@gmail.com]
TotalEnergyMetric SimpleMetric's author(s): Jared Adolf-Bryfogle, Scripps Research Institute [jadolfbr@gmail.com]
TimingProfileMetric SimpleMetric's author(s): Jared Adolf-Bryfogle, Scripps Research Institute [jadolfbr@gmail.com]
<trRosettaProtocol name="(&string;)" msa_file="(&string;)"
    write_constraints_to_file="(&string;)"
    only_write_constraints="(false &bool;)"
    use_distance_constraints="(true &bool;)"
    use_omega_constraints="(true &bool;)"
    use_theta_constraints="(true &bool;)"
    use_phi_constraints="(true &bool;)"
    distance_constraint_prob_cutoff="(0.05 &real;)"
    omega_constraint_prob_cutoff="(0.55 &real;)"
    theta_constraint_prob_cutoff="(0.55 &real;)"
    phi_constraint_prob_cutoff="(0.65 &real;)"
    distance_constraint_weight="(1.0 &real;)"
    omega_constraint_weight="(1.0 &real;)"
    theta_constraint_weight="(1.0 &real;)"
    phi_constraint_weight="(1.0 &real;)"
    sequence="(&string;)"
    fasta_file="(&string;)"
    backbone_randomization_mode="(classic &string;)"
    backbone_minimization_mode="(classic2 &string;)"
    cis_peptide_prob_non_prepro="(0.0005 &real;)"
    cis_peptide_prob_prepro="(0.05 &real;)"
    scorefxn0="(trRosetta_cen0 &string;)"
    scorefxn1="(trRosetta_cen1 &string;)"
    scorefxn2="(trRosetta_cen2 &string;)"
    scorefxn3="(trRosetta_cart &string;)"
    mutate_gly_to_ala="(true &bool;)"
    fullatom_refinement="(true &bool;)"
    scorefxn_fullatom="(&string;)" />
At the time of this writing, it is recommended to set mutate_gly_to_ala="false" and backbone_randomization_mode="ramachandran". This may become the default at some point. All other settings may remain default.
The following script produces good (~2 Å RMSD) predictions of the structure of the syntenin domain used in this example roughly four times out of five:
<ROSETTASCRIPTS>
    <SCOREFXNS>
        <ScoreFunction name="r15" weights="ref2015" />
    </SCOREFXNS>
    <RESIDUE_SELECTORS>
    </RESIDUE_SELECTORS>
    <PACKER_PALETTES>
    </PACKER_PALETTES>
    <TASKOPERATIONS>
    </TASKOPERATIONS>
    <MOVE_MAP_FACTORIES>
    </MOVE_MAP_FACTORIES>
    <SIMPLE_METRICS>
    </SIMPLE_METRICS>
    <FILTERS>
    </FILTERS>
    <MOVERS>
        <trRosettaProtocol name="predict_struct" msa_file="inputs/1r6j_msa.a3m"
            sequence="GAMDPRTITMHKDSTGHVGFIFKNGKITSIVKDSSAARNGLLTEHNICEINGQNVIGLKDSQIADILSTSGTVVTITIMPAF"
            mutate_gly_to_ala="false" backbone_randomization_mode="ramachandran"
        />
    </MOVERS>
    <PROTOCOLS>
        <Add mover="predict_struct" />
    </PROTOCOLS>
    <OUTPUT scorefxn="r15" />
</ROSETTASCRIPTS>
In this example, the input multiple sequence alignment (MSA), which was generated using the HHBlits server (https://toolkit.tuebingen.mpg.de/tools/hhblits), looks like this:
>1718255
GAMDPRTITMHKDSTGHVGFIFKNGKITSIVKDSSAARNGLLTEHNICEINGQNVIGLKDSQIADILSTSGTVVTITIMPAF
>UniRef100_A0A2 Putative syntenin-1 n=1 Tax=Stichopus japonicus TaxID=307972 RepID=A0A2G8KW37_STIJA
---FERTITMHKDSTGHVGFIFKNGKITSIVKDSSAARNGLLTEHNICEINGQNVIGLKDSQIADILSTSGTVVTITIMPKF
>UniRef100_UPI0 Syntenin 1 n=2 Tax=Homo sapiens TaxID=9606 RepID=UPI00001B299E
KNMDQfqRTVTMHKDSSGHVGFVFKKGKIVSIAKDSSAARNGLLTHHCICEVNGQNVIGMKDKQITEVLSGSGNVVTITIMPAF
>UniRef100_A0A0 Uncharacterized protein (Fragment) n=1 Tax=Amblyomma triste TaxID=251400 RepID=A0A023GMK5_AMBTT
---FERTVTMHKDSTGHVGFVFKNGKITSLVKDSSAARNGLLTEHYLCEINGQNVIGLKDKQIKDILSTSGNVITITVMPSF
>UniRef100_A0A0 Syntenin-1 n=1 Tax=Fukomys damarensis TaxID=885580 RepID=A0A091E3S4_FUKDA
---FERTVTMHKDSTGHVGFIFKNGKITSIVKDSSAARNGLLTEHNICEINGQNVIGLKDSQIADILSTSGTVVTITIMPAF
>UniRef100_A0A0 Uncharacterized protein n=1 Tax=Aedes albopictus TaxID=7160 RepID=A0A023ENS9_AEDAL
---FERTITMHKDSTGHVGFIFKNGKITSIVKDSSAARNGLLTDHQICEVNGQNVIGLKDKQIADILSTAGNVVTITIMPSF
...
A typical MSA contains dozens to hundreds of sequences (though even a single-sequence "alignment" can often produce meaningful predictions).
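An MSA file can be sanity-checked before it is passed to the mover. The following Python sketch is not part of Rosetta. It assumes the usual A3M convention, in which lowercase letters mark insertions relative to the query (such as the "fq" in the Homo sapiens row above), so that every row has the query's length once lowercase letters are removed. The function names read_a3m and check_msa are hypothetical.

```python
import string

# Translation table that deletes lowercase letters (A3M insertion columns).
_DROP_LOWERCASE = str.maketrans("", "", string.ascii_lowercase)


def read_a3m(text):
    """Parse A3M/FASTA-style text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_msa(records):
    """Return (query_length, headers of rows whose aligned length differs).

    The first record is treated as the query. Lengths are compared after
    removing lowercase insertion characters from each row.
    """
    if not records:
        raise ValueError("empty alignment")
    query_len = len(records[0][1].translate(_DROP_LOWERCASE))
    mismatched = [
        header
        for header, seq in records
        if len(seq.translate(_DROP_LOWERCASE)) != query_len
    ]
    return query_len, mismatched


# Two rows copied from the example alignment above.
example = """>1718255
GAMDPRTITMHKDSTGHVGFIFKNGKITSIVKDSSAARNGLLTEHNICEINGQNVIGLKDSQIADILSTSGTVVTITIMPAF
>UniRef100_UPI0 Syntenin 1 n=2 Tax=Homo sapiens TaxID=9606 RepID=UPI00001B299E
KNMDQfqRTVTMHKDSSGHVGFVFKKGKIVSIAKDSSAARNGLLTHHCICEVNGQNVIGMKDKQITEVLSGSGNVVTITIMPAF
"""
records = read_a3m(example)
query_len, mismatched = check_msa(records)
print(len(records), "sequences; query length", query_len, "; mismatched rows:", mismatched)
```

A non-empty list of mismatched rows usually points to a truncated or hand-edited file, which is worth fixing before running the prediction.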
Please see the trRosetta application documentation for information about the trRosetta code organization.