trRosettaProtocol mover

Documentation added 25 March 2021 by Vikram K. Mulligan, Flatiron Institute (vmulligan@flatironinstitute.org).

The trRosettaProtocol mover provides the same functionality as the trRosetta application, but in the form of a mover that can be used in RosettaScripts, in PyRosetta scripts, or in C++ code. Although most movers take a pose as input, manipulate it, and produce a pose as output, the trRosettaProtocol mover discards the input pose and builds a new one. The inputs are a sequence (given directly or as a FASTA file) and a multiple sequence alignment; the latter is fed into the trRosetta neural network to generate inter-residue distance and orientation constraints that guide structure prediction. Each run of the trRosettaProtocol mover generates a new predicted structure. These structures tend to vary only slightly from run to run, so relatively little sampling is needed. On the other hand, this means that this protocol is not ideal for large-scale conformational sampling (e.g. to evaluate whether the energy landscape has alternative minima).

Compilation requirements

The trRosettaProtocol mover requires that Rosetta be compiled with Tensorflow support. See the autogenerated description below for details on how to compile Rosetta and link Tensorflow.

All options

Autogenerated Tag Syntax Documentation:

Implements the full trRosetta protocol, as described in Yang et al. (2020) Improved protein structure prediction using predicted interresidue orientations. Proc. Natl. Acad. Sci. USA 117(3):1496-503. https://doi.org/10.1073/pnas.1914677117. This mover takes as input a multiple sequence alignment, runs the trRosetta neural network, generates distance and angle constraints between pairs of residues, and carries out energy minimization to produce a structure. Note that this mover deletes and replaces the input structure. If a native structure is provided, the mover tags the output structure with the RMSD to native.

The trRosettaProtocol mover requires compilation with Tensorflow support. To compile with Tensorflow support:

1. Download the Tensorflow 1.15 precompiled libraries for your operating system from one of the following. (Note that GPU versions require CUDA drivers; see https://www.tensorflow.org/install/lang_c for more information.)
    Linux/CPU: https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-cpu-linux-x86_64-1.15.0.tar.gz
    Linux/GPU: https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-gpu-linux-x86_64-1.15.0.tar.gz
    Windows/CPU: https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-cpu-windows-x86_64-1.15.0.zip
    Windows/GPU: https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-gpu-windows-x86_64-1.15.0.zip
    MacOS/CPU: https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-cpu-darwin-x86_64-1.15.0.tar.gz
    MacOS/GPU: None available.
2. Unzip/untar the archive into a suitable directory (~/mydir/ is used here as an example), and add the following environment variables:
    Linux, Windows:
        LIBRARY_PATH=$LIBRARY_PATH:~/mydir/lib
        LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/mydir/lib
    MacOS:
        LIBRARY_PATH=$LIBRARY_PATH:~/mydir/lib
        DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:~/mydir/lib
3. Edit your user.settings file (Rosetta/main/source/tools/build/user.settings), and uncomment (i.e. remove the octothorp from the start of) the following lines:
    import os
    'program_path' : os.environ['PATH'].split(':'),
    'ENV' : os.environ,
4. Compile Rosetta, appending extras=tensorflow (for CPU-only) or extras=tensorflow_gpu (for GPU) to your scons command. For example:
    ./scons.py -j 8 mode=release extras=tensorflow bin
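
When the build is driven from a script rather than an interactive shell, the environment-variable step above can be prepared in Python. The following is a minimal sketch under stated assumptions: the helper name tensorflow_build_env and the platform labels are hypothetical and introduced only for illustration; the variable names are the ones listed above.

```python
import os


def tensorflow_build_env(lib_dir, platform, base_env=None):
    """Return a copy of the environment with lib_dir appended to the library paths.

    platform is "linux", "windows", or "macos" (labels chosen for this sketch).
    """
    env = dict(os.environ if base_env is None else base_env)
    # LIBRARY_PATH is set on every platform; the runtime loader variable differs on MacOS.
    runtime_var = "DYLD_LIBRARY_PATH" if platform == "macos" else "LD_LIBRARY_PATH"
    for var in ("LIBRARY_PATH", runtime_var):
        existing = env.get(var, "")
        # Append rather than overwrite, so that existing search paths are preserved.
        env[var] = f"{existing}:{lib_dir}" if existing else lib_dir
    return env


# Example: the ~/mydir/ location used in the instructions above.
build_env = tensorflow_build_env(os.path.expanduser("~/mydir/lib"), "linux")
```

The resulting dictionary could then be passed as the env argument of subprocess.run when invoking the scons command shown above.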

References and author information for the trRosettaProtocol mover:

trRosetta Neural Network's citation(s): Yang J, Anishchenko I, Park H, Peng Z, Ovchinnikov S, and Baker D. (2020). Improved protein structure prediction using predicted interresidue orientations. Proc Natl Acad Sci USA 117(3):1496-503. doi: 10.1073/pnas.1914677117.

FastRelax Mover's citation(s): *Tyka MD, *Keedy DA, André I, DiMaio F, Song Y, Richardson DC, Richardson JS, and Baker D. (2011). Alternate states of proteins revealed by detailed energy landscape mapping. J Mol Biol 405(2):607-18. doi: 10.1016/j.jmb.2010.11.008. (*Co-primary authors.)

Khatib F, Cooper S, Tyka MD, Xu K, Makedon I, Popovic Z, Baker D, and Players F. (2011). Algorithm discovery by protein folding game players. Proc Natl Acad Sci USA 108(47):18949-53. doi: 10.1073/pnas.1115898108.

Maguire JB, Haddox HK, Strickland D, Halabiya SF, Coventry B, Griffin JR, Pulavarti SVSRK, Cummins M, Thieker DF, Klavins E, Szyperski T, DiMaio F, Baker D, and Kuhlman B. (2021). Perturbing the energy landscape for improved packing during computational protein design. Proteins 89(4):436-449. doi: 10.1002/prot.26030.

trRosettaProtocol Mover's author(s): Vikram K. Mulligan, Systems Biology, Center for Computational Biology, Flatiron Institute vmulligan@flatironinstitute.org

RMSDMetric SimpleMetric's author(s): Jared Adolf-Bryfogle, Scripps Research Institute [jadolfbr@gmail.com]

TotalEnergyMetric SimpleMetric's author(s): Jared Adolf-Bryfogle, Scripps Research Institute [jadolfbr@gmail.com]

TimingProfileMetric SimpleMetric's author(s): Jared Adolf-Bryfogle, Scripps Research Institute [jadolfbr@gmail.com]

<trRosettaProtocol name="(&string;)" msa_file="(&string;)"
        write_constraints_to_file="(&string;)"
        only_write_constraints="(false &bool;)"
        use_distance_constraints="(true &bool;)"
        use_omega_constraints="(true &bool;)"
        use_theta_constraints="(true &bool;)"
        use_phi_constraints="(true &bool;)"
        distance_constraint_prob_cutoff="(0.05 &real;)"
        omega_constraint_prob_cutoff="(0.55 &real;)"
        theta_constraint_prob_cutoff="(0.55 &real;)"
        phi_constraint_prob_cutoff="(0.65 &real;)"
        distance_constraint_weight="(1.0 &real;)"
        omega_constraint_weight="(1.0 &real;)"
        theta_constraint_weight="(1.0 &real;)"
        phi_constraint_weight="(1.0 &real;)" sequence="(&string;)"
        fasta_file="(&string;)" backbone_randomization_mode="(classic &string;)"
        backbone_minimization_mode="(classic2 &string;)"
        cis_peptide_prob_non_prepro="(0.0005 &real;)"
        cis_peptide_prob_prepro="(0.05 &real;)"
        scorefxn0="(trRosetta_cen0 &string;)"
        scorefxn1="(trRosetta_cen1 &string;)"
        scorefxn2="(trRosetta_cen2 &string;)"
        scorefxn3="(trRosetta_cart &string;)" mutate_gly_to_ala="(true &bool;)"
        fullatom_refinement="(true &bool;)" scorefxn_fullatom="(&string;)" />

msa_file: Filename for a multiple sequence alignment file, in a3m format. Dashes indicate gap sequences, and lowercase characters will be removed (and flanking regions ligated). If not provided, the commandline option -trRosetta:msa_file will be used. One or the other is required.
write_constraints_to_file: A file to which trRosetta constraints will be written. Ordinarily, these are not written to disk, but this option permits this. Note that this triggers direct disk writes by this mover. This can be dangerous in a multi-process or multi-threaded context, or in a large production environment. Not intended for nstruct greater than 1. If the filename is left as an empty string, no disk write occurs. Empty by default, unless set otherwise in command-line options.
only_write_constraints: If set to true, this mover ONLY generates trRosetta constraints and writes them to disk. That is, this option allows the actual structure prediction steps to be skipped. If used, the 'write_constraints_to_file' option must be set. False by default, unless set otherwise in command-line options.
use_distance_constraints: Set whether inter-residue distance constraints generated by the trRosetta neural network should be used for structure prediction. True by default, unless a default is set at the commandline with the -trRosetta:use_distance_constraints flag.
use_omega_constraints: Set whether inter-residue omega dihedral constraints generated by the trRosetta neural network should be used for structure prediction. Note that this is NOT the omega backbone dihedral angle, but an inter-residue dihedral defined by CA1-CB1-CB2-CA2. True by default, unless a default is set at the commandline with the -trRosetta:use_omega_constraints flag.
use_theta_constraints: Set whether inter-residue theta dihedral constraints generated by the trRosetta neural network should be used for structure prediction. Note that this is NOT a backbone dihedral angle, but an inter-residue dihedral defined by N1-CA1-CB1-CB2. True by default, unless a default is set at the commandline with the -trRosetta:use_theta_constraints flag.
use_phi_constraints: Set whether inter-residue phi angle constraints generated by the trRosetta neural network should be used for structure prediction. Note that this is NOT the phi backbone dihedral angle, but an inter-residue angle defined by CA1-CB1-CB2. True by default, unless a default is set at the commandline with the -trRosetta:use_phi_constraints flag.
distance_constraint_prob_cutoff: Set the probability cutoff below which we omit a distance constraint. Default 0.05, or whatever is set on the commandline with the -trRosetta::distance_constraint_prob_cutoff commandline option.
omega_constraint_prob_cutoff: Set the probability cutoff below which we omit an omega dihedral constraint. Default 0.55, or whatever is set on the commandline with the -trRosetta::omega_constraint_prob_cutoff commandline option.
theta_constraint_prob_cutoff: Set the probability cutoff below which we omit a theta dihedral constraint. Default 0.55, or whatever is set on the commandline with the -trRosetta::theta_constraint_prob_cutoff commandline option.
phi_constraint_prob_cutoff: Set the probability cutoff below which we omit a phi angle constraint. Default 0.65, or whatever is set on the commandline with the -trRosetta::phi_constraint_prob_cutoff commandline option.
distance_constraint_weight: Set the weight for trRosetta-generated distance constraints. Defaults to 1.0, or whatever was set on the commandline with the -trRosetta:distance_constraint_weight commandline option.
omega_constraint_weight: Set the weight for trRosetta-generated omega dihedral constraints. Defaults to 1.0, or whatever was set on the commandline with the -trRosetta:omega_constraint_weight commandline option.
theta_constraint_weight: Set the weight for trRosetta-generated theta dihedral constraints. Defaults to 1.0, or whatever was set on the commandline with the -trRosetta:theta_constraint_weight commandline option.
phi_constraint_weight: Set the weight for trRosetta-generated phi angle constraints. Defaults to 1.0, or whatever was set on the commandline with the -trRosetta:phi_constraint_weight commandline option.
sequence: The amino acid sequence to predict. EITHER this OR a FASTA file must be provided. Sequences must be single-letter amino acid codes, and must contain only the 20 canonical amino acids.
fasta_file: A FASTA file containing a single sequence, the amino acid sequence to predict. EITHER this OR a sequence must be provided. Sequences must be single-letter amino acid codes, and must contain only the 20 canonical amino acids. A FASTA file can also be set with the -in:file:fasta commandline flag, which sets the default for this mover (overrideable either with the fasta_file option or the sequence option).
backbone_randomization_mode: The manner in which the polypeptide backbone will be initially randomized. Options are 'classic' (the manner used in the original Yang et al. PyRosetta protocol, which randomly selects from one of six phi/psi pairs for each residue), 'ramachandran' (randomizing biased by the Ramachandran preferences of each amino acid type), or 'bins' (randomizing biased by the probabilities of residue type i being in backbone bin X and residue type i+1 being in backbone bin Y). Defaults to 'classic', or whatever is set at the commandline with the -trRosetta::backbone_randomization_mode commandline option.
backbone_minimization_mode: The manner in which the polypeptide backbone will be minimized using the constraints from the trRosetta neural network. Options are: 'classic0' (minimize using short-range constraints, then minimize using medium-range constraints, then minimize using long-range constraints), 'classic1' (minimize using short- and medium-range constraints, then minimize using long-range constraints), or 'classic2' (minimize using all constraints). Defaults to 'classic2', or whatever is set at the commandline with the -trRosetta::backbone_minimization_mode commandline option.
cis_peptide_prob_non_prepro: The probability of sampling a cis peptide bond at a position that is NOT followed by a proline when 'ramachandran' backbone randomization mode is used. Defaults to 0.0005 (or a setting provided at the commandline with the -trRosetta:cis_peptide_prob_non_prepro flag). Ignored for 'classic' or 'bins' modes.
cis_peptide_prob_prepro: The probability of sampling a cis peptide bond at a position that IS followed by a proline when 'ramachandran' backbone randomization mode is used. Defaults to 0.05 (or a setting provided at the commandline with the -trRosetta:cis_peptide_prob_prepro flag). Ignored for 'classic' or 'bins' modes.
scorefxn0: The scoring function used for stage 0 energy minimization. Defaults to trRosetta_cen0 (or to whatever is set on the commandline with the -trRosetta:scorefxn0 commandline option).
scorefxn1: The scoring function used for stage 1 energy minimization. Defaults to trRosetta_cen1 (or to whatever is set on the commandline with the -trRosetta:scorefxn1 commandline option).
scorefxn2: The scoring function used for stage 2 (Van der Waals) energy minimization. Defaults to trRosetta_cen2 (or to whatever is set on the commandline with the -trRosetta:scorefxn2 commandline option).
scorefxn3: The scoring function used for stage 3 (Cartesian) energy minimization. Defaults to trRosetta_cart (or to whatever is set on the commandline with the -trRosetta:scorefxn3 commandline option).
mutate_gly_to_ala: If true, glycine residues are mutated to alanine during the initial centroid phases of minimization to match the original PyRosetta trRosetta protocol (then mutated back to glycine for fullatom refinement). True by default.
fullatom_refinement: If true, we do fullatom refinement at the end with the FastRelax protocol, using the scoring function specified with the scorefxn_fullatom option. If the atom_pair, dihedral, and angle constraint scoreterms are not on, they are turned on. True by default.
scorefxn_fullatom: Weights file for the scorefunction used for fullatom refinement with FastRelax. If the atom_pair_constraint, dihedral_constraint, or angle_constraint terms are zero, they will be set to 5.0, 1.0, and 1.0 respectively. If empty (the default), then the scoring function specified with -score:weights is used instead.
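
The inter-residue omega, theta, and phi measurements named in the options above are an ordinary dihedral (CA1-CB1-CB2-CA2), a second dihedral (N1-CA1-CB1-CB2), and a planar angle (CA1-CB1-CB2). As an illustration only, they could be computed from Cartesian coordinates as sketched below. This is not the Rosetta implementation: the function names and the input layout (a dict of atom name to coordinate tuple) are assumptions made for this sketch.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def angle_deg(p1, p2, p3):
    """Planar angle at p2 formed by p1-p2-p3, in degrees."""
    u, v = _sub(p1, p2), _sub(p3, p2)
    cos_a = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))


def dihedral_deg(p1, p2, p3, p4):
    """Signed dihedral angle p1-p2-p3-p4, in degrees."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def inter_residue_orientations(res1, res2):
    """Compute omega, theta, and phi for the ordered residue pair (res1, res2).

    res1 needs 'N', 'CA', and 'CB' coordinates; res2 needs 'CA' and 'CB'.
    """
    return {
        "omega": dihedral_deg(res1["CA"], res1["CB"], res2["CB"], res2["CA"]),
        "theta": dihedral_deg(res1["N"], res1["CA"], res1["CB"], res2["CB"]),
        "phi": angle_deg(res1["CA"], res1["CB"], res2["CB"]),
    }


# Made-up coordinates, chosen only so that the angles are easy to check by hand.
res1 = {"N": (1.0, 0.0, 1.0), "CA": (1.0, 0.0, 0.0), "CB": (0.0, 0.0, 0.0)}
res2 = {"CA": (0.0, 1.0, 1.0), "CB": (0.0, 0.0, 1.0)}
orientations = inter_residue_orientations(res1, res2)
```

Note that theta and phi depend on the order of the two residues, whereas omega is the same for (res1, res2) and (res2, res1).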

Best practices

At the time of this writing, it is recommended to set mutate_gly_to_ala="false" and backbone_randomization_mode="ramachandran". This may become the default at some point. All other settings may remain default.

Example script

The following script produces pretty good (~2 Å RMSD) predictions of the structure of ubiquitin perhaps four times out of five:

<ROSETTASCRIPTS>
	<SCOREFXNS>
		<ScoreFunction name="r15" weights="ref2015" />
	</SCOREFXNS>
	<RESIDUE_SELECTORS>
	</RESIDUE_SELECTORS>
	<PACKER_PALETTES>
	</PACKER_PALETTES>
	<TASKOPERATIONS>
	</TASKOPERATIONS>
	<MOVE_MAP_FACTORIES>
	</MOVE_MAP_FACTORIES>
	<SIMPLE_METRICS>
	</SIMPLE_METRICS>
	<FILTERS>
	</FILTERS>
	<MOVERS>
		<trRosettaProtocol name="predict_struct" msa_file="inputs/1r6j_msa.a3m"
			sequence="GAMDPRTITMHKDSTGHVGFIFKNGKITSIVKDSSAARNGLLTEHNICEINGQNVIGLKDSQIADILSTSGTVVTITIMPAF"
			mutate_gly_to_ala="false" backbone_randomization_mode="ramachandran"
		/>
	</MOVERS>
	<PROTOCOLS>
		<Add mover="predict_struct" />
	</PROTOCOLS>
	<OUTPUT scorefxn="r15" />
</ROSETTASCRIPTS>
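
When many variants of such a script are needed (for example, to compare option settings), the XML can be assembled programmatically. The following is a minimal sketch using only the Python standard library; the function name build_trrosetta_script is hypothetical, and the attribute names passed to it are the mover options documented above. The empty optional blocks from the script above are omitted here.

```python
import xml.etree.ElementTree as ET


def build_trrosetta_script(msa_file, sequence, mover_name="predict_struct", **mover_options):
    """Return a RosettaScripts XML string containing a single trRosettaProtocol mover."""
    root = ET.Element("ROSETTASCRIPTS")
    scorefxns = ET.SubElement(root, "SCOREFXNS")
    ET.SubElement(scorefxns, "ScoreFunction", name="r15", weights="ref2015")

    attrs = {"name": mover_name, "msa_file": msa_file, "sequence": sequence}
    for key, value in mover_options.items():
        # Write booleans in lowercase, as in the tag syntax documentation above.
        attrs[key] = str(value).lower() if isinstance(value, bool) else str(value)
    movers = ET.SubElement(root, "MOVERS")
    ET.SubElement(movers, "trRosettaProtocol", attrs)

    protocols = ET.SubElement(root, "PROTOCOLS")
    ET.SubElement(protocols, "Add", mover=mover_name)
    ET.SubElement(root, "OUTPUT", scorefxn="r15")
    return ET.tostring(root, encoding="unicode")


script_text = build_trrosetta_script(
    "inputs/1r6j_msa.a3m",
    "GAMDPRTITMHKDSTGHVGFIFKNGKITSIVKDSSAARNGLLTEHNICEINGQNVIGLKDSQIADILSTSGTVVTITIMPAF",
    mutate_gly_to_ala=False,
    backbone_randomization_mode="ramachandran",
)
```

The returned string can be written to a file and passed to RosettaScripts in the usual way.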

In this example, the input multiple sequence alignment (MSA), which was generated using the HHBlits server (https://toolkit.tuebingen.mpg.de/tools/hhblits), looks like this:

>1718255
GAMDPRTITMHKDSTGHVGFIFKNGKITSIVKDSSAARNGLLTEHNICEINGQNVIGLKDSQIADILSTSGTVVTITIMPAF
>UniRef100_A0A2 Putative syntenin-1 n=1 Tax=Stichopus japonicus TaxID=307972 RepID=A0A2G8KW37_STIJA
---FERTITMHKDSTGHVGFIFKNGKITSIVKDSSAARNGLLTEHNICEINGQNVIGLKDSQIADILSTSGTVVTITIMPKF
>UniRef100_UPI0 Syntenin 1 n=2 Tax=Homo sapiens TaxID=9606 RepID=UPI00001B299E
KNMDQfqRTVTMHKDSSGHVGFVFKKGKIVSIAKDSSAARNGLLTHHCICEVNGQNVIGMKDKQITEVLSGSGNVVTITIMPAF
>UniRef100_A0A0 Uncharacterized protein (Fragment) n=1 Tax=Amblyomma triste TaxID=251400 RepID=A0A023GMK5_AMBTT
---FERTVTMHKDSTGHVGFVFKNGKITSLVKDSSAARNGLLTEHYLCEINGQNVIGLKDKQIKDILSTSGNVITITVMPSF
>UniRef100_A0A0 Syntenin-1 n=1 Tax=Fukomys damarensis TaxID=885580 RepID=A0A091E3S4_FUKDA
---FERTVTMHKDSTGHVGFIFKNGKITSIVKDSSAARNGLLTEHNICEINGQNVIGLKDSQIADILSTSGTVVTITIMPAF
>UniRef100_A0A0 Uncharacterized protein n=1 Tax=Aedes albopictus TaxID=7160 RepID=A0A023ENS9_AEDAL
---FERTITMHKDSTGHVGFIFKNGKITSIVKDSSAARNGLLTDHQICEVNGQNVIGLKDKQIADILSTAGNVVTITIMPSF
...

A typical MSA contains dozens to hundreds of sequences (though even a single-sequence "alignment" can often produce meaningful predictions).
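
The a3m handling described for the msa_file option (dashes are gaps; lowercase characters are removed and the flanking regions joined) can be illustrated with a short pre-flight check. This is a sketch, not the parser used by Rosetta: the function names are hypothetical, and it assumes that the first record in the file is the query sequence, as in the example above.

```python
def read_a3m(text):
    """Parse a3m-formatted text into a list of (header, aligned_sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    # Lowercase letters mark insertions relative to the query: drop them, keep dashes.
    return [(h, "".join(c for c in s if not c.islower())) for h, s in records]


def rows_match_query_length(records):
    """Return True if every processed row has the same length as the first (query) row."""
    query_length = len(records[0][1])
    return all(len(seq) == query_length for _, seq in records)


# Shortened rows modelled on the alignment above (truncated for brevity).
example = ">query\nGAMDPRT\n>hit1\n---FERT\n>hit2\nKNMDQfqRT\n"
records = read_a3m(example)
consistent = rows_match_query_length(records)
```

A row whose length differs from that of the query after lowercase removal usually indicates a malformed alignment, which is worth catching before starting a prediction run.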

Code organization

Please see the trRosetta application documentation for information about the trRosetta code organization.

References

The trRosetta neural network is described in Yang et al. (2020) Proc Natl Acad Sci USA 117(3):1496-1503 (doi 10.1073/pnas.1914677117).
The trRosettaProtocol mover, trRosettaConstraintGenerator, trRosetta application, and other C++ infrastructure were written by Vikram K. Mulligan (vmulligan@flatironinstitute.org), and are currently unpublished.
