An introductory tutorial on ab initio can be found here. A tutorial on protein folding using the broker can be found here
The fragment assembly proceeds in 4 stages that differ mainly by the ScoreFunction and the fragment size applied during the trial moves. The third stage is split into two sub-stages, stage IIIa and IIIb. The weight-sets to create the ScoreFunctions are stored in the Rosetta database under the names score0, score1, score2, score5 and score3, for stage 1, 2, 3a, 3b and 4, respectively. To manipulate the scoring during assembly stages one has to provide a score patch using the flags -abinitio:stageX_patch file.wts_patch, where X = 1, 2, 3a, 3b or 4, and file.wts_patch is a file containing the patch. There is no single flag to patch the ScoreFunctions of all stages at once.
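The stage-to-weight-set mapping and the per-stage patch flags can be sketched in a few lines of Python. This is only an illustration: the helper name patch_flags and the patch file names are hypothetical, while the weight-set names and the flag pattern -abinitio:stageX_patch are the ones given above.

```python
# Weight set used in each fragment-assembly stage, as listed above.
STAGE_WEIGHTS = {
    "1": "score0",
    "2": "score1",
    "3a": "score2",
    "3b": "score5",
    "4": "score3",
}


def patch_flags(patches):
    """Return command-line tokens that patch the given stages.

    `patches` maps a stage label ("1", "2", "3a", "3b", "4") to a patch
    file. One flag is emitted per stage, because no single flag patches
    the ScoreFunctions of all stages at once.
    """
    tokens = []
    for stage, patch_file in patches.items():
        if stage not in STAGE_WEIGHTS:
            raise ValueError(f"unknown stage: {stage!r}")
        tokens += [f"-abinitio:stage{stage}_patch", patch_file]
    return tokens


# Hypothetical patch file names, for illustration only.
print(patch_flags({"3a": "my_3a.wts_patch", "4": "my_4.wts_patch"}))
```

Patching every stage therefore means passing five flags, one per entry in STAGE_WEIGHTS.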
The fragment assembly protocol generally runs a block of fragment trial moves (the standard block size is 2000 cycles) and then returns the lowest-energy conformation encountered during those cycles. Stages I and II each run one block of 2000 cycles. Stage III runs 10 blocks of 2000 cycles, alternating between the IIIa and IIIb modes. Stage IV runs 3 blocks of 4000 cycles. The number of cycles can be scaled globally using the flag -increase_cycles X, such that the new cycle numbers are 2000*X and 4000*X, respectively.
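As a rough worked example, the nominal number of trial moves per stage follows directly from the block counts and block sizes above. The function names below are hypothetical, and the sketch assumes that every block runs to completion and that no stage is skipped.

```python
def cycle_plan(increase_cycles=1.0):
    """Return (stage, number of blocks, cycles per block) for each stage."""
    base = [
        ("I", 1, 2000),
        ("II", 1, 2000),
        ("III", 10, 2000),  # blocks alternate between IIIa and IIIb
        ("IV", 3, 4000),
    ]
    return [
        (stage, blocks, int(round(cycles * increase_cycles)))
        for stage, blocks, cycles in base
    ]


def total_trial_moves(increase_cycles=1.0):
    """Sum the nominal trial moves over all stages."""
    return sum(blocks * cycles for _, blocks, cycles in cycle_plan(increase_cycles))


for factor in (1, 10):
    print(factor, total_trial_moves(factor))
```

With the default factor this gives 2000 + 2000 + 20000 + 12000 = 36000 nominal trial moves per structure, and ten times that with a factor of 10.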
The trial moves will be large fragments in Stages I, II, III and small fragments in stage IV and are handled by the ClassicFragmentMover. In blocks 2 and 3 of stage IV a SmoothFragmentMover is used that prefers fragment moves with little downstream perturbation. The amount of downstream perturbation is quantified using the GunnCost metric.
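The choice of mover and fragment size per stage can be summarized as a small lookup. This is a sketch of the rule stated above, not the actual Rosetta code; the function name mover_for and the "large"/"small" labels are illustrative.

```python
def mover_for(stage, block=1):
    """Return (mover name, fragment size class) for a stage and block.

    `stage` is one of "I", "II", "III", "IV"; `block` is the 1-based
    index of the cycle block within that stage.
    """
    if stage in ("I", "II", "III"):
        return ("ClassicFragmentMover", "large")
    if stage == "IV":
        if block == 1:
            return ("ClassicFragmentMover", "small")
        if block in (2, 3):
            # Prefers moves with little downstream perturbation (GunnCost).
            return ("SmoothFragmentMover", "small")
    raise ValueError(f"unknown stage/block: {stage!r}, {block!r}")


print(mover_for("III"), mover_for("IV", 1), mover_for("IV", 3))
```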
The flag -abinitio:skip_stages [1,2,3,4] specifies a list of stages that are skipped during the fragment assembly process. The standard behavior is to recover the lowest scoring pose after each cycle block. This behavior can be changed by explicitly enumerating the stages where a recovery should happen with the flag -abinitio:recover_low_in_stages [1,2,3,4].
Some changes and additions to the basic algorithm are made to improve the performance in the presence of constraints.
Distance restraints and chainbreak penalties (jumping) are switched on according to their sequence separation, i.e., the minimum number of residues one has to traverse in the FoldTree to get from one residue to the other. The first cycle block in Stage I starts with a maximum sequence separation of 3 residues. Subsequently, this threshold is increased in steps of two residues until a maximum sequence separation of 15% (-fold_cst::seq_sep_stages) of the total length of the protein is reached. After each increment a Stage I cycle block is sampled. This is one reason why the constraint algorithm is slower for large proteins: more increments are needed until 15% is reached.
In subsequent stages, the sequence separation threshold is slowly ramped up to 100% (-fold_cst::seq_sep_stages), but no additional sampling cycles are used.
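The Stage I schedule can be illustrated with a short calculation that shows why longer proteins need more Stage I blocks. This is a minimal sketch under stated assumptions: the function name is hypothetical, and since the text does not say whether the final threshold is clamped to exactly 15% of the length, the sketch simply stops at the first value that reaches it.

```python
def stage1_seq_sep_schedule(n_residues, end_fraction=0.15, start=3, step=2):
    """List the maximum sequence separations used in successive Stage I blocks.

    Starts at `start` residues and grows by `step` until the threshold
    reaches `end_fraction` of the protein length. One Stage I cycle block
    is sampled per entry in the returned list.
    """
    target = end_fraction * n_residues
    schedule = [start]
    while schedule[-1] < target:
        schedule.append(schedule[-1] + step)
    return schedule


for length in (100, 300):
    schedule = stage1_seq_sep_schedule(length)
    print(length, len(schedule), schedule[-1])
```

Under these assumptions a 100-residue protein needs 7 Stage I blocks (thresholds 3 to 15), whereas a 300-residue protein needs 22 (thresholds 3 to 45).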
If beta-strand jumping is used, the chainbreak penalties are treated similarly to distance restraints, such that they are turned on/off according to the sequence separation threshold. Additionally, the chainbreak weights in the ScoreFunction can be ramped up (-jumps:ramp_chainbreaks).
An application that uses the abinitio protocol and is, for example, started in the directory rosetta_demos/abinitio/ would read the following flags:
-in:file:native ./input_files/1elw.pdb              Native structure (optional)
-in:file:fasta ./input_files/1elwA.fasta            Protein sequence in fasta format (required)
-in:file:frag3 ./input_files/aa1elwA03_05.200_v1_3  Fragment library: 3-residue fragments (required)
-in:file:frag9 ./input_files/aa1elwA09_05.200_v1_3  Fragment library: 9-residue fragments (required)
-database path/to/rosetta/main/database             Path to rosetta database (required unless the ROSETTA3_DB environment variable is set)
-nstruct 1                                          Number of output structures (default=1).
-out:file:silent 1elwA_silent.out                   Use silent file output, use filename after this flag (default=default.out).
(or -out:pdb)                                       Use PDB file output (default=false).
-out:path /my/path                                  Path where PDB output files will be written to (default='.').
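When many runs are scripted, it can help to assemble these flags programmatically. The following is a minimal sketch that only builds the argument list from the flags shown above; the function name abinitio_args is hypothetical, and the executable name is deliberately left out because it depends on the local Rosetta build.

```python
import shlex


def abinitio_args(fasta, frag3, frag9, nstruct=1, silent_out="default.out",
                  native=None, database=None):
    """Build the argument list for the required and common optional flags."""
    args = [
        "-in:file:fasta", fasta,
        "-in:file:frag3", frag3,
        "-in:file:frag9", frag9,
        "-nstruct", str(nstruct),
        "-out:file:silent", silent_out,
    ]
    if native is not None:  # the native structure is optional
        args += ["-in:file:native", native]
    if database is not None:  # not needed when ROSETTA3_DB is set
        args += ["-database", database]
    return args


args = abinitio_args(
    fasta="./input_files/1elwA.fasta",
    frag3="./input_files/aa1elwA03_05.200_v1_3",
    frag9="./input_files/aa1elwA09_05.200_v1_3",
    silent_out="1elwA_silent.out",
    native="./input_files/1elw.pdb",
)
print(shlex.join(args))
```

The printed string can be appended to the application's command line, or the list can be passed to subprocess.run after the path to the executable.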
These flags are implemented in FragmentSampler.cc
There are several optional settings. Those which have been benchmarked and tested thoroughly for optimal performance carry the comment "recommended":
-abinitio::increase_cycles 10            Increase the number of cycles at each stage in ab initio by this factor (recommended).
-abinitio:rg_reweight 0.5                Reweight contribution of radius of gyration to total score by this scale factor (recommended).
-abinitio:rsd_wt_helix 0.5               Reweight env,pair,cb for helix residues by this factor (recommended).
-abinitio:rsd_wt_loop 0.5                Reweight env,pair,cb for loop residues by this factor (recommended).
-abinitio:stage1_patch                   Supply patch file for the score0 ScoreFunction used in Stage1
-abinitio:stage2_patch                   Supply patch file for the score1 ScoreFunction used in Stage2
-abinitio:stage3a_patch                  Supply patch file for the score2 ScoreFunction used in Stage3a
-abinitio:stage3b_patch                  Supply patch file for the score5 ScoreFunction used in Stage3b
-abinitio:stage4_patch                   Supply patch file for the score3 ScoreFunction used in Stage4
-abinitio:skip_stages 1 2 3 4            list all stages (1-4) that should be skipped (default=None)
-abinitio:recover_low_in_stages 1 2 3 4  after the sampling cycles of a given block are finished the
                                         lowest scoring pose recorded during the sampling is recovered,
                                         use this flag to switch this off for individual stages
These flags are implemented in ConstraintFragmentSampler.cc
-constraints:cst_weight 1.0                    Patches the weight for the ScoreType atom_pair_constraint in
                                               all scores (score0, score1, score2, score5 and score3) used
                                               for abinitio. This change is applied after patching via
                                               -stageX_patch.
-fold_cst:seq_sep_stages f1 f2 f3              Restraints and chainbreak penalties are considered based on
                                               their sequence separation. The maximum sequence separation an
                                               active restraint can have is slowly ramped up. The target values
                                               for the ramping can be controlled for the end-points of stage
                                               2, 3 and 4, and are given as fraction of total length of
                                               protein (default=0.15, 1.0, 1.2).
-fold_cst:skip_on_noviolation_in_stage1        In Stage1 the maximum restraint sequence separation is incremented
                                               in steps of 2 residues. After each increment a block of sampling
                                               cycles is performed. This option allows sampling to be terminated
                                               (or skipped) if no restraint is violated at a given time.
-constraint:threshold                          control the threshold above which a constraint is violated (default=1).
-fold_cst:no_recover_low_at_constraint_switch  When new constraints are turned on (seq_sep_stages) the MonteCarlo object
                                               has to be reset and the previously found lowest energy pose will be lost.
                                               To avoid losing this information, we usually carry out a recover_low
                                               before switching on new constraints. Use this flag to avoid this extra
                                               recover_low.
-jumps:ramp_chainbreaks                        smoothly ramp up the weights for ScoreTypes linear_chainbreak and
                                               overlap_chainbreak (recommended, default=true).
-jumps:increase_chainbreak                     factor for the ramping of chainbreaks (default=1.0).
-jumps:overlap_chainbreak                      use the overlap chainbreak in stage4 (recommended, default=false).
-jumps:chainbreak_weight_stage1                Set the weight for ScoreType linear_chainbreak for Stage1.
                                               Irrelevant if ramping of chainbreaks is active.
-jumps:chainbreak_weight_stage2                Set the weight for ScoreType linear_chainbreak for Stage2.
                                               Irrelevant if ramping of chainbreaks is active.
-jumps:chainbreak_weight_stage3                Set the weight for ScoreType linear_chainbreak for Stage3.
                                               Irrelevant if ramping of chainbreaks is active.
-jumps:chainbreak_weight_stage4                Set the weight for ScoreType linear_chainbreak for Stage4.
                                               Irrelevant if ramping of chainbreaks is active.
For running multiple jobs on a cluster the following options are useful:
-constant_seed    Use a constant seed (1111111 unless specified with -jran)
-jran 1234567     Specify seed. Should be unique among jobs (requires -constant_seed)
-seed_offset 10   This value will be added to the random number seed. Useful when using
                  time as seed and submitting many jobs to a cluster. If jobs are started
                  in the same second they will still have different initial seeds when using
                  a unique offset. If using Condor (http://www.cs.wisc.edu/condor), the Condor
                  process id, $(Process), can be used for this.
                  For example "-seed_offset $(Process)" can be used in the condor submit file.
-run:dry_run      no sampling cycles, just initialize all objects (for testing, default=False)
-run:test_cycles  only a single sampling cycle per stage (for testing, default=False)
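Outside Condor, the same effect can be obtained by giving each job its own offset when the command lines are generated. The sketch below is illustrative only: the function name per_job_args is hypothetical, and the job index stands in for Condor's $(Process).

```python
def per_job_args(common_args, n_jobs):
    """Return one argument list per job, each with a unique -seed_offset.

    Jobs started within the same second would otherwise share a
    time-based seed; the offset keeps their initial seeds distinct.
    """
    jobs = []
    for job_id in range(n_jobs):
        jobs.append(list(common_args) + ["-seed_offset", str(job_id)])
    return jobs


common = ["-nstruct", "1", "-out:file:silent", "1elwA_silent.out"]
for job in per_job_args(common, 3):
    print(" ".join(job))
```

Because all jobs here write to the same silent file name, a real submission script would likely also vary the output name or path per job.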
The top-level mover that encodes the abrelax protocol is called AbrelaxMover and resides in main/source/src/protocols/abinitio.
This top-level mover calls the Movers for the fragment assembly protocol (abinitio) and the relax protocol. If beta-strand jumping is used, non-natural chainbreaks have to be closed, and the AbrelaxMover will run the protocols for loop closing and idealize before the relax stage.
The fragment assembly protocol is encoded in the Movers FragmentSampler and its derived class ConstraintFragmentSampler, which reside in main/source/src/protocols/abinitio.
The AbrelaxMover works closely together with the TopologyBroker module that resides in main/source/src/protocols/topology_broker. The TopologyBroker module handles the introduction of restraints and determines the kinematics of beta-strands through its ConstraintClaimer and TemplateJumpClaimer classes, respectively.