RNA de novo modeling was originally written in Rosetta by R. Das in late 2006, back when the code was in rosetta++
. It gradually accreted additional functionalities, such as post-fragment-modeling minimization (fragment assembly of RNA with full-atom minimization, FARFAR). A bit of the code was put into object-oriented form in 2010-2011 with the migration to Rosetta 3 ('minirosetta') and tests of a coarse_rna
modeling scheme (never published). But the code was due for a reorganization as additional functions started to be introduced or envisioned – e.g., chemical shift scoring, general 'chunk' fragments (not contiguous in sequence), modeling or refinement of sub-poses of larger poses – and as better practices for Rosetta coding were established.
In 2015-2016, the Das lab seeks to test an integration of stepwise modeling with FARNA/FARFAR. This has required encapsulation of FARNA functionality into a single class and has presented an opportunity to organize sub-functionalities into different sub-namespaces/sub-directories.
This document outlines the current organization for developers, including notes on anticipated additions including handling of RNA/protein interfaces & job specification using the much cleaner flags & setup functions developed for stepwise
(see To Do at bottom of this document).
One semantic note: The terms FARNA and FARFAR are now preferred throughout the code to refer to the actual fragment assembly Monte Carlo protocol, which can now be accessed through rna_denovo
but also through stepwise
or, in principle, anywhere else in the code, e.g. for refinement of RNA pieces of poses. The term RNA_DeNovo or rna_denovo are retained in a few places to signify old wrapper code specific to the rna_denovo.cc
application with .params
input for the more specific use case of modeling RNA from extended chain.
protocols::farna
RNA_DeNovoProtocol
and FARNA_Optimizer
RNA_DeNovoProtocol
is the 'classic' wrapper, itself setup by the rna_denovo.cc
application. It loops over the nstruct
poses that need to be built and kicks off instances of RNA_FragmentMonteCarlo, and then outputs those poses to silent files. It is also in charge of the input from .params
files (hopefully to be deprecated soon). Note: this could/should be handled by a JobDistributor, and should migrate to JD3
when that's ready.
FARNA_Optimizer
is a more bare-bones wrapper that doesn't do any fancy silent file input/output. It was coded up for use within stepwise
runs, testing the -lores
flag where 100 cycles of FARNA are carried out. Briefly described on the stepwise page. Its the reasonable object to use to call FARNA within more complex workflows, including motif-by-motif refinement of big RNA poses (which doesn't exist yet).
RNA_FragmentMonteCarlo
RNA_FragmentMonteCarlo
is the main protocol setting up all libraries:
There is one important/tricky concept in RNA_FragmentMonteCarlo, an object called atom_level_domain_map
, which is a protocols::toolbox::AtomLevelDomainMap
. This holds for each atom in the pose an assignment of where that residue came from, and demarcates where fragments/jumps/chunks can be inserted at atom level.
The convention is as follows:
+ 0 marks totally free atoms.
+ 1,2,...998 marks atoms that came from fixed input domains (e.g, PDBs), with a different index for each PDB.
+ 999 is special, marking absolutely fixed atoms that did not come from an input domain (e.g., virtual phosphates that don't need to get moved during FARNA)
+ 1000 is special (ROSETTA_LIBRARY_DOMAIN) and marks atoms that are covered by a BasePairStepLibrary
protocols::farna::movers
namespaceRNA_FragmentMover
holds a RNA_Fragments
library and actually makes fragment moves on a pose
.
Note: This is where we could put in functionality to, e.g., choose fragments of longer length if they have sequence-matches to the desired pose. This is also where we could put chemical-shift-based weighting of fragment choices. Would need to define weights for each possible fragment, and update random choice to reflect those weights, but that should be easy (there's some code in stepwise::monte_carlo::mover::StepWiseMoveSelector
that we could share.)
RNA_JumpMover
holds an RNA_JumpLibrary
and actually makes jump moves on a pose
RNA_Minimizer
carries out 2'-OH packing and full-atom minimizing in two rounds. The first round prevents 'blow up' of FARNA conformations from clashes by coordinate constraints.
RNA_LoopCloser
looks over a pose
and applies CCD (cyclic coordinate descent) loop closure to any segments with chainbreaks and CUTPOINT_UPPER/CUTPOINT_LOWER variants (as specified by 'cutpoints_closed', or created during setup)
RNA_Relaxer
is not really in use. Use at your own risk.
protocols::farna::fragments
namespaceRNA_Fragments
is a base class for reading fragments from a database on disk and storing torsions.
FullAtomRNA_Fragments
is specific to Rosetta poses with fa_standard
-type RNA residues.
It generates culled lists of, e.g., 1-residue, 2-residue fragments, for particular sequences 'just-in-time'. (Note that while the object is kept const during the run, the just-in-time info is stored in a mutable map -- kind-of a standard hack.)
Default library is database/sampling/rna/RICHARDSON_RNA09.torsions
, but should be updated to NR2015
from Motif Atlas.
Note: This class could be extended to hold LarmorD-predicted atom-level chemical shifts and nucleotide-level chemical reactivities (to DMS, SHAPE, etc.) for each database RNA structure.
These classes were developed separately from protein fragment classes whose definitions were in flux at the time of coding the RNA protocols; it might be worth unifying the two, although its hard to imagine use cases that demand it.
There is also a (largely deprecated) class called protocols::coarse_rna::CoarseRNA_Fragments
that inherits from RNA_fragments
for poses that use coarse-grained RNA residue types with 3 dummy chains on the backbone and 3 in the base.
Hey, maybe this should be a sub-namespace of libraries...
protocols::farna::libraries
namespaceRNA_LibraryManager
manages singletons of each library read from disk. Use it to read relevant libraries once.
In addition to holding the RNA_Fragments
, its got RNA_JumpLibrary
and BasePairStepLibrary
, for now.
RNA_JumpLibrary
holds jumps read from the Rosetta database of jumps. By default, only drawn from the 1jj2 ribosome for now? database/sampling/rna/1jj2_RNA_jump_library.dat
, but this should minimally be updated to an RNA11 database (which exists in the database as RNA11_full.jump
), or even to NR2015.
Note: should be general to RNA/protein too, but those jumps haven't been implemented.
BasePairStepLibrary
holds coordinates of base pair steps (see BasePairStep) read from databases on disk. It actually just registers which files are on disk and then reads in the silent files 'just in time' during the run. Example file: database/sampling/rna/base_pair_steps/general/bulge_1nt/ag_unu.out.gz
hold coordinates of 4 residues of two base pairs from a base pair step in which one strand has sequence ag
and the other has sequence unu
(n
means any nucleotide).
RNA_ChunkLibrary
is an important object in RNA_FragmentMonteCarlo
that holds base pair steps and any coordinates from user-input PDBs or silent files. It also is responsible for creating the AtomLevelDomainMap
(shared by numerous movers above).
ChunkSet
Object that holds coordinates of input PDBs or base pair steps, in compact MiniPose
format.
Note: should be general to combined RNA/Protein chunks, but hasn't been tested.
protocols::farna::options
namespaceOptions are like a Russian doll of nested classes:
RNA_BasicOptions
has the most basic options, e.g., dump_pdb
RNA_MinimizerOptions
inherits from RNA_BasicOptions
and adds minimizer-specific options.
RNA_FragmentMonteCarloOptions
inherits from RNA_MinimizerOptions
and adds a bunch of fragment modeling options, including number of cycles monte_carlo_cycles
, whether to minimize or not minimize_structure
, filters like filter_lores_base_pairs
, and names of input PDB files.
RNA_DeNovoProtocolOptions
inherits from RNA_FragmentMonteCarloOptions
and adds a few i/o options, including number of models nstruct
.
protocols::farna::base_pairs
namespaceThis directory is meant to handle information on where base pairs and base pair steps are located in the pose for FARNA runs.
RNA_BasePairHandler
handles locations in pose of base pairs, base_pair_steps, any chain_connections (two segments that are supposed to have a pair between them somewhere). Also handles setup_base_pair_constraints()
.
Note: Would be great to have a proper RNA secondary structure information object specifiable by user and stored in the pose (see To Do below) -- at that point, this class would handle the conversion from that class into parameters used by FARNA.
BasePairStep.cc
holds information on an object like this:
5'-- ( i ) -- ( i+1) -- 3'
| |
| |
3'-- (j+q+1) ( j ) -- 5'
\ /
n - n - n
allowing bulges (n's) on the second strand
One side has to involve two contiguous nucleotides.
The other side involves nucleotides with a maximum separation of 3 nucleotides.
The pairs that bracket the step do not have to be Watson-Crick.
I was testing whether storage of these steps drawn from the crystallographic database
would allow for rapid recognition of motifs (it does, but you have to know which
residues are paired).
Note: not unified with core::rna::BasePair
, i.e. no storage of Watson-Crick edges or orientations. Keeping track of that information leads to very sparse databases.protocols::farna::secstruct
namespaceRNA_SecStruct
is meant to be a general class that stores actual pairings and can handle input/output of dot-paren notation like (((....)))
for a hairpin. Its still a bit crude, in that its primary datum is a string and not a list of pairs, which would be more fundamental. It also cannot input/output arbitrary #'s of pseudoknots, just three layers as dictated by (
,)
; [
,]
; and {
,}
. See also rna-secondary-structure-file.
RNA_SecStructLegacyInfo
currently holds 1D information on secondary structure
This class was derived to match Rosetta protein modeling (which holds 1D information on alpha-helix,beta-sheet, or loop).
protocols::farna::setup
namespaceRNA_DeNovoSetup
allows FARNA to now bypass the legacy .params
-file based input and instead directly take from command-line -sequence
, -secstruct
, -obligate_pair
. (see RNA DeNovo docs)RNA_DeNovoPoseInitializer
holds code for taking .params
information and creating a fold-tree and cutpoints for a new pose. (may become deprecated if we use build_full_model from stepwise to initialize the pose after RNA_DeNovoSetup
). RNA_DeNovoParameters
is responsible for reading in .params
files (now legacy code).rna_denovo_setup.py
into the RNA_DeNovoSetup class, which handles residue mapping to subproblems. Now just need to run build_full_model
to generate initial poses. A good time to test this might be when Kalli generalizes FARNA to include RNA-protein lo-res potential. Rebuilding an RNA pair within the MS2 test case is a good example.