The cs_rosetta_rna application refines (minimizes) and scores an RNA structure under the hybrid CS-ROSETTA-RNA all-atom energy function:
E(hybrid) = E(Rosetta) + E(shift)
where E(Rosetta) is the standard Rosetta all-atom energy function for RNA [1], and E(shift) is the chemical shift pseudo-energy term [2].
Input RNA PDB structures can be generated with the FARFAR [1] and/or Stepwise Assembly [3] structure-modeling methods, or can be experimental NMR or crystallographic structures.
CS-ROSETTA-RNA can now also use ribose carbon (C1', C2', C3', C4', C5'), base carbon (C2, C5, C6, C8), and exchangeable imino (N1, N3, H1, H3) chemical shifts. When these data are present, the corresponding chemical shifts are predicted using LARMORD [4].
The code for the cs_rosetta_rna application is in src/apps/public/farna/cs_rosetta_rna.cc.
For 'minimal' demo examples of the cs_rosetta_rna protocol, including input files, see rosetta/demos/public/cs_rosetta_rna.
1) rosetta_inputs/GA-AG_mismatch/1MIS_NMR.pdb
Input PDB structure of the tandem GA:AG mismatch internal loop.
2) rosetta_inputs/GA-AG_mismatch/1MIS_exp_1H_chem_shifts.str
Experimental non-exchangeable 1H chemical shift data for the tandem GA:AG mismatch internal loop.
Each line in this file represents one chemical shift data point and contains the following nine space-delimited columns (based on the STAR v2.1 format):
col 1: Atom_shift_assign_ID (INT)
col 2: Residue_author_seq_code (INT)
col 3: Residue_seq_code (INT)
col 4: Residue_label (STRING)
col 5: Atom_name (STRING)
col 6: Atom_type (STRING)
col 7: Chem_shift_value (FLOAT)
col 8: Chem_shift_value_error (STRING)
col 9: Chem_shift_ambiguity_code (STRING)
NOTE: the residue_seq_code (col 3), residue_label (col 4), and atom_name (col 5) should be consistent with the data in the PDB file. Also, col 8 and col 9 are currently not used internally by the cs_rosetta_rna application.
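A minimal Python sketch of reading this nine-column format, assuming the file contains only whitespace-delimited data lines as described above (blank lines are skipped). The names ChemShiftEntry and parse_chem_shift_lines are hypothetical illustrations, not part of Rosetta, and this is not the parser cs_rosetta_rna uses internally.

```python
from dataclasses import dataclass


@dataclass
class ChemShiftEntry:
    """One data line of the experimental chemical shift file."""
    assign_id: int    # col 1: Atom_shift_assign_ID
    author_seq: int   # col 2: Residue_author_seq_code
    seq: int          # col 3: Residue_seq_code (must match the PDB numbering)
    residue: str      # col 4: Residue_label (must match the PDB residue name)
    atom: str         # col 5: Atom_name (must match the PDB atom name)
    atom_type: str    # col 6: Atom_type
    shift: float      # col 7: Chem_shift_value
    error: str        # col 8: Chem_shift_value_error (not used by cs_rosetta_rna)
    ambiguity: str    # col 9: Chem_shift_ambiguity_code (not used by cs_rosetta_rna)


def parse_chem_shift_lines(lines):
    """Parse data lines into ChemShiftEntry objects.

    Blank lines are skipped; any other line must have exactly nine columns.
    """
    entries = []
    for line_no, line in enumerate(lines, start=1):
        cols = line.split()
        if not cols:
            continue
        if len(cols) != 9:
            raise ValueError(f"line {line_no}: expected 9 columns, got {len(cols)}")
        entries.append(ChemShiftEntry(
            int(cols[0]), int(cols[1]), int(cols[2]), cols[3], cols[4],
            cols[5], float(cols[6]), cols[7], cols[8]))
    return entries
```

Because the function accepts any iterable of strings, an open file handle can be passed directly, e.g. parse_chem_shift_lines(open("1MIS_exp_1H_chem_shifts.str")).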
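3) rosetta_inputs/GA-AG_mismatch/1MIS_params
Input parameter file for the tandem GA:AG mismatch internal loop (its construction is described below).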
Also, the input data files for all 23 RNA motifs benchmarked in ref. [2] are provided in the Supplemental Data Zip file.
1) Refine (minimize) and score a PDB structure under the hybrid CS-ROSETTA-RNA all-atom energy function:
cs_rosetta_rna -mode minimize_pdb -pdb <input_pdb> -score:rna_chemical_shift_exp_data <exp_cs_data_file> -params_file <input_param_file> -analytic_etable_evaluation false
2) Score (but not refine) a PDB structure under the hybrid CS-ROSETTA-RNA all-atom energy function:
cs_rosetta_rna -mode score_pdb -pdb <input_pdb> -score:rna_chemical_shift_exp_data <exp_cs_data_file> -params_file <input_param_file> -analytic_etable_evaluation false
For example command lines, see "run_cmds.txt".
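The two command lines above differ only in the value of -mode. The hypothetical Python helper below assembles either one, for example for the demo inputs listed earlier, so that it can be passed to subprocess.run. It assumes the executable is on the PATH as cs_rosetta_rna; compiled Rosetta binaries usually carry a build suffix (e.g. .linuxgccrelease), so adjust `executable` accordingly. The parameter file is optional because the app can also run without one (not recommended).

```python
VALID_MODES = ("minimize_pdb", "score_pdb")


def build_cs_rosetta_rna_cmd(mode, input_pdb, exp_cs_data_file,
                             param_file=None, executable="cs_rosetta_rna"):
    """Return the argument list for one cs_rosetta_rna run.

    mode: "minimize_pdb" (refine, then score) or "score_pdb" (score only).
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    cmd = [executable, "-mode", mode,
           "-pdb", input_pdb,
           "-score:rna_chemical_shift_exp_data", exp_cs_data_file]
    if param_file is not None:
        cmd += ["-params_file", param_file]
    cmd += ["-analytic_etable_evaluation", "false"]
    return cmd


if __name__ == "__main__":
    demo = "rosetta_inputs/GA-AG_mismatch/"
    cmd = build_cs_rosetta_rna_cmd("minimize_pdb", demo + "1MIS_NMR.pdb",
                                   demo + "1MIS_exp_1H_chem_shifts.str",
                                   demo + "1MIS_params")
    print(" ".join(cmd))
    # Once the executable is installed: import subprocess; subprocess.run(cmd, check=True)
```

Returning a list rather than a single string avoids shell quoting issues with the primes in file or atom names.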
-score::rna_chemical_shift_H5_prime_mode
Specify how to handle assignment of the diastereotopic H5' and H5'' proton pair.
Valid modes:
LEAST_SQUARE_IGNORE_DUPLICATES (default)
In this mode, the H5' and H5'' assignments are chosen to give the better agreement between the experimental and back-calculated chemical shifts.
Use this mode if the experimental H5' and H5'' chemical shifts are not unambiguously assigned.
UNIQUE
In this mode, the H5' and H5'' assignments are used as is, and cs_rosetta_rna will not attempt to swap them.
Use this mode only if the experimental non-exchangeable 1H chemical shift data have unambiguous assignments of the diastereotopic H5' and H5'' protons. Note that this is uncommon.
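The idea behind the default LEAST_SQUARE_IGNORE_DUPLICATES mode can be illustrated with a small Python sketch: for one residue, compare the squared error of the H5'/H5'' assignment as given with that of the swapped assignment, and keep whichever agrees better with the back-calculated shifts. This is a conceptual illustration only; the function name is hypothetical, the back-calculated values in the example are made up, and Rosetta's actual implementation (including how it treats duplicate values) may differ.

```python
def best_h5_prime_assignment(exp_h5p, exp_h5pp, calc_h5p, calc_h5pp):
    """Return (exp shift used for H5', exp shift used for H5'', squared error)
    for whichever of the two possible H5'/H5'' assignments fits the
    back-calculated shifts (calc_h5p, calc_h5pp) better."""
    as_given = (exp_h5p - calc_h5p) ** 2 + (exp_h5pp - calc_h5pp) ** 2
    swapped = (exp_h5pp - calc_h5p) ** 2 + (exp_h5p - calc_h5pp) ** 2
    if swapped < as_given:
        return exp_h5pp, exp_h5p, swapped
    # Ties (e.g. identical experimental values) keep the assignment as given.
    return exp_h5p, exp_h5pp, as_given


# Experimental 4.180/4.540 ppm vs. hypothetical back-calculated 4.50/4.20 ppm:
# swapping gives the smaller error, so H5' takes 4.540 and H5'' takes 4.180.
print(best_h5_prime_assignment(4.180, 4.540, 4.50, 4.20))
```

In UNIQUE mode, by contrast, only the as-given error would be used.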
-score::rna_chemical_shift_verbose
-score:rna_chemical_shift_larmord
Force all chemical shifts to be predicted using LARMORD, including proton chemical shifts.
NOTE: In this mode, the user must also specify: -score:rna_chemical_shift_H5_prime_mode UNIQUE
-score:rna_chemical_shift_larmord_wt
File containing weights that determine the contribution of each nucleus type to the chemical shift error. Typically, the weight for a given nucleus type is 1/expected_error.
Format for weight file:
col 1: nucleus_type (e.g. C1') (STRING)
col 2: weight (FLOAT)
larmord_noweight.dat (default): all nuclei are equally weighted.
larmord_nuchemics_1.0_nocut_accuracy.dat: each nucleus type is differentially weighted (1/expected_error); use when predicting non-exchangeable proton shifts with NUCHEMICS and all other nuclei with LARMORD.
larmord_1.0_nocut_accuracy.dat: each nucleus type is differentially weighted (1/expected_error); use when predicting all chemical shifts with LARMORD.
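Because the weight for a nucleus type is typically 1/expected_error, a custom weight file in the two-column format above could be generated with a short Python sketch like the one below. The function name and the expected-error values in the example are hypothetical placeholders, not the values in the bundled larmord_*_accuracy.dat files, and the four-decimal number formatting is an assumption.

```python
def larmord_weight_lines(expected_errors):
    """Convert {nucleus_type: expected_error} into weight-file lines.

    Each line holds col 1 = nucleus type and col 2 = weight (1/expected_error).
    """
    lines = []
    for nucleus, err in expected_errors.items():
        if err <= 0:
            raise ValueError(f"expected error for {nucleus} must be positive, got {err}")
        lines.append(f"{nucleus} {1.0 / err:.4f}")
    return lines


# Hypothetical expected errors (ppm), for illustration only.
example_errors = {"C1'": 0.5, "H1'": 0.25}
print("\n".join(larmord_weight_lines(example_errors)))
```

Nucleus types with smaller expected errors receive larger weights, so their deviations count more toward the total error.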
-score:rna_chemical_shift_larmord_par
File containing the LARMORD parameters.
larmord_1.0_nocut_parameters.dat (default): corresponds to the parameters published in [4].
Output of cs_rosetta_rna includes:
1) A breakdown of the hybrid CS-ROSETTA-RNA all-atom energy terms, e.g.:
------------------------------------------------------------
Scores                       Weight   Raw Score  Wghtd.Score
------------------------------------------------------------
fa_atr                        0.230    -125.447      -28.853
fa_rep                        0.120       8.314        0.998
fa_intra_rep                  0.003      81.488        0.236
fa_intra_RNA_base_phos_atr    0.230       0.000        0.000
fa_intra_RNA_base_phos_rep    0.120       0.000        0.000
lk_nonpolar                   0.320       2.123        0.679
lk_nonpolar_intra_RNA         0.320       3.768        1.206
fa_elec_rna_phos_phos         1.050      -0.074       -0.078
ch_bond                       0.420     -30.523      -12.820
rna_torsion                   2.900       2.721        7.892
rna_sugar_close               0.700       3.171        2.220
fa_stack                      0.125    -199.844      -24.981
geom_sol_fast                 0.620      56.483       35.020
geom_sol_fast_intra_RNA       0.620       1.978        1.226
hbond_sr_bb_sc                0.620       0.000        0.000
hbond_lr_bb_sc                2.400       0.000        0.000
hbond_sc                      2.400     -20.116      -48.279
hbond_intra                   2.400       0.000        0.000
atom_pair_constraint          1.000       0.000        0.000
angle_constraint              1.000       0.000        0.000
rna_bulge                     0.450       0.000        0.000
rna_chem_shift                4.000       1.232        4.928
linear_chainbreak             5.000       0.009        0.047
------------------------------------------------------------
Total weighted score: -60.558
2) The total hybrid CS-ROSETTA-RNA all-atom energy, e.g.:
hybrid_CS-ROSETTA-RNA_all-atom energy: -60.5579
3) The chemical shift RMSD, e.g.:
chem_shift_RMSD: 0.143299
4) The RNA PDB structure after refinement under the hybrid CS-ROSETTA-RNA all-atom energy function (only if -mode minimize_pdb is used), output as:
<in_pdb_basename>_out
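The example output above is internally consistent: each Wghtd.Score equals Weight × Raw Score (e.g. 0.230 × -125.447 = -28.853 for fa_atr), and the total hybrid energy is the sum of the weighted scores, up to rounding of the printed values. The Python sketch below checks this on a printed breakdown and also computes a chemical shift RMSD in the standard way, as the root-mean-square difference between matched experimental and back-calculated shifts. Both function names are hypothetical, and the RMSD function is an assumption about the definition; which atoms cs_rosetta_rna includes in its chem_shift_RMSD, and how it handles the H5'/H5'' pair, is not specified here.

```python
import math


def parse_score_table(text):
    """Return {term: (weight, raw, weighted)} from the printed score breakdown,
    skipping dashed rules, the column header and the total line."""
    rows = {}
    for line in text.splitlines():
        cols = line.split()
        if len(cols) != 4 or cols[0] == "Total":
            continue
        name, weight, raw, weighted = cols
        rows[name] = (float(weight), float(raw), float(weighted))
    return rows


def total_weighted_score(rows, tol=0.01):
    """Sum the weighted scores, checking that each one equals weight * raw
    to within the rounding of the printed table."""
    for name, (weight, raw, weighted) in rows.items():
        if abs(weight * raw - weighted) > tol:
            raise ValueError(f"{name}: {weight} * {raw} != {weighted}")
    return sum(weighted for _, _, weighted in rows.values())


def chem_shift_rmsd(exp_shifts, calc_shifts):
    """RMSD over atoms present in both dicts, keyed by (residue number, atom name)."""
    common = set(exp_shifts) & set(calc_shifts)
    if not common:
        raise ValueError("no atoms shared between experimental and calculated shifts")
    sq_sum = sum((exp_shifts[k] - calc_shifts[k]) ** 2 for k in common)
    return math.sqrt(sq_sum / len(common))
```

Passing the full breakdown shown above through parse_score_table and total_weighted_score gives about -60.559, matching the reported -60.558 to within rounding. The tolerance is needed because the printed weights are rounded (for fa_intra_rep, 0.003 × 81.488 is about 0.244, while 0.236 is printed).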
Figure 1: Breakdown of the secondary structure of the tandem GA:AG mismatch internal loop:

                     1    6
Entire structure: 5'-CGGACG-3'
                     ||**||
                  3'-GCAGGC-5'
                     12   7

                     1
H1 helix:         5'-CG
                     ||
                  3'-GC
                     12

2x2 mismatch:        GA
                     **
                     AG

                      6
H2 helix:            CG-3'
                     ||
                     GC-5'
                      7
Two base pairs should be included at each helical boundary (for the rationale, see ref. [2]). For example, in the case of the tandem GA:AG mismatch internal loop, the structure consists of a 2-base-pair H1 helix, a 2x2 mismatch, and a 2-base-pair H2 helix.
The chemical shift data of all non-exchangeable protons should be included in the chemical shift data file. The non-exchangeable protons are the H1', H2', H3', H4', H5' and H5'' ribose protons and the H2, H5, H6 and H8 base protons. Data lines for other atom types will be ignored.
The chemical shift data of all nucleotides EXCEPT those located right at the 5' and 3' edges should be included in the chemical shift data file. For example, in the case of the tandem GA:AG mismatch internal loop, the chemical shift data of all nucleotides except C1, G6, C7 and G12 should be included.
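The two guidelines above (non-exchangeable protons only, and no data for the edge nucleotides) can be applied mechanically. The Python sketch below filters data lines accordingly; the function name is hypothetical, it assumes the nine-column format described earlier, and it covers only the 1H setup (when LARMORD is used, carbon and imino shifts may also be included).

```python
NON_EXCHANGEABLE_H = {"H1'", "H2'", "H3'", "H4'", "H5'", "H5''",  # ribose protons
                      "H2", "H5", "H6", "H8"}                      # base protons


def select_shift_lines(lines, edge_residues):
    """Keep data lines for non-exchangeable protons of non-edge nucleotides.

    Uses col 3 (Residue_seq_code) and col 5 (Atom_name); blank lines are skipped.
    """
    kept = []
    for line in lines:
        cols = line.split()
        if not cols:
            continue
        if len(cols) != 9:
            raise ValueError(f"expected 9 columns: {line!r}")
        seq, atom = int(cols[2]), cols[4]
        if seq not in edge_residues and atom in NON_EXCHANGEABLE_H:
            kept.append(line)
    return kept


# Tandem GA:AG mismatch: the edge nucleotides are C1, G6, C7 and G12.
GA_AG_EDGES = {1, 6, 7, 12}
```

Since cs_rosetta_rna ignores other atom types anyway, the main effect of such a filter is dropping the edge-nucleotide data.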
1) Add an OBLIGATE PAIR line for each helical base pair located right at the 5' and 3' edges of the structure.
In the case of the tandem GA:AG mismatch internal loop, the OBLIGATE PAIRS are C1-G12 and G6-C7:
OBLIGATE PAIR 1 12 H H A
OBLIGATE PAIR 6 7 H H A
Note that the cs_rosetta_rna app will refine (minimize) ALL nucleotides EXCEPT those specified in OBLIGATE PAIR lines, which will be kept static.
2) Add ALLOW_INSERT lines covering all non-canonical loop nucleotide positions:
In the case of the tandem GA:AG mismatch internal loop, the ALLOW_INSERT nucleotide positions are G3, A4, G9 and A10:
ALLOW_INSERT 3 4
ALLOW_INSERT 9 10
3) Add a CUTPOINT_CLOSED line specifying the position immediately 5' of the first non-canonical loop nucleotide position.
In the case of the tandem GA:AG mismatch internal loop, the first non-canonical loop nucleotide position is G3. The CUTPOINT_CLOSED position is the position immediately 5' of G3, which is G2:
CUTPOINT_CLOSED 2
Note that if the CUTPOINT_CLOSED line is not included in the parameter file, the cs_rosetta_rna app will still be able to run by selecting a random loop position as the CUTPOINT_CLOSED position. However, it is recommended that the CUTPOINT_CLOSED line be explicitly included to prevent this random selection.
4) Add a CUTPOINT_OPEN line for each position immediately 5' of a chain break.
In the case of the tandem GA:AG mismatch internal loop, there is a chain break between G6 and C7. The CUTPOINT_OPEN position is the position immediately 5' of the chain break, which is G6:
CUTPOINT_OPEN 6
Adding all the above parameter lines together gives the parameter file for the tandem GA:AG mismatch (see rosetta_inputs/GA-AG_mismatch/1MIS_params):
OBLIGATE PAIR 1 12 H H A
OBLIGATE PAIR 6 7 H H A
ALLOW_INSERT 3 4
ALLOW_INSERT 9 10
CUTPOINT_CLOSED 2
CUTPOINT_OPEN 6
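The four rules above are mechanical enough to script. The hypothetical Python helper below assembles the parameter lines from the edge base pairs, the non-canonical loop segments and the chain-break positions, and reproduces the 1MIS_params contents for the tandem GA:AG mismatch. The trailing "H H A" fields of the OBLIGATE PAIR lines are copied verbatim from the example, since their meaning is not explained here.

```python
def make_params_lines(obligate_pairs, loop_segments, chain_break_after):
    """Assemble cs_rosetta_rna parameter-file lines.

    obligate_pairs:    helical base pairs at the 5'/3' edges, e.g. [(1, 12), (6, 7)]
    loop_segments:     (first, last) runs of non-canonical loop nucleotides
    chain_break_after: positions immediately 5' of each chain break
    """
    if not loop_segments:
        raise ValueError("need at least one non-canonical loop segment")
    lines = [f"OBLIGATE PAIR {i} {j} H H A" for i, j in obligate_pairs]
    lines += [f"ALLOW_INSERT {first} {last}" for first, last in loop_segments]
    # CUTPOINT_CLOSED: immediately 5' of the first non-canonical loop nucleotide.
    first_loop = min(first for first, _ in loop_segments)
    lines.append(f"CUTPOINT_CLOSED {first_loop - 1}")
    lines += [f"CUTPOINT_OPEN {pos}" for pos in chain_break_after]
    return lines


print("\n".join(make_params_lines([(1, 12), (6, 7)], [(3, 4), (9, 10)], [6])))
```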
Finally, the cs_rosetta_rna app can also run WITHOUT an input parameter file, although this is not recommended. In this case, a simple fold tree with no chain breaks or cutpoints is used, and all nucleotides are refined (minimized).
Case 1:
If two chemical shift data points are measured for the diastereotopic H5' and H5'' proton pair and unambiguous assignment is possible, then include the correct, unambiguous assignment in the data lines, e.g.:
1 1 1 G H5' H 4.180 . .
2 1 1 G H5'' H 4.540 . .
In this case, please explicitly include the command line option: -score:rna_chemical_shift_H5_prime_mode UNIQUE
Case 2:
If two chemical shift data points are measured for the diastereotopic H5' and H5'' proton pair BUT unambiguous assignment is not possible, then include either of the two possible assignments in the data lines, e.g.:
1 1 1 G H5' H 4.180 . .
2 1 1 G H5'' H 4.540 . .
OR
1 1 1 G H5' H 4.540 . .
2 1 1 G H5'' H 4.180 . .
The cs_rosetta_rna app will automatically select the assignment that leads to better agreement between the experimental and back-calculated chemical shifts.
Case 3:
If only one chemical shift data point is measured for the diastereotopic H5' and H5'' proton pair AND unambiguous assignment is not possible, then still include two chemical shift data lines (with the same chemical shift value), one for each proton, e.g.:
1 1 1 G H5' H 4.180 . .
2 1 1 G H5'' H 4.180 . .
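In all three cases the data file ends up with one H5' line and one H5'' line per residue; the cases differ only in how many values are available and whether their order is meaningful. A hypothetical Python helper for writing those two lines in the nine-column format might look like this. With a single value it duplicates it, as in Case 3, and the example call prints the two Case 3 lines shown above.

```python
def h5_prime_data_lines(first_id, author_seq, seq, residue, shifts):
    """Return the H5' and H5'' data lines for one residue.

    shifts holds two values (Cases 1 and 2: written in the given order) or a
    single value (Case 3: duplicated so that both protons get a line).
    """
    if len(shifts) == 1:
        shifts = [shifts[0], shifts[0]]
    elif len(shifts) != 2:
        raise ValueError("expected one or two H5'/H5'' chemical shift values")
    return [f"{first_id + k} {author_seq} {seq} {residue} {atom} H {value:.3f} . ."
            for k, (atom, value) in enumerate(zip(("H5'", "H5''"), shifts))]


for line in h5_prime_data_lines(1, 1, 1, "G", [4.180]):
    print(line)
```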
[1] Das, R., Karanicolas, J. & Baker, D. Nat Methods 7, 291-294 (2010).
[2] Sripakdeevong, P. et al. Nat Methods 11, 413-416 (2014).
[3] Sripakdeevong, P., Kladwang, W. & Das, R. Proc Natl Acad Sci U S A 108, 20573-20578 (2011).
[4] Frank, A.T., Law, S.M. & Brooks, C.L. III J Phys Chem B 118, 12168-12175 (2014).