The scripts and input files that accompany this demo can be found in the demos/public directory of the Rosetta weekly releases.
KEYWORDS: STRUCTURE_PREDICTION NUCLEIC_ACIDS RNA
The cs_rosetta_rna application refines (minimizes) and scores an RNA structure under the hybrid CS-ROSETTA-RNA all-atom energy function:
E(hybrid) = E(Rosetta) + E(shift)
where E(Rosetta) is the standard Rosetta all-atom energy function for RNA [1], and E(shift) is the chemical shift pseudo-energy term [2]. Input RNA PDB structures can be generated by the FARFAR [1] and/or Stepwise Assembly [3] structure modeling methods, or can be experimental NMR or crystallographic structures.
rosetta_inputs/GA-AG_mismatch/1MIS_NMR.pdb
NMR PDB structure of a tandem GA:AG mismatch internal loop.
rosetta_inputs/GA-AG_mismatch/1MIS_exp_1H_chem_shifts.str
Experimental non-exchangeable 1H chemical shift data for the
tandem GA:AG mismatch internal loop.
Each line in this file represents one chemical shift data point and contains the following nine space-delimited columns (based on the STAR v2.1 format):
col 1: Atom_shift_assign_ID (INT)
col 2: Residue_author_seq_code (INT)
col 3: Residue_seq_code (INT)
col 4: Residue_label (STRING)
col 5: Atom_name (STRING)
col 6: Atom_type (STRING)
col 7: Chem_shift_value (FLOAT)
col 8: Chem_shift_value_error (STRING)
col 9: Chem_shift_ambiguity_code (STRING)
Note that Residue_seq_code (col 3), Residue_label (col 4), and Atom_name (col 5) should be consistent with the data in the PDB file. Columns 8 and 9 are currently not used internally by the cs_rosetta_rna application.
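The nine-column layout above can be read with a few lines of Python. This is only a sketch: the names ShiftRecord and parse_shift_line are hypothetical helpers (not part of Rosetta), and it assumes the file contains nothing but whitespace-delimited data lines like the examples in this demo.

```python
from typing import NamedTuple


class ShiftRecord(NamedTuple):
    """One chemical shift data point, in the column order listed above."""
    assign_id: int          # col 1: Atom_shift_assign_ID
    author_seq_code: int    # col 2: Residue_author_seq_code
    seq_code: int           # col 3: Residue_seq_code
    residue_label: str      # col 4: Residue_label
    atom_name: str          # col 5: Atom_name
    atom_type: str          # col 6: Atom_type
    shift_ppm: float        # col 7: Chem_shift_value
    shift_error: str        # col 8: kept as text, unused by cs_rosetta_rna
    ambiguity_code: str     # col 9: kept as text, unused by cs_rosetta_rna


def parse_shift_line(line: str) -> ShiftRecord:
    """Split one data line into its nine columns (hypothetical helper)."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}: {line!r}")
    return ShiftRecord(
        int(fields[0]), int(fields[1]), int(fields[2]),
        fields[3], fields[4], fields[5],
        float(fields[6]), fields[7], fields[8],
    )


record = parse_shift_line("1 1 1 G H5' H 4.180 . .")
print(record.residue_label, record.atom_name, record.shift_ppm)
```

Checking seq_code, residue_label and atom_name of each record against the PDB file is left to the reader.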
rosetta_inputs/GA-AG_mismatch/1MIS_params
Parameters file (in FARNA format) for the tandem GA:AG mismatch.
Also, the input data files for all 23 RNA motifs benchmarked in ref. [2] are provided in the Supplemental Data Zip file, available at the following URI: http://dx.doi.org/10.1038/nmeth.2876
Refine (minimize) and score a PDB structure under the hybrid CS-ROSETTA-RNA all-atom energy function:
<path-to-rosetta-bin>/cs_rosetta_rna.<release> \
-mode minimize_pdb \
-pdb <input_pdb> \
-score:rna_chemical_shift_exp_data <exp_cs_data_file> \
-params_file <input_param_file> \
-analytic_etable_evaluation false
Score (but not refine) a PDB structure under the hybrid CS-ROSETTA-RNA all-atom energy function:
<path-to-rosetta-bin>/cs_rosetta_rna.<release> \
-mode score_pdb \
-pdb <input_pdb> \
-score:rna_chemical_shift_exp_data <exp_cs_data_file> \
-params_file <input_param_file> \
-analytic_etable_evaluation false
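When many structures need to be scored, it can help to assemble the command line programmatically. The sketch below only builds the argument list from the flags shown above; build_cs_rosetta_rna_command is a hypothetical helper, the executable name is a placeholder that must be replaced with the real binary path, and the mode check assumes the two modes documented here are the ones of interest.

```python
from typing import List


def build_cs_rosetta_rna_command(executable: str, mode: str, pdb: str,
                                 shift_file: str, params_file: str) -> List[str]:
    """Return the argument list for one cs_rosetta_rna run (hypothetical helper)."""
    if mode not in ("minimize_pdb", "score_pdb"):
        raise ValueError(f"unexpected mode: {mode}")
    return [
        executable,
        "-mode", mode,
        "-pdb", pdb,
        "-score:rna_chemical_shift_exp_data", shift_file,
        "-params_file", params_file,
        "-analytic_etable_evaluation", "false",
    ]


command = build_cs_rosetta_rna_command(
    "cs_rosetta_rna.<release>",  # placeholder: substitute the real binary path
    "score_pdb",
    "rosetta_inputs/GA-AG_mismatch/1MIS_NMR.pdb",
    "rosetta_inputs/GA-AG_mismatch/1MIS_exp_1H_chem_shifts.str",
    "rosetta_inputs/GA-AG_mismatch/1MIS_params",
)
print(" ".join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) once the placeholder has been replaced.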
Refine (minimize) and score the tandem GA:AG mismatch NMR PDB structure under the hybrid CS-ROSETTA-RNA all-atom energy function:
~/Rosetta/rosetta_git/Rosetta/main/source/bin/cs_rosetta_rna.graphics.macosgccrelease \
-mode minimize_pdb \
-pdb rosetta_inputs/GA-AG_mismatch/1MIS_NMR.pdb \
-score:rna_chemical_shift_exp_data rosetta_inputs/GA-AG_mismatch/1MIS_exp_1H_chem_shifts.str \
-params_file rosetta_inputs/GA-AG_mismatch/1MIS_params \
-analytic_etable_evaluation false
Score (but not refine) the tandem GA:AG mismatch NMR PDB structure under the hybrid CS-ROSETTA-RNA all-atom energy function:
~/Rosetta/rosetta_git/Rosetta/main/source/bin/cs_rosetta_rna.graphics.macosgccrelease \
-mode score_pdb \
-pdb rosetta_inputs/GA-AG_mismatch/1MIS_NMR.pdb \
-score:rna_chemical_shift_exp_data rosetta_inputs/GA-AG_mismatch/1MIS_exp_1H_chem_shifts.str \
-params_file rosetta_inputs/GA-AG_mismatch/1MIS_params \
-analytic_etable_evaluation false
Refine (minimize) and score the UAAC tetraloop NMR PDB structure under the CS-ROSETTA-RNA all-atom energy function:
~/Rosetta/rosetta_git/Rosetta/main/source/bin/cs_rosetta_rna.graphics.macosgccrelease \
-mode minimize_pdb \
-pdb rosetta_inputs/UAAC_loop/4A4R_NMR.pdb \
-score:rna_chemical_shift_exp_data rosetta_inputs/UAAC_loop/4A4R_exp_1H_chem_shifts.str \
-params_file rosetta_inputs/UAAC_loop/4A4R_params \
-analytic_etable_evaluation false
Score (but not refine) the UAAC tetraloop NMR PDB structure under the hybrid CS-ROSETTA-RNA all-atom energy function:
~/Rosetta/rosetta_git/Rosetta/main/source/bin/cs_rosetta_rna.graphics.macosgccrelease \
-mode score_pdb \
-pdb rosetta_inputs/UAAC_loop/4A4R_NMR.pdb \
-score:rna_chemical_shift_exp_data rosetta_inputs/UAAC_loop/4A4R_exp_1H_chem_shifts.str \
-params_file rosetta_inputs/UAAC_loop/4A4R_params \
-analytic_etable_evaluation false
-score:rna_chemical_shift_H5_prime_mode MODE
Specify how to handle assignment of the diastereotopic H5' and H5'' proton pair. Valid modes:
LEAST_SQUARE_IGNORE_DUPLICATES (default)
In this mode, the assignments of the H5' and H5'' protons are chosen according to which gives better agreement between the experimental and back-calculated chemical shifts. Use this mode if the experimental non-exchangeable 1H chemical shifts are not unambiguously assigned.
UNIQUE
In this mode, the assignments of the H5' and H5'' protons are used "as is", and cs_rosetta_rna will not attempt to switch the H5' and H5'' assignments. Use this mode only if the experimental non-exchangeable 1H chemical shift data have unambiguous assignments of the diastereotopic H5' and H5'' protons. Note that this is uncommon.
A breakdown of the hybrid CS-ROSETTA-RNA all-atom energy terms, e.g.:
------------------------------------------------------------
Scores Weight Raw Score Wghtd.Score
------------------------------------------------------------
fa_atr 0.230 -125.447 -28.853
fa_rep 0.120 8.314 0.998
fa_intra_rep 0.003 81.488 0.236
fa_intra_RNA_base_phos_atr 0.230 0.000 0.000
fa_intra_RNA_base_phos_rep 0.120 0.000 0.000
lk_nonpolar 0.320 2.123 0.679
lk_nonpolar_intra_RNA 0.320 3.768 1.206
fa_elec_rna_phos_phos 1.050 -0.074 -0.078
ch_bond 0.420 -30.523 -12.820
rna_torsion 2.900 2.721 7.892
rna_sugar_close 0.700 3.171 2.220
fa_stack 0.125 -199.844 -24.981
geom_sol_fast 0.620 56.483 35.020
geom_sol_fast_intra_RNA 0.620 1.978 1.226
hbond_sr_bb_sc 0.620 0.000 0.000
hbond_lr_bb_sc 2.400 0.000 0.000
hbond_sc 2.400 -20.116 -48.279
hbond_intra 2.400 0.000 0.000
atom_pair_constraint 1.000 0.000 0.000
angle_constraint 1.000 0.000 0.000
rna_bulge 0.450 0.000 0.000
rna_chem_shift 4.000 1.232 4.928
linear_chainbreak 5.000 0.009 0.047
-----------------------------------------------------------
Total weighted score: -60.558
The total hybrid CS-ROSETTA-RNA all-atom energy, e.g.:
hybrid_CS-ROSETTA-RNA_all-atom energy: -60.5579
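As a worked check of E(hybrid) = E(Rosetta) + E(shift), the weighted scores in the example table can be summed directly. This sketch assumes that the rna_chem_shift row corresponds to E(shift) and that every other row (including the constraint and chainbreak terms) is counted as part of E(Rosetta); that grouping is an interpretation of the table, not something the application prints.

```python
# Weighted scores copied from the example breakdown above.
weighted_scores = {
    "fa_atr": -28.853,
    "fa_rep": 0.998,
    "fa_intra_rep": 0.236,
    "fa_intra_RNA_base_phos_atr": 0.000,
    "fa_intra_RNA_base_phos_rep": 0.000,
    "lk_nonpolar": 0.679,
    "lk_nonpolar_intra_RNA": 1.206,
    "fa_elec_rna_phos_phos": -0.078,
    "ch_bond": -12.820,
    "rna_torsion": 7.892,
    "rna_sugar_close": 2.220,
    "fa_stack": -24.981,
    "geom_sol_fast": 35.020,
    "geom_sol_fast_intra_RNA": 1.226,
    "hbond_sr_bb_sc": 0.000,
    "hbond_lr_bb_sc": 0.000,
    "hbond_sc": -48.279,
    "hbond_intra": 0.000,
    "atom_pair_constraint": 0.000,
    "angle_constraint": 0.000,
    "rna_bulge": 0.000,
    "rna_chem_shift": 4.928,
    "linear_chainbreak": 0.047,
}

# Assumed split: the chemical shift term versus everything else.
e_shift = weighted_scores["rna_chem_shift"]
e_rosetta = sum(v for k, v in weighted_scores.items() if k != "rna_chem_shift")
e_hybrid = e_rosetta + e_shift

print(f"E(Rosetta) = {e_rosetta:.3f}")
print(f"E(shift)   = {e_shift:.3f}")
print(f"E(hybrid)  = {e_hybrid:.3f}")
```

Because the table entries are rounded to three decimals, the sum agrees with the reported total only to within about 0.001.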
The chemical shift RMSD, e.g.:
chem_shift_RMSD: 0.143299
The chem_shift_RMSD (in ppm) is the root-mean-square deviation between the 'back-calculated' and the experimental 1H chemical shifts. A low chem_shift_RMSD indicates that the RNA 3D structure agrees well with the experimental 1H chemical shift data.
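For reference, a root-mean-square deviation over paired shift values can be computed as in the sketch below. The function chem_shift_rmsd and the numbers are hypothetical and purely illustrative; the application's own calculation (for example its handling of H5'/H5'' pairs) may differ in detail.

```python
import math
from typing import Sequence


def chem_shift_rmsd(calculated: Sequence[float],
                    experimental: Sequence[float]) -> float:
    """RMSD (ppm) between paired back-calculated and experimental shifts."""
    if len(calculated) != len(experimental) or len(calculated) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(c - e) ** 2 for c, e in zip(calculated, experimental)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical shift values in ppm, for illustration only.
calculated = [4.20, 5.60, 7.90]
experimental = [4.18, 5.75, 7.80]
print(f"chem_shift_RMSD: {chem_shift_rmsd(calculated, experimental):.4f}")
```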
The RNA PDB structure after refinement under the hybrid CS-ROSETTA-RNA all-atom energy function (if -mode minimize_pdb).
The refined PDB is written to the run directory under the filename <in_pdb_basename>_out.
Figure 1: Breakdown of the secondary structure of the tandem GA:AG mismatch internal loop:
                     1    6                     1
                  5'-CGGACG-3'               5'-CG
Entire structure:    ||**||       H1 helix:     ||
                  3'-GCAGGC-5'               3'-GC
                    12    7                    12

                                                 6
                       GA                       CG-3'
2x2 mismatch:          **         H2 helix:     ||
                       AG                       GC-5'
                                                 7
How many canonical base-pairs should be included at each helical boundary?
2 base-pairs should be included at each helical boundary (for rationale, see ref. [2]).
For example, in the case of the tandem GA:AG mismatch internal loop, the structure consists of a 2-base-pair H1 helix, a 2x2 mismatch, and a 2-base-pair H2 helix.
Which atoms' chemical shift data should be included?
The chemical shift data of all non-exchangeable protons should be included in the chemical shift data file.
The non-exchangeable protons consist of the H1', H2', H3', H4', H5' and H5'' ribose protons, and the H2, H5, H6 and H8 base protons.
Data lines belonging to other atom types will be ignored.
Which nucleotides' chemical shift data should be included?
The chemical shift data of all nucleotides EXCEPT those located at the 5' and 3' edges of the structure should be included in the chemical shift data file.
For example, in the case of the tandem GA:AG mismatch internal loop, the chemical shift data of all nucleotides except C1, G6, C7 and G12 should be included.
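The two selection rules above (non-exchangeable protons only, no edge nucleotides) can be applied to a list of data points as sketched below. The helper select_shift_rows and the shift values are hypothetical; only the proton names and the edge residues C1, G6, C7 and G12 come from the text.

```python
# Non-exchangeable protons named above.
NON_EXCHANGEABLE_PROTONS = {
    "H1'", "H2'", "H3'", "H4'", "H5'", "H5''",  # ribose protons
    "H2", "H5", "H6", "H8",                     # base protons
}


def select_shift_rows(rows, edge_residues):
    """Keep (residue, atom, shift) rows for interior residues and allowed atoms."""
    return [
        row for row in rows
        if row[0] not in edge_residues and row[1] in NON_EXCHANGEABLE_PROTONS
    ]


# Hypothetical data points: (Residue_seq_code, Atom_name, shift in ppm).
rows = [
    (1, "H1'", 5.50),   # dropped: C1 is an edge nucleotide
    (2, "H8", 7.60),
    (2, "H1", 12.90),   # dropped: not a non-exchangeable proton
    (3, "H2'", 4.40),
]
kept = select_shift_rows(rows, edge_residues={1, 6, 7, 12})
print(kept)
```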
How to prepare the parameters file.
Add an "OBLIGATE PAIR" line for each helical base-pair located at the 5' and 3' edges of the structure.
In the case of the tandem GA:AG mismatch internal loop, the OBLIGATE PAIRS are "C1-G12" and "G6-C7":
OBLIGATE PAIR 1 12 H H A
OBLIGATE PAIR 6 7 H H A
Note that the cs_rosetta_rna app will refine (minimize) ALL nucleotides EXCEPT those specified in an "OBLIGATE PAIR" line, which will be kept static.
Add "ALLOW_INSERT" lines to include all non-canonical loop nucleotide positions:
In the case of the tandem GA:AG mismatch internal loop, the "ALLOW_INSERT" nucleotide positions are G3, A4, G9 and A10:
ALLOW_INSERT 3 4
ALLOW_INSERT 9 10
Add a "CUTPOINT_CLOSED" line to include the position immediately 5' of the first non-canonical loop nucleotide position.
In the case of the tandem GA:AG mismatch internal loop, the first non-canonical loop nucleotide position is G3. The "CUTPOINT_CLOSED" position is the position immediately 5' of G3, which is G2:
CUTPOINT_CLOSED 2
Note that if the "CUTPOINT_CLOSED" line is not included in the parameters file, the cs_rosetta_rna app will still be able to run by selecting a random loop position as the cutpoint_closed position. However, it is recommended that the "CUTPOINT_CLOSED" line be explicitly included to prevent this random selection.
Add a "CUTPOINT_OPEN" line for each position immediately 5' of a chain-break.
In the case of the tandem GA:AG mismatch internal loop, there is a chain-break between G6 and C7. The "CUTPOINT_OPEN" position is the position immediately 5' of the chain-break, which is G6:
CUTPOINT_OPEN 6
Adding all the above parameter lines together, we get the parameter file for the tandem GA:AG mismatch ("rosetta_inputs/GA-AG_mismatch/1MIS_params"):
OBLIGATE PAIR 1 12 H H A
OBLIGATE PAIR 6 7 H H A
ALLOW_INSERT 3 4
ALLOW_INSERT 9 10
CUTPOINT_CLOSED 2
CUTPOINT_OPEN 6
Finally, the cs_rosetta_rna app can also run WITHOUT an input parameters file, although this is not recommended. In this case, a simple fold-tree with no chain-breaks/cutpoints will be used and all nucleotides will be refined (minimized).
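The parameters file described above can also be generated from a short description of the motif. The sketch below is a hypothetical helper (build_params is not part of Rosetta); it writes the "H H A" fields and the ALLOW_INSERT numbers exactly as they appear in the example, without interpreting them.

```python
def build_params(obligate_pairs, allow_insert, cutpoint_closed, cutpoint_open):
    """Assemble the text of a parameters file (hypothetical helper)."""
    lines = []
    for first, second in obligate_pairs:
        # "H H A" is copied verbatim from the example lines above.
        lines.append(f"OBLIGATE PAIR {first} {second} H H A")
    for positions in allow_insert:
        lines.append("ALLOW_INSERT " + " ".join(str(p) for p in positions))
    for position in cutpoint_closed:
        lines.append(f"CUTPOINT_CLOSED {position}")
    for position in cutpoint_open:
        lines.append(f"CUTPOINT_OPEN {position}")
    return "\n".join(lines) + "\n"


# Tandem GA:AG mismatch internal loop, as described above.
params_text = build_params(
    obligate_pairs=[(1, 12), (6, 7)],
    allow_insert=[(3, 4), (9, 10)],
    cutpoint_closed=[2],
    cutpoint_open=[6],
)
print(params_text, end="")
```

Writing params_text to a file gives an input suitable for the -params_file option.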
How to specify the chemical shift data for the diastereotopic H5' and H5'' proton pairs.
If two chemical shift data points are measured for the diastereotopic H5' and H5'' proton pair and unambiguous assignment is possible, then include the correct unambiguous assignment in the data lines, e.g.:
1 1 1 G H5' H 4.180 . .
2 1 1 G H5'' H 4.540 . .
In this case, please explicitly include the command line option: -score:rna_chemical_shift_H5_prime_mode UNIQUE
If two chemical shift data points are measured for the diastereotopic H5' and H5'' proton pair BUT unambiguous assignment is not possible, then include either of the two possible assignments in the data lines, e.g.:
1 1 1 G H5' H 4.180 . .
2 1 1 G H5'' H 4.540 . .
OR
1 1 1 G H5' H 4.540 . .
2 1 1 G H5'' H 4.180 . .
The cs_rosetta_rna app will automatically select the assignment that leads to better agreement between the experimental and back-calculated chemical shifts.
If only one chemical shift data point is measured for the diastereotopic H5' and H5'' proton pair AND unambiguous assignment is not possible, then please still include two chemical shift data lines (with same cs-value), one for each proton, e.g.:
1 1 1 G H5' H 4.180 . .
2 1 1 G H5'' H 4.180 . .
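The idea of choosing the H5'/H5'' assignment by agreement with the back-calculated shifts can be illustrated as follows. This is a minimal sketch of the selection principle only, assuming a squared-error comparison of the two possible orderings; it is not the application's implementation, and the back-calculated values used here are hypothetical.

```python
def assign_h5_pair(exp_pair, calc_h5_prime, calc_h5_double_prime):
    """Order two experimental shifts as (H5', H5'') by smaller squared error."""
    first, second = exp_pair
    error_as_given = ((first - calc_h5_prime) ** 2
                      + (second - calc_h5_double_prime) ** 2)
    error_swapped = ((second - calc_h5_prime) ** 2
                     + (first - calc_h5_double_prime) ** 2)
    if error_as_given <= error_swapped:
        return (first, second)
    return (second, first)


# Experimental pair from the example lines; back-calculated shifts are made up.
assignment = assign_h5_pair((4.180, 4.540),
                            calc_h5_prime=4.50,
                            calc_h5_double_prime=4.20)
print("H5' =", assignment[0], "H5'' =", assignment[1])
```

When both data lines carry the same value, as in the single-measurement case above, the two orderings are equivalent and the choice has no effect.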
[1] Das, R., Karanicolas, J. & Baker, D. Nat Methods 7, 291-294 (2010).
[2] Sripakdeevong, P. et al. Nat Methods 11, 413-416 (2014).
[3] Sripakdeevong, P., Kladwang, W. & Das, R. Proc Natl Acad Sci U S A 108, 20573-20578 (2011).