Application Purpose

The cs_rosetta_rna application refines (minimizes) and scores an RNA structure under the hybrid CS-ROSETTA-RNA all-atom energy function:

E(hybrid) = E(Rosetta) + E(shift)

where E(Rosetta) is the standard Rosetta all-atom energy function for RNA [1] and E(shift) is the chemical shift pseudo-energy term [2].

Input RNA PDB structures can be generated by FARFAR [1] and/or Stepwise Assembly [3] structure modeling methods, or can be an experimental NMR or crystallographic structure.

CS-ROSETTA-RNA can now also use ribose carbon (C1', C2', C3', C4', C5'), base carbon (C2, C5, C6, C8) and exchangeable imino (N1, N3, H1, H3) chemical shifts. When these data are present, the corresponding shifts are predicted using LARMORD [4].

Code and Demo

The code for the cs_rosetta_rna application is in src/apps/public/farna/cs_rosetta_rna.cc.

For minimal demo examples of the cs_rosetta_rna protocol, including input files, see rosetta/demos/public/cs_rosetta_rna.

Example Infiles

1) rosetta_inputs/GA-AG_mismatch/1MIS_NMR.pdb

NMR PDB structure of a tandem GA:AG mismatch internal loop.

2) rosetta_inputs/GA-AG_mismatch/1MIS_exp_1H_chem_shifts.str

Experimental non-exchangeable 1H chemical shift data for the tandem GA:AG mismatch internal loop.

Each line in this file represents one chemical shift data point and contains the following nine space-delimited columns (based on the STAR v2.1 format):

col 1: Atom_shift_assign_ID (INT)
col 2: Residue_author_seq_code (INT)
col 3: Residue_seq_code (INT)
col 4: Residue_label (STRING)
col 5: Atom_name (STRING)
col 6: Atom_type (STRING)
col 7: Chem_shift_value (FLOAT)
col 8: Chem_shift_value_error (STRING)
col 9: Chem_shift_ambiguity_code (STRING)

NOTE: the residue_seq_code (col 3), residue_label (col 4), and atom_name (col 5) should be consistent with the data in the PDB file. Also, col 8 and col 9 are currently not used internally by the cs_rosetta_rna application.
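The column layout above maps directly onto a small record type. The following Python sketch reads data lines into such records. The class and function names are hypothetical and not part of Rosetta. Skipping blank lines and lines that start with '#' is an assumption, not documented behavior.

```python
from dataclasses import dataclass


@dataclass
class ChemShift:
    """One data line of the 9-column chemical shift file (cols 8-9 kept but unused)."""
    assign_id: int
    author_seq: int
    seq: int
    residue: str
    atom: str
    atom_type: str
    value: float
    error: str
    ambiguity: str


def parse_chem_shift_lines(lines):
    """Parse whitespace-delimited 9-column lines (assumption: '#' starts a comment)."""
    shifts = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if len(fields) != 9:
            raise ValueError(f"expected 9 columns, got {len(fields)}: {line!r}")
        shifts.append(ChemShift(int(fields[0]), int(fields[1]), int(fields[2]),
                                fields[3], fields[4], fields[5], float(fields[6]),
                                fields[7], fields[8]))
    return shifts


example = [
    "1  1  1 G H5'  H   4.180 . .",
    "2  1  1 G H5'' H   4.540 . .",
]
for s in parse_chem_shift_lines(example):
    print(s.seq, s.residue, s.atom, s.value)
```

Col 3 (seq), col 4 (residue) and col 5 (atom) are the fields that must match the PDB file, so a script like this is a convenient place to cross-check them before a run.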

3) rosetta_inputs/GA-AG_mismatch/1MIS_params

Parameters file (in FARNA format) for the tandem GA:AG mismatch.

Also, the input data files for all 23 RNA motifs benchmarked in ref. [2] are provided in the Supplemental Data Zip file, available here.

Usage

1) Refine (minimize) and score a PDB structure under the hybrid CS-ROSETTA-RNA all-atom energy function:

```
cs_rosetta_rna -mode minimize_pdb -pdb <input_pdb> -score:rna_chemical_shift_exp_data <exp_cs_data_file> -params_file <input_param_file> -analytic_etable_evaluation false
```

2) Score (but not refine) a PDB structure under the hybrid CS-ROSETTA-RNA all-atom energy function:

```
cs_rosetta_rna -mode score_pdb -pdb <input_pdb> -score:rna_chemical_shift_exp_data <exp_cs_data_file> -params_file <input_param_file> -analytic_etable_evaluation false
```

For example command lines, see "run_cmds.txt".
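When scripting many runs, it can help to build the Usage command lines programmatically. This Python sketch assembles the argument list for either mode using only the flags shown above. The helper name is hypothetical. The executable name is copied verbatim from the Usage section; a compiled Rosetta binary may carry a platform/build suffix.

```python
import shlex


def build_cs_rosetta_rna_cmd(mode, pdb, cs_file, params_file=None):
    """Assemble the Usage command line; mode is 'minimize_pdb' or 'score_pdb'."""
    if mode not in ("minimize_pdb", "score_pdb"):
        raise ValueError(f"unknown mode: {mode}")
    cmd = ["cs_rosetta_rna", "-mode", mode, "-pdb", pdb,
           "-score:rna_chemical_shift_exp_data", cs_file]
    if params_file is not None:  # optional, but recommended
        cmd += ["-params_file", params_file]
    cmd += ["-analytic_etable_evaluation", "false"]
    return cmd


cmd = build_cs_rosetta_rna_cmd(
    "minimize_pdb",
    "rosetta_inputs/GA-AG_mismatch/1MIS_NMR.pdb",
    "rosetta_inputs/GA-AG_mismatch/1MIS_exp_1H_chem_shifts.str",
    "rosetta_inputs/GA-AG_mismatch/1MIS_params",
)
print(shlex.join(cmd))
# subprocess.run(cmd, check=True) would launch it, assuming the binary is on PATH.
```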

Options

-score::rna_chemical_shift_H5_prime_mode

Specify how to handle assignment of the diastereotopic H5' and H5'' proton pair.
Valid modes:

LEAST_SQUARE_IGNORE_DUPLICATES (default)
- In this mode, the assignments of H5' and H5'' protons will be based on which values give better agreement between the experimental and back-calculated chemical shifts.
- Use this mode if the experimental non-exchangeable 1H chemical shifts of the H5'/H5'' pairs are not unambiguously assigned.
UNIQUE
- In this mode, the H5' and H5'' assignments are used "as is", and cs_rosetta_rna will not attempt to swap them.
- Use this mode only if the experimental non-exchangeable 1H chemical shift data contain unambiguous assignments of the diastereotopic 1H5' and 2H5' protons. Note that this is uncommon.
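The idea behind LEAST_SQUARE_IGNORE_DUPLICATES can be illustrated as follows: try both possible H5'/H5'' assignments and keep the one with the smaller squared deviation from the back-calculated values. This is a minimal Python sketch under that assumption, not the cs_rosetta_rna implementation. The function name and the back-calculated values are hypothetical. In UNIQUE mode, no such swap would be attempted.

```python
def best_h5_assignment(exp_h5p, exp_h5pp, calc_h5p, calc_h5pp):
    """Return experimental (H5', H5'') values in whichever order better fits the prediction.

    Both orderings are scored by their sum of squared deviations; ties keep
    the order given in the data file.
    """
    as_given = (exp_h5p - calc_h5p) ** 2 + (exp_h5pp - calc_h5pp) ** 2
    swapped = (exp_h5pp - calc_h5p) ** 2 + (exp_h5p - calc_h5pp) ** 2
    return (exp_h5p, exp_h5pp) if as_given <= swapped else (exp_h5pp, exp_h5p)


# Listed as H5'=4.54, H5''=4.18, but the (hypothetical) back-calculated
# values fit the swapped order better.
print(best_h5_assignment(4.54, 4.18, calc_h5p=4.20, calc_h5pp=4.50))
# prints: (4.18, 4.54)
```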

-score::rna_chemical_shift_verbose

Print a comparison between predicted and measured chemical shifts to standard output.

-score:rna_chemical_shift_larmord

Force all chemical shifts to be predicted using LARMORD, including proton chemical shifts.
NOTE: In this mode, the user must also specify: -score:rna_chemical_shift_H5_prime_mode UNIQUE

-score:rna_chemical_shift_larmord_wt

File containing weights that determine the contribution of each nucleus type to the chemical shift error term. Typically, the weight for a given nucleus = 1/expected_error.
Format for weight file:
```
col 1: nucleus_type (e.g. C1') (STRING)
col 2: weight (FLOAT)
```
larmord_noweight.dat
- default
- all nuclei equally weighted
larmord_nuchemics_1.0_nocut_accuracy.dat
- each nucleus is differentially weighted (1/expected_error)
- use when predicting non-exchangeable proton shifts with NUCHEMICS and other nuclei with LARMORD
larmord_1.0_nocut_accuracy.dat
- each nucleus is differentially weighted (1/expected_error)
- use when predicting all chemical shifts with LARMORD
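The following Python sketch shows how a two-column weight file of this form could be read and applied. Two choices here are assumptions for illustration: the error expression (a weighted sum of absolute deviations) and the default weight of 1.0 for nuclei not listed in the file. This page does not describe the exact expression used inside cs_rosetta_rna, and the numbers below are made up.

```python
def read_larmord_weights(lines):
    """Parse the two-column weight file: nucleus type (e.g. C1') and a float weight."""
    weights = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):  # assumption: '#' marks comments
            continue
        weights[fields[0]] = float(fields[1])
    return weights


def weighted_error(pairs, weights):
    """Hypothetical weighted error: sum of weight * |exp - calc| over (nucleus, exp, calc).

    Nuclei missing from the weight file get weight 1.0 (also an assumption).
    """
    return sum(weights.get(nuc, 1.0) * abs(exp - calc) for nuc, exp, calc in pairs)


# Illustrative weights of the form 1/expected_error (values are made up).
weights = read_larmord_weights(["C1' 1.0", "H1' 4.0"])
data = [("C1'", 92.5, 93.0), ("H1'", 5.60, 5.50)]
print(weighted_error(data, weights))
```

With 1/expected_error weights, a 0.1 ppm proton deviation contributes as much as a 0.4 ppm deviation on a nucleus with weight 1.0. This is the intent of the differentially weighted files listed above.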

-score:rna_chemical_shift_larmord_par

File containing the LARMORD parameters

larmord_1.0_nocut_parameters.dat
- default
- corresponds to the parameters published in [4]

Outputs

Output of cs_rosetta_rna includes:

1) A breakdown of the hybrid CS-ROSETTA-RNA all-atom energy terms, e.g.:

------------------------------------------------------------
Scores                       Weight   Raw Score Wghtd.Score
------------------------------------------------------------
fa_atr                       0.230    -125.447     -28.853
fa_rep                       0.120       8.314       0.998
fa_intra_rep                 0.003      81.488       0.236
fa_intra_RNA_base_phos_atr   0.230       0.000       0.000
fa_intra_RNA_base_phos_rep   0.120       0.000       0.000
lk_nonpolar                  0.320       2.123       0.679
lk_nonpolar_intra_RNA        0.320       3.768       1.206
fa_elec_rna_phos_phos        1.050      -0.074      -0.078
ch_bond                      0.420     -30.523     -12.820
rna_torsion                  2.900       2.721       7.892
rna_sugar_close              0.700       3.171       2.220
fa_stack                     0.125    -199.844     -24.981
geom_sol_fast                0.620      56.483      35.020
geom_sol_fast_intra_RNA      0.620       1.978       1.226
hbond_sr_bb_sc               0.620       0.000       0.000
hbond_lr_bb_sc               2.400       0.000       0.000
hbond_sc                     2.400     -20.116     -48.279
hbond_intra                  2.400       0.000       0.000
atom_pair_constraint         1.000       0.000       0.000
angle_constraint             1.000       0.000       0.000
rna_bulge                    0.450       0.000       0.000
rna_chem_shift               4.000       1.232       4.928
linear_chainbreak            5.000       0.009       0.047
-----------------------------------------------------------
Total weighted score:                              -60.558
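In the table above, each Wghtd.Score is the Weight multiplied by the Raw Score (up to print rounding), and the total is their sum. If E(shift) is read as the weighted rna_chem_shift term, the breakdown can be split according to E(hybrid) = E(Rosetta) + E(shift). The Python sketch below parses an excerpt of the table. The helper names are hypothetical, and attributing every other term to E(Rosetta) is an interpretation of that formula.

```python
def parse_score_table(text):
    """Return ({term: (weight, raw, weighted)}, total) from a printed score breakdown."""
    terms, total = {}, None
    for line in text.splitlines():
        if line.startswith("Total weighted score:"):
            total = float(line.split()[-1])
            continue
        fields = line.split()
        if len(fields) != 4:
            continue  # dashed separators and the five-word column header
        try:
            terms[fields[0]] = tuple(float(x) for x in fields[1:])
        except ValueError:
            continue  # any other non-numeric row
    return terms, total


def split_hybrid_energy(terms, total):
    """Split E(hybrid) into (E(Rosetta), E(shift)), taking E(shift) as weighted rna_chem_shift."""
    e_shift = terms["rna_chem_shift"][2]
    return total - e_shift, e_shift


# Excerpt of the table above; the total line is the full-table total.
excerpt = """\
------------------------------------------------------------
Scores                       Weight   Raw Score Wghtd.Score
------------------------------------------------------------
fa_atr                       0.230    -125.447     -28.853
rna_chem_shift               4.000       1.232       4.928
-----------------------------------------------------------
Total weighted score:                              -60.558
"""
terms, total = parse_score_table(excerpt)
e_rosetta, e_shift = split_hybrid_energy(terms, total)
print(f"E(Rosetta) = {e_rosetta:.3f}, E(shift) = {e_shift:.3f}")
# prints: E(Rosetta) = -65.486, E(shift) = 4.928
```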

2) The total hybrid CS-ROSETTA-RNA all-atom energy, e.g.:

hybrid_CS-ROSETTA-RNA_all-atom energy: -60.5579

3) The chemical shift RMSD, e.g:

chem_shift_RMSD: 0.143299

The chem_shift_RMSD (in ppm) is the root-mean-square deviation between the back-calculated and experimental 1H chemical shifts. A low chem_shift_RMSD indicates that the RNA 3D structure agrees well with the experimental 1H chemical shift data.
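Assume the experimental and back-calculated shifts have already been matched atom by atom. The RMSD is then sqrt(mean((exp - calc)^2)), as in this minimal Python sketch. The function name and toy values are hypothetical and do not come from the 1MIS data. The sketch does not reproduce how cs_rosetta_rna pairs atoms or handles H5'/H5'' swaps.

```python
import math


def chem_shift_rmsd(exp_shifts, calc_shifts):
    """Root-mean-square deviation (ppm) between matched experimental and back-calculated shifts."""
    if len(exp_shifts) != len(calc_shifts) or not exp_shifts:
        raise ValueError("need two non-empty lists of equal length")
    sq = sum((e - c) ** 2 for e, c in zip(exp_shifts, calc_shifts))
    return math.sqrt(sq / len(exp_shifts))


# Toy values in ppm.
print(chem_shift_rmsd([5.60, 4.50, 7.90], [5.50, 4.60, 7.90]))
```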

4) The RNA PDB structure after refinement under the hybrid CS-ROSETTA-RNA all-atom energy function (if -mode minimize_pdb).

The refined PDB is written to the run directory with the filename <in_pdb_basename>_out.

Best Practices

Figure 1: Breakdown of the secondary structure of the tandem GA:AG mismatch internal loop:

```
                         1    6                              1
                      5'-CGGACG-3'                        5'-CG
  Entire structure:      ||**||               H1 helix:      ||
                      3'-GCAGGC-5'                        3'-GC
                         12   7                              12

                                                                  6
                           GA                                    CG-3'
  2x2 mismatch:            **                 H2 helix:          ||
                           AG                                    GC-5'
                                                                  7
```

How to activate CS-ROSETTA-RNA in ROSIE.

How many canonical base pairs should be included at each helical boundary.

Two base pairs should be included at each helical boundary (for the rationale, see ref. [2]). For example, the tandem GA:AG mismatch internal loop structure consists of a 2-base-pair H1 helix, a 2x2 mismatch, and a 2-base-pair H2 helix.

Which atoms' chemical shift data should be included.

The chemical shift data of all non-exchangeable protons should be included in the chemical shift data file. The non-exchangeable protons are the H1', H2', H3', H4', H5' and H5'' ribose protons and the H2, H5, H6 and H8 base protons. Data lines belonging to other atom types will be ignored.
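Filtering a data set down to the atom types listed above is a simple set-membership test. In this Python sketch, records are hypothetical (residue_number, atom_name, shift) tuples and the helper name is illustrative only. Note that the base H5 (pyrimidines) and the ribose H5' are distinct atom names.

```python
NON_EXCHANGEABLE_PROTONS = {
    "H1'", "H2'", "H3'", "H4'", "H5'", "H5''",  # ribose protons
    "H2", "H5", "H6", "H8",                     # base protons
}


def keep_non_exchangeable(records):
    """Keep (residue_number, atom_name, shift) records for non-exchangeable protons only."""
    return [r for r in records if r[1] in NON_EXCHANGEABLE_PROTONS]


# H1 (imino) and C8 (carbon) are dropped; H1' and H8 are kept.
records = [(2, "H1'", 5.61), (2, "H1", 12.9), (3, "H8", 7.80), (3, "C8", 138.0)]
print(keep_non_exchangeable(records))
```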

Which nucleotides' chemical shift data should be included.

The chemical shift data of all nucleotides EXCEPT those at the 5' and 3' edges of the structure should be included in the chemical shift data file. For example, for the tandem GA:AG mismatch internal loop, include the chemical shift data of all nucleotides except C1, G6, C7 and G12.
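For a structure made of contiguous strands, the edge nucleotides are simply the first and last residue of each strand. The Python sketch below reproduces the GA:AG example. The helper names are hypothetical, and the strand boundaries must be supplied by the user.

```python
def edge_residues(strands):
    """Residue numbers at the 5' and 3' ends of each strand, given (first, last) pairs."""
    edges = set()
    for first, last in strands:
        edges.update((first, last))
    return edges


def residues_to_include(strands):
    """All residues except strand ends, whose shifts should be left out of the data file."""
    edges = edge_residues(strands)
    return [i for first, last in strands
            for i in range(first, last + 1) if i not in edges]


# Tandem GA:AG mismatch: strand C1-G6 and strand C7-G12.
print(residues_to_include([(1, 6), (7, 12)]))
# prints: [2, 3, 4, 5, 8, 9, 10, 11]
```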

How to prepare the parameters file.

1) Add an OBLIGATE PAIR line for each helical base pair located at the 5' and 3' edges of the structure.

In the case of the tandem GA:AG mismatch internal loop, the OBLIGATE PAIRS are C1-G12 and G6-C7:
```
OBLIGATE   PAIR 1 12 H H A
OBLIGATE   PAIR 6 7 H H A
```
Note that the cs_rosetta_rna app will refine (minimize) ALL nucleotides EXCEPT those specified in OBLIGATE PAIR lines, which are kept fixed.

2) Add ALLOW_INSERT lines covering all non-canonical loop nucleotide positions:

In the case of the tandem GA:AG mismatch internal loop, the ALLOW_INSERT nucleotide positions are G3, A4, G9 and A10:
```
ALLOW_INSERT 3 4
ALLOW_INSERT 9 10
```

3) Add a CUTPOINT_CLOSED line for the position immediately 5' of the first non-canonical loop nucleotide.

In the case of the tandem GA:AG mismatch internal loop, the first non-canonical loop nucleotide is G3, so the CUTPOINT_CLOSED position is the one immediately 5' of G3, which is G2:
```
CUTPOINT_CLOSED 2
```
Note that if the CUTPOINT_CLOSED line is not included in the parameters file, the cs_rosetta_rna app will still run, selecting a random loop position as the CUTPOINT_CLOSED position. However, it is recommended to include the CUTPOINT_CLOSED line explicitly to prevent this random selection.

4) Add a CUTPOINT_OPEN line for every position immediately 5' of a chain break.

In the case of the tandem GA:AG mismatch internal loop, there is a chain break between G6 and C7. The CUTPOINT_OPEN position is the one immediately 5' of the chain break, which is G6:
```
CUTPOINT_OPEN 6
```

Adding all the above parameter lines together, we get the parameter file for the tandem GA:AG mismatch:

See rosetta_inputs/GA-AG_mismatch/1MIS_params:
```
OBLIGATE   PAIR 1 12 H H A
OBLIGATE   PAIR 6 7 H H A
ALLOW_INSERT 3 4
ALLOW_INSERT 9 10
CUTPOINT_CLOSED 2
CUTPOINT_OPEN 6
```
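Because the four rules above are mechanical, the parameters file can be generated from a short description of the motif. This Python sketch emits the lines in the same order and spacing as 1MIS_params. The helper is hypothetical and not part of Rosetta, and the "H H A" suffix is copied from that file as-is rather than derived.

```python
def make_params(obligate_pairs, insert_ranges, cutpoint_closed=None, cutpoint_open=()):
    """Assemble FARNA-format params lines in the order used by 1MIS_params.

    obligate_pairs: (i, j) helix pairs at the 5'/3' edges (kept fixed)
    insert_ranges:  (start, end) non-canonical loop segments for ALLOW_INSERT
    """
    lines = [f"OBLIGATE   PAIR {i} {j} H H A" for i, j in obligate_pairs]
    lines += [f"ALLOW_INSERT {a} {b}" for a, b in insert_ranges]
    if cutpoint_closed is not None:  # omitted -> the app picks a random loop position
        lines.append(f"CUTPOINT_CLOSED {cutpoint_closed}")
    lines += [f"CUTPOINT_OPEN {c}" for c in cutpoint_open]
    return "\n".join(lines) + "\n"


params = make_params(
    obligate_pairs=[(1, 12), (6, 7)],
    insert_ranges=[(3, 4), (9, 10)],
    cutpoint_closed=2,  # position immediately 5' of the first loop nucleotide (G3)
    cutpoint_open=[6],  # position immediately 5' of the G6/C7 chain break
)
print(params, end="")
```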

Finally, the cs_rosetta_rna app can also run WITHOUT an input parameter file, although this is not recommended. For this case, a simple fold-tree with no chain-break/cutpoints will be used and all nucleotides will be refined (minimized).

How to specify chemical shift data for diastereotopic H5'/H5'' proton pairs.

Case 1:

If two chemical shift values are measured for the diastereotopic H5'/H5'' proton pair and an unambiguous assignment is possible, include the correct, unambiguous assignment in the data lines, e.g.:
```
1  1  1 G H5'  H   4.180 . . 
2  1  1 G H5'' H   4.540 . .  
```
In this case, please explicitly include the command line option: -score:rna_chemical_shift_H5_prime_mode UNIQUE

Case 2:

If two chemical shift values are measured for the diastereotopic H5'/H5'' proton pair BUT an unambiguous assignment is not possible, include either of the two possible assignments in the data lines, e.g.:
```
1  1  1 G H5'  H   4.180 . . 
2  1  1 G H5'' H   4.540 . .  
```
OR
```
1  1  1 G H5'  H   4.540 . . 
2  1  1 G H5'' H   4.180 . .  
```
The cs_rosetta_rna app will automatically select the assignment that gives better agreement between the experimental and back-calculated chemical shifts.

Case 3:

If only one chemical shift value is measured for the diastereotopic H5'/H5'' proton pair AND an unambiguous assignment is not possible, still include two chemical shift data lines (with the same shift value), one for each proton, e.g.:
```
1  1  1 G H5'  H   4.180 . . 
2  1  1 G H5'' H   4.180 . . 
```
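The three cases differ only in how many values are available. The following Python sketch formats H5'/H5'' data lines in the nine-column layout used above. As Case 3 requires, it duplicates a single value onto both protons. The helper is hypothetical. It assumes the author and internal residue numbers are identical and fills columns 8 and 9 with '.'.

```python
def h5_prime_lines(start_id, res_num, res_name, shifts):
    """Format H5'/H5'' data lines in the 9-column layout.

    shifts holds one value (Case 3: duplicated onto both protons) or two
    values (Cases 1 and 2: written as H5' then H5'').
    """
    if len(shifts) == 1:
        shifts = [shifts[0], shifts[0]]
    if len(shifts) != 2:
        raise ValueError("expected one or two H5'/H5'' shift values")
    return [
        f"{start_id + k}  {res_num}  {res_num} {res_name} {atom:<4} H   {value:.3f} . ."
        for k, (atom, value) in enumerate(zip(("H5'", "H5''"), shifts))
    ]


for line in h5_prime_lines(1, 1, "G", [4.18]):
    print(line)
```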

References

[1] Das, R., Karanicolas, J. & Baker, D. Nat Methods 7, 291-294 (2010).

[2] Sripakdeevong, P. et al. Nat Methods 11, 413-416 (2014).

[3] Sripakdeevong, P., Kladwang, W. & Das, R. Proc Natl Acad Sci U S A 108, 20573-20578 (2011).

[4] Frank, A. T., Law, S. M. & Brooks III, C. L. J Phys Chem B 118, 12168-12175 (2014).

CS-Rosetta RNA