The scripts and input files that accompany this demo can be found in the
demos/public
directory of the Rosetta weekly releases.
KEYWORDS: NUCLEIC_ACIDS RNA INTERFACES
Written in April 2018 by Kalli Kappel (kappel at stanford dot edu).
This demo shows how to use the Rosetta-Vienna ddG method to calculate relative binding affinites for a RNA-protein complex.
start_structure.pdb
. This is the MS2 coat protein-RNA hairpint complex. mutant_list.txt
. This is a text file specifying the sequences for which we want to calculate relative binding affinities. One sequence should be specified per line. These can either be the full sequence of the complex (RNA and protein), or just the RNA sequence. If the protein sequence is not specified, then no mutations to the protein will be made. Here, mutant_list.txt
contains:ugaggcucaccca
ugaggagcaccca
So, we will be calculating relative binding affinities for two sequences specifying mutations in the RNA.
First, we need to "relax" the starting structure into the Rosetta score function that we'll be using for the ddG calculations.
1.1 Set up the relaxation run. We need to provide the starting struture (this will be our "wildtype" -- mutant binding affinities will be calculated relative to the sequence in this PDB file). Here, we're only relaxing the structure once (--nstructs 1), but for actual runs it is recommended that the relaxation is performed 100 times (--nstructs 100). Type:
python PATH_TO_ROSETTA/main/source/src/apps/public/rnp_ddg/relax_starting_structure.py --start_struct start_structure.pdb --nstructs 1 --rosetta_prefix PATH_TO_ROSETTA/main/source/bin/
Note that you need to replace PATH_TO_ROSETTA with your path to the Rosetta code.
This will create a directory named relax_start_structure
and command files ALL_RELAX_COMMANDS
and RELAX_COMMAND_1
. relax_start_structure
contains sub-directories for each relaxation run. There should be nstructs
numbered sub-directories (here nstructs
= 1, so there is only 1 numbered sub-directory).
1.2 To run the relaxation calculations that we just set up in the previous step, type:
source ALL_RELAX_COMMANDS
This will take a few minutes to run. The final relaxed structure is in relax_start_structure/1/min_again_start_structure_wildtype_bound.pdb
. If we had specified nstructs
> 1, then there would be a file named min_again_start_structure_wildtype_bound.pdb
in each of the numbered directories in relax_start_structure
.
1.3 To get a ranked list of the lowest scoring relaxed structures, type:
python PATH_TO_ROSETTA/main/source/src/apps/public/rnp_ddg/get_lowest_scoring_relaxed_models.py --relax_dir relax_start_structure/
Note: you need to replace PATH_TO_ROSETTA with your path to the Rosetta code.
This will print the following message to the screen:
The lowest scoring 1 models:
Rank 0, Score -999.996: relax_start_structure//1/min_again_start_structure_wildtype_bound.pdb
(Here, this wasn't completely necessary since the relaxation was only performed once.)
2.1 Set up the ddG calculations. This will create a directory and all the necessary files for the ddG calculation runs. Type:
python PATH_TO_ROSETTA/main/source/src/apps/public/rnp_ddg/general_RNP_setup_script.py --low_res --tag demo_run --start_struct relax_start_structure//1/min_again_start_structure_wildtype_bound.pdb --seq_file mutant_list.txt --rosetta_prefix PATH_TO_ROSETTA/main/source/bin/ --Nreps 1 --protein_pack_reps 1
Note: you need to replace PATH_TO_ROSETTA with your path to the Rosetta code.
Let's walk through each of the options:
--low-res
: Use the low-res method of calculating RNA-protein relative binding affinities. This is recommended. Alternatively, you can specify --med-res
, which will use rna_denovo
to build mutant structures when they introduce a non-canonical RNA base pair (i.e. if the WT structure contains a G-C base pair and you specify a mutation to G-G).
--tag
: Here we can specify a string that will be used to create the name of the output directory.
--start_struct
: This is the relaxed starting structure that will be used to calculate energies of mutants. This will be considered the wildtype sequence. This should be one of the lowest-scoring structures that was printed to the screen by get_lowest_scoring_relaxed_models.py
.
--seq_file
: This is a text file specifying the sequences for which we want to calculate relative binding affinities. One sequence should be specified per line. These can either be the full sequence of the complex (RNA and protein), or just the RNA sequence. If the protein sequence is not specified, then no mutations to the protein will be made.
--rosetta_prefix
: The path to the Rosetta executables.
--Nreps
: The number of times the mutation and subsequent relaxation of surrounding residues should be performed. For actual runs, it is recommended that this is set to 10.
--protein_pack_reps
: The number of times the "unbound" protein structure should be repacked, to calculate the energy of the unbound protein. For actual runs, it is recommended that this is set to 10.
This setup command will create a directory named ddG_demo_run_low-res
. In this directory, there is a numbered subdirectory corresponding to each of the mutant sequences that were specified in the mutant_list.txt
file. 0/
corresponds to the wildtype, 1/
corresponds to the first sequence listed in mutant_list.txt
, 2/
corresponds to the second sequence listed in mutant_list.txt
, etc. ddG_demo_run_low/general_setup_settings.txt
lists the options that will be used for the run. ddG_demo_run_low/ALL_COMMANDS
contains the actual command lines to run all of the ddG calculations. ddG_demo_run_low/COMMAND_0
, ddG_demo_run_low/COMMAND_1
, and ddG_demo_run_low/COMMAND_2
contain the commands to run the ddG calculations for the wildtype, first, and second mutations, respectively. All of these commands are contained within ddG_demo_run_low/ALL_COMMANDS
; these individual files are just useful if you're running a lot of mutants on a cluster -- each command can be run simultaneously on a different core.
2.2 Run the ddG calculations. Type:
source ddG_demo_run_low-res/ALL_COMMANDS
This will take several minutes to run.
2.3 Get the ddG results. Type:
python PATH_TO_ROSETTA/main/source/src/apps/public/rnp_ddg/get_final_ddG_scores.py --run_dir ddG_demo_run_low-res/ --seq_file mutant_list.txt
Note: you need to replace PATH_TO_ROSETTA with your path to the Rosetta code.
--run_dir
should specify the directory that the ddG calculations were run in (the one that was created by general_RNP_setup_script.py
). --seq_file
should list the same seq_file
that was provided to general_RNP_setup_script.py
.
This will print the following to the screen:
####################################
RESULTS:
WT ddG: 0.00 kcal/mol
ugaggcucaccca ddG: 2.39 kcal/mol
ugaggagcaccca ddG: 0.62 kcal/mol
####################################
Results are also written to the ddG_score.txt files in
each mutant directory in ddG_demo_run_low-res/
The format is:
ddG dG complex_score protein_score rna_score
####################################
These calculations are not deterministic, so the actual numbers might differ slightly. Normally, these calculations should be performed on the top 20 relaxed structures, then the final ddG values should be averaged over the 20 results.
As noted in the message above, the ddG results are also listed in the ddG_score.txt
files in each of the run directories. The first column in a ddG_score.txt
specifies the calculated ddG value in kcal/mol.
For reference, example output for this demo is provided in the example_output
directory.
See the Rosetta-Vienna ddG documentation here.