Author: Fang-Chieh Chou
Mar. 2012 by Fang-Chieh Chou (fcchou [at] stanford.edu).
The full ERRASER pipeline is controlled by a set of Python scripts in src/apps/public/ERRASER/. The main applications used are erraser_minimizer, swa_rna_analytical_closure and swa_rna_main. The central code for the SWA (StepWise Assembly) applications is in src/protocols/stepwise/legacy/rna/. The electron density scoring function used in ERRASER is in src/core/scoring/electron_density_atomwise/.
For a minimal demonstration of ERRASER, see: demos/public/ERRASER/
Chou, F.C., Sripakdeevong, P., Dibrov, S.M., Hermann, T., and Das, R. "Correcting pervasive errors in RNA crystallography with Rosetta", arXiv:1110.0276. [For ERRASER. To be published]
Sripakdeevong, P., Kladwang, W., and Das, R. (2011) "An enumerative stepwise ansatz enables atomic-accuracy RNA loop modeling", PNAS 108:20573-20578. [For stepwise assembly algorithm (SWA)]
This code improves a given RNA crystallographic model and reduces the number of potential errors in it (which can be evaluated by MolProbity), under the constraint of the experimental electron density map.
This method pipelines Rosetta full-atom minimization and single-residue stepwise assembly rebuilding to improve a given RNA crystallographic model. The electron density score is used to constrain the model during modeling.
ERRASER currently works only for RNA. Other parts of the crystallographic model, including proteins, modified bases and ligands, are not modeled. Remodeling of RNA residues that are in close contact with these components may be problematic. We are planning to tackle these issues in the future, but for now ERRASER seems to work well for most RNA residues. Residues in close contact with non-RNA components can also be held fixed in ERRASER to avoid problematic rebuilding.
Crystal contacts are currently not modeled, which is known to cause problems in a few test cases where the RNA interacts strongly with its crystal-packing partner (e.g. base-pairing and base-stacking). For now this problem can be resolved by manually adding the crystal-packing partner to the starting pdb file. We are planning to model crystal packing in the future.
The PHENIX refinement package is required for the ERRASER pipeline. Users can download PHENIX from http://www.phenix-online.org (free for academic usage).
There is only one mode to run ERRASER at present.
You need two files:
The starting structure in standard pdb format. ERRASER takes the standard pdb file directly and converts it to Rosetta format automatically, so no pre-processing is required.
A CCP4 electron density map file. This can be created by PHENIX or other refinement packages. The input map must be a CCP4 2mFo-DFc map. To avoid overfitting, Rfree reflections should be excluded when the map file is created.
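Before starting a long run, it can help to confirm that the map file really is in CCP4 format. The following is a minimal sketch and not part of ERRASER: it relies on the CCP4 map header convention that the 1024-byte header carries the ASCII stamp "MAP " at byte offset 208 (word 53), and the function name looks_like_ccp4_map is made up for illustration. It cannot tell whether the map is a 2mFo-DFc map or whether Rfree reflections were excluded. The sketch is written for Python 3.

```python
import os

CCP4_HEADER_BYTES = 1024   # the header is 256 four-byte words
MAP_STAMP_OFFSET = 208     # word 53 (1-based) holds the ASCII stamp "MAP "


def looks_like_ccp4_map(path):
    """Return True if the file has a full CCP4 header carrying the 'MAP ' stamp."""
    if os.path.getsize(path) < CCP4_HEADER_BYTES:
        return False
    with open(path, "rb") as handle:
        header = handle.read(CCP4_HEADER_BYTES)
    return header[MAP_STAMP_OFFSET:MAP_STAMP_OFFSET + 4] == b"MAP "
```

A False result is a hint that the wrong file was supplied (for example an MTZ reflection file instead of a map), not proof that the map is unusable.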
Prior to running ERRASER, the following setup is required:
Ensure you have set up PHENIX correctly. As a check, run the following command and see if it works:
phenix.rna_validate
Check whether you have the latest Python (v2.7) installed. If not, go to the rosetta/rosetta_tools/ERRASER/ folder and run:
./convert_to_phenix.python
This changes the default Python used by the code to the one bundled with PHENIX, instead of the system Python.
Set the environment variable "$ROSETTA" to point to the Rosetta folder. If you use bash, append the following line to ~/.bashrc:
ROSETTA=<YOUR_ROSETTA_PATH>; export ROSETTA
Also add the ERRASER script folder to $PATH. Here is a bash example:
PATH=$PATH:<YOUR_ROSETTA_PATH>/rosetta_tools/ERRASER/
Now you are ready to go!
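The setup steps above can be checked in one go with a short script. This is only a sketch under stated assumptions: it is written for Python 3 (it uses shutil.which) and is meant to be run separately from ERRASER's own scripts, and the function name check_erraser_setup is hypothetical. It only verifies that $ROSETTA points to a directory and that phenix.rna_validate and erraser.py can be found on $PATH. It does not run either program.

```python
import os
import shutil


def check_erraser_setup(environ=None, which=shutil.which):
    """Return a list of problems found; an empty list means all checks passed."""
    environ = os.environ if environ is None else environ
    problems = []

    # $ROSETTA must be set and must point to an existing folder.
    rosetta = environ.get("ROSETTA")
    if not rosetta:
        problems.append("ROSETTA is not set")
    elif not os.path.isdir(rosetta):
        problems.append("ROSETTA does not point to a directory: " + rosetta)

    # Both the PHENIX check command and the ERRASER script should be on $PATH.
    for tool in ("phenix.rna_validate", "erraser.py"):
        if which(tool) is None:
            problems.append(tool + " was not found on PATH")
    return problems
```

Calling check_erraser_setup() with no arguments inspects the current environment. The two parameters exist only so that the checks can be exercised without a real installation.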
ERRASER can be run simply with the Python script erraser.py in the rosetta_tools/ERRASER/ directory. If you followed the setup instructions above, you should now be able to run ERRASER directly from the command line:
erraser.py -pdb 1U8D_cut.pdb -map 1U8D_cell.ccp4 -map_reso 1.95 -fixed_res A33-37 A61 A65
The first two arguments, the input pdb file and the CCP4 map file, are required. The last two arguments are optional; they supply the map resolution and the residues to be held fixed during rebuilding.
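When ERRASER is launched from another script, the same command line can be assembled as an argument list. The sketch below uses only the four options shown in the example above. The helper name build_erraser_command is an assumption for illustration and is not part of ERRASER.

```python
def build_erraser_command(pdb, map_file, map_reso=None, fixed_res=()):
    """Assemble an erraser.py argument list from the four options shown above."""
    # -pdb and -map are required.
    command = ["erraser.py", "-pdb", pdb, "-map", map_file]
    # -map_reso and -fixed_res are optional and are added only when given.
    if map_reso is not None:
        command += ["-map_reso", str(map_reso)]
    if fixed_res:
        command += ["-fixed_res"] + list(fixed_res)
    return command
```

The resulting list can be passed to subprocess.check_call, which avoids shell quoting issues because each residue token is its own argument.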
You can see examples of the output pdb file in example_output/.
The above workflow should work, but it's worth looking at the Rosetta command lines called by the Python scripts to see what's going on.
The minimization step:
erraser_minimizer.<exe> -database <path to database> -native <input pdb> -out_pdb <output pdb>
-score::weights rna/rna_hires_elec_dens -score:rna_torsion_potential RNA09_based_2012_new
-vary_geometry true -fixed_res <fixed residue list>
-edensity:mapfile <map file> -edensity:mapreso 2.0 -edensity:realign no
The rebuilding step with loop closure:
swa_rna_analytical_closure.<exe> -database <path to database> -algorithm rna_resample_test -s <input pdb> -native <native pdb>
-out:file:silent blah.out -sampler_extra_syn_chi_rotamer true -sampler_cluster_rmsd 0.3 -native_edensity_score_cutoff 0.9
-sampler_native_rmsd_screen true -sampler_native_screen_rmsd_cutoff 2.0 -sampler_num_pose_kept 30 -PBP_clustering_at_chain_closure true
-allow_chain_boundary_jump_partner_right_at_fixed_BP true -add_virt_root true -sample_res 2 -cutpoint_closed 2
-fasta fasta -input_res 1 3-4 -fixed_res 1 3-4 -jump_point_pairs NOT_ASSERT_IN_FIXED_RES 1-4 -alignment_res 1-4 -rmsd_res 4
-score:weights rna/rna_hires_elec_dens -edensity:mapfile <map file> -edensity:mapreso 2.0 -edensity:realign no
-score:rna_torsion_potential RNA09_based_2012_new
The rebuilding step at a terminal residue:
swa_rna_main.<exe> -database <path to database> -algorithm rna_resample_test -s <input pdb> -native <native pdb>
-out:file:silent blah.out -sampler_extra_syn_chi_rotamer true -sampler_cluster_rmsd 0.3 -native_edensity_score_cutoff 0.9
-sampler_native_rmsd_screen true -sampler_native_screen_rmsd_cutoff 2.0 -sampler_num_pose_kept 30 -PBP_clustering_at_chain_closure true
-allow_chain_boundary_jump_partner_right_at_fixed_BP true -add_virt_root true -sample_res 2 -cutpoint_closed 2
-fasta fasta -input_res 1-4 -fixed_res 2-4 -jump_point_pairs NOT_ASSERT_IN_FIXED_RES 1-4 -alignment_res 1-4 -rmsd_res 4
-score:weights rna/rna_hires_elec_dens -edensity:mapfile <map file> -edensity:mapreso 2.0 -edensity:realign no
-score:rna_torsion_potential RNA09_based_2012_new
Below is a list of available arguments for erraser.py.
Required:
-pdb
Format: -pdb <input pdb>
The starting structure in standard pdb format.
-map
Format: -map <map file>
2mFo-DFc map file in CCP4 format. Rfree reflections should be excluded.
Commonly used:
-map_reso
Format: -map_reso <float> / Default: 2.0
The resolution of the input density map. It is highly recommended to supply the map
resolution whenever possible for better results.
-out_pdb
Format: -out_pdb <string> / Default: <input pdb name>_erraser.pdb
Use this option to write the output to a different file name.
-n_iterate
Format: -n_iterate <int> / Default: 1
The number of rebuild-minimization iterations in ERRASER. The user can increase this
number to achieve better results. Usually 2-3 rounds are enough. Alternatively,
the user can take an ERRASER-refined model as the input for the next ERRASER run to
iterate manually.
-fixed_res
Format: -fixed_res <list> / Default: <empty>
(Example: A1 A14-19 B9 B10-13 #chain ID followed by residue numbers)
This allows users to fix selected RNA residues during ERRASER. For example, because
proteins and ligands are not modeled in ERRASER, we recommend fixing RNA residues
that interact strongly with these unmodeled atoms. ERRASER will automatically
detect residues covalently bonded to removed atoms and hold them fixed during the
rebuild, but users need to manually specify residues that have non-covalent
interactions with removed atoms.
-kept_temp_folder
Format: -kept_temp_folder <True/False> / Default: False
Enabling this option lets the user examine the intermediate output files stored in the
temp folder. The default is to remove the temp folder after job completion.
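The residue list format used by -fixed_res (a chain ID followed by a residue number or a range, e.g. A1 A14-19 B9 B10-13) can be expanded into individual residues when preparing or checking such a list. The following is a minimal sketch based only on the example given above. It assumes a single-letter chain ID and non-negative residue numbers, and ERRASER's own parser may accept more than this. The name expand_residue_list is hypothetical.

```python
import re

# One token: a single-letter chain ID, a residue number, and an optional "-end".
_TOKEN = re.compile(r"^([A-Za-z])(\d+)(?:-(\d+))?$")


def expand_residue_list(tokens):
    """Expand tokens such as 'A14-19' into (chain, residue number) pairs."""
    residues = []
    for token in tokens:
        match = _TOKEN.match(token)
        if match is None:
            raise ValueError("unrecognized residue token: %r" % token)
        chain = match.group(1)
        start = int(match.group(2))
        end = start if match.group(3) is None else int(match.group(3))
        if end < start:
            raise ValueError("descending range in token: %r" % token)
        residues.extend((chain, number) for number in range(start, end + 1))
    return residues
```

For instance, expanding the list before a run makes it easy to count how many residues will be held fixed, or to spot a token that is missing its chain ID.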
Other:
-rebuild_extra_res
Format/Default: Same as -fixed_res
This allows users to specify extra residues and force ERRASER to rebuild them.
ERRASER will automatically pick out incorrect residues, but the user may
find particular residues that were not corrected after one ERRASER run. The user
can then re-run ERRASER with the -rebuild_extra_res argument to force ERRASER to
remodel these residues.
-cutpoint_open
Format/Default: Same as -fixed_res
This allows users to specify cutpoints (positions where the next nucleotide is not
connected to the current one) in the starting model. Since ERRASER detects cutpoints in
the model automatically, users usually do not need to specify this option.
-use_existing_temp_folder
Format: -use_existing_temp_folder <True/False> / Default: True
When True, ERRASER will use any previous data stored in the existing temp folder
and skip steps that have already been done. This is useful when a job stopped abnormally
and the user tries to re-run the same job. Disable it for a fresh run that does not use
previously computed data.
-rebuild_all
Format: -rebuild_all <True/False> / Default: False
When True, ERRASER will rebuild all the residues instead of just rebuilding
erroneous ones. Residues in "-fixed_res" (see above) are still kept fixed during
rebuilding. This is more time-consuming but does not necessarily lead to better results.
Standard rebuilding with more iteration cycles is usually preferred.
-native_screen_RMSD
Format: -native_screen_RMSD <float> / Default: 2.0
In default ERRASER rebuilding, we only sample conformations that are within 2.0 Å of
the starting model (which is the "native" here). The user can modify the RMSD cutoff.
If the value of native_screen_RMSD is larger than 10.0, the RMSD screening is turned off.
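The manual iteration mentioned under -n_iterate, where an ERRASER-refined model becomes the input of the next run, can be planned as a chain of commands. This is a sketch only: the function name plan_manual_iterations and the erraser_round<N>.pdb naming scheme are made up for illustration, and -out_pdb is set explicitly in every round so that the sketch does not depend on the default output name.

```python
def plan_manual_iterations(pdb, map_file, rounds, map_reso=None):
    """Return one erraser.py argument list per round, feeding each output into the next run."""
    commands = []
    current_pdb = pdb
    for round_number in range(1, rounds + 1):
        # Hypothetical naming scheme for the per-round output files.
        out_pdb = "erraser_round%d.pdb" % round_number
        command = ["erraser.py", "-pdb", current_pdb, "-map", map_file, "-out_pdb", out_pdb]
        if map_reso is not None:
            command += ["-map_reso", str(map_reso)]
        commands.append(command)
        # The refined model of this round is the starting model of the next one.
        current_pdb = out_pdb
    return commands
```

Each argument list can then be run in order, for example with subprocess.check_call, stopping at the first round that fails.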
At the end you will get an output pdb file in standard PDB format, which inherits all the ligands, metals and waters from the input pdb file. You can then further refine the output model directly using PHENIX or other refinement packages without any post-processing.
elec_dens_atomwise is used in ERRASER. ERRASER also uses an updated RNA torsional potential based on the RNA09 dataset.