A collection of tools for PDB editing, cluster submission, 'silent-file' processing, and setting up rna_denovo and ERRASER jobs.
Under active development by the Das laboratory (Stanford).
This documentation describes how to set up the code in tools/rna_tools/bin.
Note: these setup instructions are for bash. If you're using a different shell, you will need to modify the commands accordingly.
Step 1: Add the following lines to your .bashrc (this may be .bash_profile on some systems), replacing YOUR_PATH_TO_ROSETTA with your actual path to Rosetta (for example: /home/src/rosetta/):
export ROSETTA='YOUR_PATH_TO_ROSETTA'
export RNA_TOOLS=$ROSETTA/tools/rna_tools/
export PATH=$RNA_TOOLS/bin/:$PATH
export PYTHONPATH=$PYTHONPATH:$RNA_TOOLS/bin/
Step 2: Type source ~/.bashrc (or, if you edited .bash_profile, type source ~/.bash_profile instead) to activate these paths and tools.
Step 3: Type python $RNA_TOOLS/sym_link.py.
Step 4: Verify the setup by typing rna_helix.py -h. This should print out usage instructions for rna_helix.py.
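Most setup problems come from a mistyped path in Step 1. The following is a hypothetical Python helper (check_rna_tools_env is not part of rna_tools) that checks the Step 1 variables against each other, assuming the layout shown there: RNA_TOOLS is $ROSETTA/tools/rna_tools/ and $RNA_TOOLS/bin/ is on both PATH and PYTHONPATH.

```python
import os


def check_rna_tools_env(env=None):
    """Return a list of problems with the Step 1 variables (empty list if all look right)."""
    env = dict(os.environ) if env is None else env
    rosetta = env.get("ROSETTA")
    if not rosetta:
        return ["ROSETTA is not set"]
    problems = []
    # normpath() makes trailing or doubled slashes (e.g. '/rosetta//tools/') irrelevant.
    tools = os.path.normpath(os.path.join(rosetta, "tools", "rna_tools"))
    if os.path.normpath(env.get("RNA_TOOLS", "")) != tools:
        problems.append("RNA_TOOLS should be $ROSETTA/tools/rna_tools/")
    bin_dir = os.path.join(tools, "bin")
    for var in ("PATH", "PYTHONPATH"):
        # Empty entries (e.g. from a leading ':' when PYTHONPATH was unset) are ignored.
        entries = [os.path.normpath(p) for p in env.get(var, "").split(os.pathsep) if p]
        if bin_dir not in entries:
            problems.append(f"$RNA_TOOLS/bin is not on {var}")
    return problems


if __name__ == "__main__":
    for problem in check_rna_tools_env():
        print("WARNING:", problem)
```

Run it in the same shell after Step 2; no warnings means the variables are consistent with each other. It does not check that Step 3 was run.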
The following are example command lines for several of these Python-based tools:
To change the residue numbers and chains in a file:
renumber_pdb_in_place.py mymodel.pdb A:1-4 B:5-9
or
renumber_pdb_in_place.py mymodel.pdb 1-4 5-9
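Presumably the range arguments relabel the residues of mymodel.pdb in file order: with A:1-4 B:5-9, the first four residues become chain A, numbers 1-4, and the next five become chain B, numbers 5-9. The sketch below illustrates that mapping on raw PDB lines using the standard fixed PDB columns. It is a hypothetical reimplementation, not the code of renumber_pdb_in_place.py; in particular, treating a range with no chain prefix (as in 1-4 5-9) as "keep the existing chain" is an assumption.

```python
def parse_segments(segments):
    """Expand arguments like 'A:1-4' or '5-9' into (chain, residue_number) labels.

    chain is None when no 'A:' prefix is given (assumed to mean "keep the old chain").
    Negative residue numbers are not handled by this simple parser.
    """
    labels = []
    for seg in segments:
        chain, _, span = seg.rpartition(":")  # chain is '' when there is no prefix
        start, _, end = span.partition("-")
        for num in range(int(start), int(end or start) + 1):
            labels.append((chain or None, num))
    return labels


def renumber_lines(pdb_lines, labels):
    """Give the i-th residue (in file order) the i-th label; other lines pass through."""
    assigned = {}
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            key = line[21:27]  # chain ID (col 22), residue number (cols 23-26), insertion code (col 27)
            if key not in assigned:
                assigned[key] = labels[len(assigned)]  # IndexError if too few labels were given
            chain, num = assigned[key]
            chain = chain or line[21]
            # Rewrite cols 22-27: new chain, right-justified residue number, blank insertion code.
            line = line[:21] + chain + f"{num:>4}" + " " + line[27:]
        out.append(line)
    return out
```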
To generate a fasta file with the sequences from a PDB:
pdb2fasta.py mymodel.pdb [ > mymodel.fasta ]
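As a rough illustration of what producing a FASTA record from a PDB involves, here is a hypothetical sketch (not the code of pdb2fasta.py): read residues in file order from the ATOM/HETATM records and emit one letter per residue. The residue-name table and the lowercase output are assumptions; real PDB files may use other names for nucleotides.

```python
def pdb_to_fasta(pdb_lines, name):
    """Return a one-record FASTA string with one letter per residue, in file order."""
    # Assumed mapping for standard nucleotide names; anything else becomes 'x'.
    one_letter = {"A": "a", "C": "c", "G": "g", "U": "u",
                  "RA": "a", "RC": "c", "RG": "g", "RU": "u"}
    seen = set()
    seq = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        key = line[21:27]  # chain ID + residue number + insertion code identify a residue
        if key in seen:
            continue
        seen.add(key)
        resname = line[17:20].strip().upper()  # residue name, PDB cols 18-20
        seq.append(one_letter.get(resname, "x"))
    return f">{name}\n{''.join(seq)}\n"
```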
To pull out a particular chain from a PDB file:
extract_chain.py mymodel.pdb A
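Pulling out a chain amounts to filtering on the chain identifier in column 22 of each coordinate record. A minimal hypothetical sketch (not the code of extract_chain.py), which for simplicity drops every non-coordinate line:

```python
def extract_chain(pdb_lines, chain):
    """Keep only ATOM/HETATM records whose chain ID (PDB column 22) matches `chain`."""
    return [line for line in pdb_lines
            if line.startswith(("ATOM", "HETATM")) and line[21:22] == chain]
```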
To slice out particular residues from a PDB file:
pdbslice.py mymodel.pdb -subset 1-5 9-12 mysubset_
The last argument is a prefix for the sliced PDB file.
To excise particular residues from a PDB file:
pdbslice.py mymodel.pdb -excise 6-8 excised_
Again, the last argument is a prefix for the sliced PDB file.
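-subset keeps only the listed residue numbers and -excise removes them, so the two options are complements of each other. This hypothetical sketch (not the code of pdbslice.py) shows that relationship on raw PDB lines; it matches residue numbers only and ignores chain IDs, an assumption that matters for multi-chain files.

```python
def parse_ranges(ranges):
    """Expand arguments like '1-5' or '9' into a set of residue numbers."""
    numbers = set()
    for r in ranges:
        start, _, end = r.partition("-")
        numbers.update(range(int(start), int(end or start) + 1))
    return numbers


def slice_pdb(pdb_lines, ranges, excise=False):
    """Keep (subset) or drop (excise) coordinate lines whose residue number is listed."""
    listed = parse_ranges(ranges)
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            is_listed = int(line[22:26]) in listed  # residue number, PDB cols 23-26
            # subset: drop unlisted residues; excise: drop listed ones
            if is_listed == excise:
                continue
        out.append(line)
    return out
```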
'Silent files' use Rosetta's compressed file format, which concatenates the scores for each model with the model coordinates (sometimes in a UU-encoded compressed format that looks like gobbledygook).
To quickly get the number that goes with each score column (useful before making scatterplots in gnuplot), run:
fields.py mysilentfile.out
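Column numbers matter because gnuplot's using clause selects columns by 1-based position (e.g. using 2:3). A hypothetical equivalent of the lookup follows; it assumes the first SCORE: line of a silent file is a header naming the columns, and it counts the SCORE: token itself as column 1 so the numbers line up with whitespace-split data lines. Whether fields.py numbers columns the same way is not confirmed here.

```python
def score_columns(silent_lines):
    """Map each score-column name to its 1-based whitespace-separated column number."""
    for line in silent_lines:
        if line.startswith("SCORE:"):
            # The first SCORE: line is assumed to be the header naming the columns.
            return {name: i for i, name in enumerate(line.split(), start=1)}
    return {}
```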
To find the 20 lowest-energy models in the silent file and run Rosetta's extract_pdbs executable to extract them:
extract_lowscore_decoys.py mysilentfile.out 20
To create a new silent file with just the 20 lowest-energy models:
extract_lowscore_decoys_outfile.py mysilentfile.out -out 20
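Selecting the lowest-energy models comes down to sorting the SCORE: lines by the score column and keeping the first N tags. The hypothetical sketch below does this on the lines of a silent file; the column names score and description and the skipping of repeated header lines are assumptions about the file, and the real scripts may differ (for example, in which score term they sort by).

```python
def lowest_score_tags(silent_lines, n, column="score"):
    """Return the tags of the n models with the lowest value in `column`."""
    header = None
    rows = []
    for line in silent_lines:
        if not line.startswith("SCORE:"):
            continue  # skip SEQUENCE: lines and coordinate lines
        fields = line.split()
        if header is None:
            header = fields  # first SCORE: line is assumed to name the columns
        elif fields != header:  # skip header lines repeated by concatenation
            rows.append(fields)
    if header is None:
        return []
    score_col = header.index(column)
    tag_col = header.index("description")  # the model tag is assumed to live here
    rows.sort(key=lambda fields: float(fields[score_col]))
    return [fields[tag_col] for fields in rows[:n]]
```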
To concatenate several outfiles, renaming model tags to be unique:
cat_outfiles.py mysilentfile1.out mysilentfile2.out [ ... ]
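Tags must be unique because they identify each model within a silent file, and separate runs often reuse the same tags. One simple renaming scheme, assumed here and not necessarily the one cat_outfiles.py uses, is to number all models sequentially across the input files. The sketch further assumes that every line belonging to a model ends with that model's tag.

```python
def unique_tag_maps(tags_per_file, prefix="S_"):
    """Map each file's old tags to new tags numbered sequentially across all files."""
    maps = []
    counter = 0
    for tags in tags_per_file:
        mapping = {}
        for tag in tags:
            counter += 1
            mapping[tag] = f"{prefix}{counter}"
        maps.append(mapping)
    return maps


def retag_line(line, mapping):
    """Replace the trailing tag of a silent-file line if it is one being renamed."""
    head, sep, last = line.rpartition(" ")
    if sep and last in mapping:
        return head + sep + mapping[last]
    return line
```

Each file's lines would then be passed through retag_line with that file's mapping and written out in sequence, with the header lines kept only once.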
To generate a near-ideal A-form RNA helix that has good Rosetta energy (requires that the rna_helix Rosetta executable is compiled):
rna_helix.py -seq aacg cguu -o myhelix.pdb [ -resnum A:5-8 A:20-23 -extension static.linuxgccrelease ]
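As a hypothetical illustration (helix_extension is not part of rna_tools), the -extension value is the part of the executable's filename after "rna_helix.", and an executable named plain rna_helix needs no extension:

```python
import os


def helix_extension(executable_path, base="rna_helix"):
    """Return the -extension value for an executable path, or '' if there is none."""
    name = os.path.basename(executable_path)
    if name == base:
        return ""  # plain 'rna_helix': no -extension flag needed
    prefix = base + "."
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} is not an {base} executable")
    return name[len(prefix):]
```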
The -extension value is the extension of your rna_helix executable. To find it, type ls $ROSETTA/main/source/bin/rna_helix* (this requires that the ROSETTA environment variable is set; see Setup). This will print something like this to your screen:
/YOUR_PATH_TO_ROSETTA/main/source/bin/rna_helix.static.linuxgccrelease
The extension is everything that comes after "rna_helix." in the filename, so here the extension would be static.linuxgccrelease. If the executable is simply named /YOUR_PATH_TO_ROSETTA/main/source/bin/rna_helix, then you do not need to provide the -extension flag to rna_helix.py.
To strip out residues and HETATMs that are not recognizable as RNA from a PDB file:
make_rna_rosetta_ready.py rawmodel.pdb
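A hypothetical sketch of the kind of filtering described above (not the code of make_rna_rosetta_ready.py): keep ATOM records whose residue names look like standard RNA and drop HETATMs. The set of residue names is an assumption, and the real script may do more, for example renaming residues or atoms into Rosetta's conventions.

```python
# Assumed set of residue names to keep; the real script may recognize many more.
RNA_RESIDUE_NAMES = {"A", "C", "G", "U", "RA", "RC", "RG", "RU"}


def keep_rna_atoms(pdb_lines):
    """Keep ATOM records of recognized RNA residues; drop HETATMs and everything else."""
    return [line for line in pdb_lines
            if line.startswith("ATOM") and line[17:20].strip() in RNA_RESIDUE_NAMES]
```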
Legacy [the following functionality is now available directly in the rna_denovo application]: To prepare files for an RNA de novo (fragment assembly of RNA with full-atom refinement, FARFAR) job:
rna_denovo_setup.py -fasta mysequence.fasta -secstruct_file mysecstruct.txt
See also: rna_denovo_setup.
The cluster submission tools are largely geared toward clusters that the Das lab uses (Stampede and other XSEDE resources; the BioX3 computer at Stanford), but are meant to be easily generalized to new ones.
To create submission files for a set of Rosetta command lines listed in a README file:
rosetta_submit.py README OUTDIR 40 [number of hours]
The directory OUTDIR will contain directories 0/, 1/, ..., 39/ for the 40 jobs. Your command line should have a flag like -out:file:silent mymodels.out, which will be elaborated into flags like -out:file:silent OUTDIR/0/mymodels.out in the 40 jobs, so that their outfiles go into separate subdirectories and file I/O conflicts are avoided.
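The per-job rewriting of -out:file:silent can be sketched in Python as follows. This is a hypothetical illustration (elaborate_silent_flag and job_commands are not rosetta_submit.py functions); the real script also writes the cluster submission files and may handle more cases.

```python
import os


def elaborate_silent_flag(command, outdir, job_index, flag="-out:file:silent"):
    """Redirect the silent-file output of one job into OUTDIR/<job_index>/."""
    tokens = command.split()  # note: collapses runs of whitespace
    for i, token in enumerate(tokens[:-1]):
        if token == flag:
            tokens[i + 1] = os.path.join(outdir, str(job_index), tokens[i + 1])
    return " ".join(tokens)


def job_commands(command, outdir, njobs):
    """One command line per job, each writing to its own numbered subdirectory."""
    return [elaborate_silent_flag(command, outdir, j) for j in range(njobs)]
```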
Submission scripts for Condor, LSF, PBS, and SLURM (Stampede) will show up in the directory, along with (hopefully) a suggestion for how to run them on your cluster. The default number of hours is 16, but this can be changed with the optional [number of hours] argument shown above. If you set up on a new cluster, please update rosetta_submit.py so that others can take advantage of your work.
During or after the run, combine the models from the different outfiles into a single silent file with:
easy_cat.py OUTDIR