Tutorial by Shourya S. Roy Burman. Edited by Parisa Hosseinzadeh.
Created 21 June 2016


There are multiple ways to control how Rosetta protocols read input and produce output. By the end of this tutorial, you should understand:

  • What input formats are supported in Rosetta
  • How to deal with odd residues that often cause Rosetta to crash
  • How best to prepare an input structure
  • How to ask Rosetta to compare it with a known structure
  • How to change the input and output paths in Rosetta
  • How to get a reproducible trajectory from a Monte Carlo protocol
  • How to visualize the changes to the structure in PyMOL
  • How to overwrite existing output

This chapter covers many executables whose functions are not explained in much detail. Please refer to the corresponding chapter in the tutorials for a proper explanation of what each program does.

Navigating to the Demos

The demos are available at <path_to_Rosetta_directory>/demos/tutorials/input_and_output. All demo commands listed in this tutorial should be executed when in this directory. All the demos here use the linuxgccrelease binary. You may be required to change it to whatever is appropriate given your operating system and compiler.

Controlling Input

Common Structure Input Files

You can supply Rosetta with a variety of input files which hold the coordinates of your biomolecular structure.

PDB File

The most common input file format is the PDB format. A detailed description of the PDB format can be found on the Worldwide Protein Data Bank website. The lines that concern Rosetta are primarily those starting with ATOM, HETATM and TER.

ATOM   1477 3HD2 LEU A  94      10.910  -5.038   7.227  1.00  0.00           H  
HETATM 1479  O   HOH A 107      10.027  -4.206  14.093  1.00  0.00           O 
In the example above, Rosetta recognizes that the ATOM record represents one of the Hδ atoms of Leucine-94 in chain A, with coordinates (10.910, -5.038, 7.227), an occupancy of 1 and a temperature factor of 0. Rosetta ignores the atom serial number (1477) in the second column and the element symbol (H) in the last column. Similarly, the HETATM record represents the oxygen atom of a water molecule associated with chain A, with coordinates (10.027, -4.206, 14.093), an occupancy of 1 and a temperature factor of 0; Rosetta ignores the atom number and the element symbol in this record too. A TER record (not shown above) indicates a chain break. Rosetta stores the temperature factors, but treats all non-zero occupancies as 1.
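To make the column layout concrete, here is a minimal Python sketch that splits an ATOM/HETATM record into the fields discussed above. It relies on the fixed column ranges of the PDB format; it is only a way to inspect input files, not Rosetta's own PDB reader, and the name parse_atom_record is hypothetical.

```python
def parse_atom_record(line):
    """Split one ATOM/HETATM record into its fixed-width PDB fields.

    Comments give 1-based PDB columns; the slices are 0-based Python indices.
    """
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return {
        "record": line[0:6].strip(),        # cols 1-6
        "serial": int(line[6:11]),          # cols 7-11 (ignored by Rosetta)
        "atom_name": line[12:16].strip(),   # cols 13-16
        "alt_loc": line[16:17].strip(),     # col 17, alternate location
        "res_name": line[17:20].strip(),    # cols 18-20
        "chain": line[21:22],               # col 22
        "res_seq": int(line[22:26]),        # cols 23-26
        "x": float(line[30:38]),            # cols 31-38
        "y": float(line[38:46]),            # cols 39-46
        "z": float(line[46:54]),            # cols 47-54
        "occupancy": float(line[54:60]),    # cols 55-60
        "temp_factor": float(line[60:66]),  # cols 61-66
        "element": line[76:78].strip(),     # cols 77-78 (ignored by Rosetta)
    }
```

For example, parse_atom_record applied to the ATOM line above returns res_name "LEU", chain "A", res_seq 94 and occupancy 1.0.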

If a residue has multiple conformations, Rosetta loads only the first one.
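If you want to preview which alternate conformation will be used, the sketch below keeps only the first alternate location (column 17) seen for each atom, in file order. It assumes that "first conformation" means the first alternate-location record encountered in the file; the function name first_altloc_only is hypothetical.

```python
def first_altloc_only(pdb_lines):
    """Keep only the first alternate conformation of each atom, in file order.

    Lines that are not ATOM/HETATM records (TER, REMARK, ...) pass through unchanged.
    """
    seen = set()
    kept = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            # Identify an atom by chain (col 22), residue number (cols 23-26),
            # insertion code (col 27) and atom name (cols 13-16). The
            # alternate-location indicator (col 17) is deliberately left out.
            key = (line[21:22], line[22:26], line[26:27], line[12:16])
            if key in seen:
                continue  # a later conformation of an atom already kept
            seen.add(key)
        kept.append(line)
    return kept
```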

To pass in a single PDB, use the -in:file:s option. For example, the following command calculates the energy of a refined version of PDB 1QYS, which is provided in the input_files folder. (Whenever you see $ROSETTA3, read it as path-to-Rosetta/main/source.)

$> $ROSETTA3/bin/score_jd2.default.linuxgccrelease -in:file:s input_files/1qys.pdb

Running this should produce a score file in your current working directory with the energy scores of 1QYS. Before proceeding to the next step, remove this score file with rm; otherwise, the energy scores of all structures scored from here on will be appended to it.

List of PDBs

To pass multiple input structures to an executable, use the option -in:file:l. Say you want to score two PDBs, 1QYS and 1UBQ. We can pass a list file called pdblist, which contains one PDB path per line (space-separated paths also work, but comma- or semicolon-separated paths do not).


To run, execute:

$> $ROSETTA3/bin/score_jd2.default.linuxgccrelease -in:file:l input_files/pdblist

Running this should produce a score file in your current working directory with the energy scores of both structures. Before proceeding to the next step, remove this score file with rm; otherwise, the energy scores of all structures scored from here on will be appended to it.

Silent File

A silent file is a compact, Rosetta-specific file format that stores information for multiple structures. It is especially useful for simulations with a large number of output structures, where many filesystems have trouble running batch operations on so many individual files. Many Rosetta simulations can generate silent files. An example of a binary silent struct file can be found at <path_to_Rosetta_directory>/demos/tutorials/input_and_output/input_files/1qys_10.o.

The first few lines represent the information about the sequence, energy and relative rotation/translation of the chains. The main body, however, is not human readable.

There is another, human-readable silent file format called the protein silent struct file, but Rosetta is sometimes unable to write output in this format, so it is not discussed in this tutorial. Details about it can be found here.

To give Rosetta a silent file as an input, use the in:file:silent option while running your command like this:

$> $ROSETTA3/bin/score_jd2.default.linuxgccrelease -in:file:silent input_files/1qys_10.o

You should again get a score file in your current working directory, this time with the energy scores of the 1QYS structures in the silent file. Before proceeding to the next step, remove this score file with rm; otherwise, the energy scores of all structures scored from here on will be appended to it.

The option in:file:silent can also be used to pass a list of silent files.

Dealing with Odd Residues and Water Molecules

Most Rosetta protocols expect the structure they are working on to have certain properties, e.g., all heavy atoms should be present and all residue names should be recognizable. Sometimes Rosetta can guess which atoms to add. In this example, we will score the PDB 1QYS taken directly from the Protein Data Bank:

$> $ROSETTA3/bin/score_jd2.default.linuxgccrelease -in:file:s input_files/from_rcsb/1qys.pdb

In the log, you will see the following lines:

... Reading MSE as MET!
core.pack.pack_missing_sidechains: packing residue number 13 because of missing atom number 6 atom name  CG
The first line indicates that Rosetta converts the residue MSE (selenomethionine) to MET (regular methionine). The second line tells you that the Cγ atom of residue number 13 was missing, so Rosetta rebuilt the side chain of that residue.

Unrecognized Residues

A PDB downloaded directly from the Protein Data Bank will not always work with Rosetta. Here is an example where we try to score the PDB 3TDM. From the demo directory, run:

$> $ROSETTA3/bin/score_jd2.default.linuxgccrelease -in:file:s input_files/from_rcsb/3tdm.pdb

The application will exit with the following error:

ERROR: Unrecognized residue: PO4

This PDB contains a phosphate ion that Rosetta is unable to process without additional options. To score this PDB, we will add an option -ignore_unrecognized_res, which simply ignores the phosphates in the PDB.

$> $ROSETTA3/bin/score_jd2.default.linuxgccrelease -in:file:s input_files/from_rcsb/3tdm.pdb -ignore_unrecognized_res

Now the PDB will be scored and the score written to a score file.

The -ignore_unrecognized_res option also ignores the water molecules in the structure, which may change the energy score of your structure.

Zero Occupancy

Occupancy denotes the fraction of cases where a particular conformation is observed. While most atoms will have an occupancy of 1, if a residue was observed in multiple conformations, the occupancy will be lower than 1. An occupancy of 0 indicates that the atom was never observed in the crystal (but is estimated to be present at that location). Rosetta ignores these atom records. If it is a non-backbone heavy atom, it might build the sidechain for you. If it is a backbone heavy atom like N or CA, it will delete the entire residue.
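Before running Rosetta, you can list the atoms it will ignore by default. The following Python sketch (the helper name zero_occupancy_atoms is hypothetical) reports every ATOM/HETATM record whose occupancy field, columns 55-60, is zero. It only inspects the file; it does not reproduce Rosetta's decisions about rebuilding side chains or deleting residues.

```python
def zero_occupancy_atoms(pdb_lines):
    """Return (chain, residue number, residue name, atom name) for every
    ATOM/HETATM record whose occupancy (columns 55-60) is zero."""
    found = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        occupancy = line[54:60].strip()
        if occupancy and float(occupancy) == 0.0:
            found.append((
                line[21:22],                # chain, col 22
                int(line[22:26]),           # residue number, cols 23-26
                line[17:20].strip(),        # residue name, cols 18-20
                line[12:16].strip(),        # atom name, cols 13-16
            ))
    return found
```

Run on 1qys_zero_occ.pdb, it should flag the N and CA atoms of ASP 3 in chain A, the same atoms named in the warnings below.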

We have modified the occupancies of 1QYS to get the file <path_to_Rosetta_directory>/demos/tutorials/input_and_output/input_files/1qys_zero_occ.pdb, in which the first few atoms have zero occupancy:

ATOM      1  N   ASP A   3      -4.524  18.589  17.199  0.00  0.00           N  
ATOM      2  CA  ASP A   3      -3.055  18.336  17.160  0.00  0.00           C
Now run:
$>  $ROSETTA3/bin/score_jd2.default.linuxgccrelease -in:file:s input_files/1qys_zero_occ.pdb

You get the following warnings in your log file:

... PDB reader is ignoring atom  N   in residue 3 A.  Pass flag -ignore_zero_occupancy false to change this behavior
PDB reader is ignoring atom  CA  in residue 3 A.  Pass flag -ignore_zero_occupancy false to change this behavior
... [ WARNING ] skipping pdb residue b/c it's missing too many mainchain atoms:    3 A ASP ASP:NtermProteinFull


Also note that the score in the score file is higher than in the previous runs. This is because Rosetta deletes residue 3 and hence loses that residue's contribution of roughly -3 REU. Before proceeding to the next step, remove this score file with rm; otherwise, the energy scores of all structures scored from here on will be appended to it.

Several Rosetta applications, such as docking_protocol, require the structures they compare to have the same sequence length, and may even crash without an informative error message if zero-occupancy atoms are present.

To fix this, use the option -ignore_zero_occupancy, which is set to true by default. Adding -ignore_zero_occupancy false forces Rosetta to read in atoms with occupancy 0:

$>  $ROSETTA3/bin/score_jd2.default.linuxgccrelease -in:file:s input_files/1qys_zero_occ.pdb -ignore_zero_occupancy false

The score file produced by this run should match the one from the first example in this chapter.

Preparing a Structure by Refinement

The recommended way to prepare input structures for most Rosetta protocols is to run the refinement protocol relax on your structure before running the Rosetta application you want. A detailed tutorial on relax can be found here.

While we want to relieve clashes in the input structure and ensure that it meets all of Rosetta's specifications, we do not want the backbone to move much. A set of general options for this has been specified in <path_to_Rosetta_directory>/demos/tutorials/input_and_output/flag_input_relax.

-nstruct 2

-relax:ramp_constraints false


-no_optH false
Setting a higher nstruct, say -nstruct 10, will increase the number of refinement runs and may produce better results, but will also take considerably more time.

We will use this flags file to refine the PDB 1QYS taken directly from the Protein Data Bank. This may take a few minutes to run:

$>  $ROSETTA3/bin/relax.default.linuxgccrelease -in:file:s input_files/from_rcsb/1qys.pdb @flag_input_relax

This will produce three files: 1qys_0001.pdb, 1qys_0002.pdb and a score file. Use the PDB with the lower total_score in the score file as the input PDB for your protocol.

These options can be supplemented with -ignore_unrecognized_res and -ignore_zero_occupancy false if required.

Using a flag like ignore_unrecognized_res may remove ligands and waters you want to consider.

Ensure that all residues you want to model are present in the refined PDB.

Setting the Input Search Path

If we have multiple input files, it helps to specify a single location to search for them. For example, while running the relax protocol on the homodimer PDB 4EQ1, we might want to constrain the protein-protein interface distances and prevent them from changing. (A detailed tutorial on constraints can be found here.) To do this, we need a constraint file, constrained_atompairs.cst, which is also located in the directory input_files. Specifying that directory with -in:path lets us refer to both files by name alone, which helps when there are many such files:

$>  $ROSETTA3/bin/relax.default.linuxgccrelease -in:path input_files -in:file:s 4eq1.pdb -constraints:cst_fa_file constrained_atompairs.cst -ignore_unrecognized_res @flag_input_relax

This will take 15+ minutes to run and produce the files 4eq1_0001.pdb, 4eq1_0002.pdb and a score file.

Changing Input Representation - Centroid or Full Atom

Rosetta uses two structure representations - a finer full atom representation and a coarser centroid representation. A detailed tutorial on the differences and uses of the two can be found here. To ensure that Rosetta understands which representation your input file is in, we use the in:file:centroid or the in:file:fullatom options. Example runs can be found in the tutorial linked above.

Input a Known Structure For Comparison

Often, it is useful to compare how close Rosetta gets to a known, native structure. This is especially useful for benchmarking. It can also be used to check how far Rosetta moved an input PDB. To do this, we use the -in:file:native option.

The native PDB must contain the same number of residues, the same residue ordering and the same chain ordering as the structures that Rosetta will output after the protocol. Rosetta will give an error if the number of residues is different between the two, or calculate incorrect metrics if the residue numbering does not match.
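A quick check before a long run can catch such mismatches. This Python sketch (hypothetical helper names) lists residues by their alpha-carbon records, atom name " CA " with a leading space, which distinguishes them from calcium ions named "CA  ", and compares chain and residue name in order. It maps MSE to MET because Rosetta reads selenomethionine as methionine, as seen earlier; other non-standard residues are not handled, and this is not Rosetta's own check.

```python
# Rosetta reads selenomethionine (MSE) as methionine, as shown in the log earlier.
ALIASES = {"MSE": "MET"}

def residue_list(pdb_lines):
    """Return (chain, residue name) for each residue, in file order.

    Residues are recognised by their alpha-carbon record (" CA "); only a
    blank or "A" alternate location is counted, assuming "A" comes first.
    """
    residues = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if line[12:16] != " CA " or line[16:17] not in (" ", "A"):
            continue
        name = line[17:20].strip()
        residues.append((line[21:22], ALIASES.get(name, name)))
    return residues

def compare_to_native(model_lines, native_lines):
    """Report whether two structures have matching residue counts and order."""
    model, native = residue_list(model_lines), residue_list(native_lines)
    if len(model) != len(native):
        return f"residue count differs: {len(model)} vs {len(native)}"
    mismatches = [(m, n) for m, n in zip(model, native) if m != n]
    if mismatches:
        return f"{len(mismatches)} residues differ, first: {mismatches[0]}"
    return "ok"
```

For example, compare_to_native(open("input_files/1qys.pdb").readlines(), open("input_files/from_rcsb/1qys.pdb").readlines()) compares the two structures used in the example below.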

In the following example, we will run an older scoring application, score, to check how different the refined 1QYS is from the original PDB 1QYS.

$>  $ROSETTA3/bin/score.default.linuxgccrelease -in:file:s input_files/1qys.pdb -in:file:native input_files/from_rcsb/1qys.pdb -ignore_waters

This produces a score file that should look like:

SCORE:     score     fa_atr     fa_rep     fa_sol    fa_intra_rep    fa_elec    pro_close    hbond_sr_bb    hbond_lr_bb    hbond_bb_sc    hbond_sc    dslf_fa13       rama      omega     fa_dun    p_aa_pp    yhh_planarity        ref    allatom_rms    gdtmm    gdtmm1_1    gdtmm2_2    gdtmm3_3    gdtmm4_3    gdtmm7_4    irms    maxsub    maxsub2.0    rms description
SCORE:  -167.539   -414.834     48.380    225.004           1.040    -45.212        0.000        -25.491        -26.998         -2.986      -9.394        0.000     -4.905      4.211    109.662    -13.603            0.230    -12.643          1.050    1.000       1.000       1.000       1.000       1.000       1.000   0.000    92.000       92.000  0.135   1qys_0001
In this score file, the column allatom_rms represents the all-atom RMSD to the native, which is 1.050. This comparatively large value arises because Rosetta repacked the side chains to relieve clashes and optimize interactions during refinement. The rms column, which represents the Cα RMSD to the native, is much lower at 0.135, showing that Rosetta did not move the backbone much. Other global distance metrics, such as gdtmm and maxsub, are reported as well.
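For reference, the usual definition of RMSD between N matched atoms is given below, with model coordinates x_i, native coordinates y_i, and the rotation R and translation t of the optimal superposition. The rms column applies it to Cα atoms and allatom_rms to all matched atoms; the exact atom matching and superposition Rosetta uses are not described in this tutorial.

```latex
\mathrm{RMSD} = \min_{R,\,\mathbf{t}} \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\lVert R\,\mathbf{x}_i + \mathbf{t} - \mathbf{y}_i \right\rVert^{2}}
```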

Rosetta rebuilds the sidechains of residues in the native structure too if it finds too many heavy atoms missing, so the metrics might be slightly different for every run.

List of Other Options

A full list of other, specific options is given here.

Controlling Output

Common Structure Output Files

Rosetta primarily uses two formats to output structures: PDB files and silent files. Both have been described in the section above.

PDB File

This is the default output format of Rosetta. For applications that do not output a structure by default, like the scoring application, the option -out:pdb forces Rosetta to output the PDB. This is demonstrated in the scoring tutorial.

Silent File

To change the output file format to a silent file, use the flag -out:file:silent <filename>. As silent files can store multiple structures in one file, there will be only one output structure file, whose name we need to specify. We will rerun the structure preparation example above to produce a binary silent struct file.

$>  $ROSETTA3/bin/relax.default.linuxgccrelease -in:file:s input_files/from_rcsb/1qys.pdb -out:file:silent output_files/1qys.o @flag_input_relax 

This will take a few minutes to run and produce a silent file 1qys.o in the directory output_files.

Extracting PDBs from Silent Files

To visualize and analyze the results, you may want to extract some of the structures from a silent file as PDBs. For example, the silent file provided as <path_to_Rosetta_directory>/demos/tutorials/input_and_output/input_files/1qys_10.o contains 10 structures. Say we want to extract the top 3 structures as PDBs.

To select the top 3, we will first store the scores in a separate file using

$> grep '^SCORE' input_files/1qys_10.o > output_files/

This should produce a file similar to <path_to_Rosetta_directory>/demos/tutorials/input_and_output/output_files/expected_output/

SCORE:     score     fa_atr     fa_rep     fa_sol    fa_intra_rep    fa_elec    pro_close    hbond_sr_bb    hbond_lr_bb    hbond_bb_sc    hbond_sc    dslf_fa13    coordinate_constraint       rama      omega     fa_dun    p_aa_pp    yhh_planarity        ref       time description
SCORE:  -145.658   -416.906     48.038    235.048           1.023    -47.764        0.000        -25.252        -27.431         -4.739     -10.754        0.000                   19.154     -4.561      4.169    110.657    -13.935            0.237    -12.643    121.000   1qys_0001
SCORE:  -146.560   -421.948     49.483    239.471           1.031    -49.106        0.000        -25.940        -27.309         -4.230     -13.052        0.000                   20.385     -5.436      4.646    110.952    -13.316            0.453    -12.643    115.000   1qys_0010
The last column, description, holds the tag we will use to extract the structures. Sort this file by score using:
$> sort -k1,1 -k2n output_files/

The top of the sorted output looks like this:

SCORE:  -148.368   -417.523     46.620    235.293           1.049    -47.662        0.000        -25.448        -26.996         -3.816     -12.297        0.000                   20.283     -4.757      5.036    107.503    -13.480            0.470    -12.643    111.000   1qys_0007
SCORE:  -148.283   -423.099     47.967    241.560           1.046    -48.530        0.000        -25.397        -26.949         -4.226     -13.621        0.000                   19.796     -4.907      5.043    108.665    -13.450            0.461    -12.643    110.000   1qys_0005
SCORE:  -147.763   -416.130     46.386    235.877           1.023    -47.914        0.000        -25.787        -27.212         -3.864     -12.092        0.000                   20.623     -4.800      4.678    107.706    -13.821            0.205    -12.643    118.000   1qys_0009

We see that 1qys_0007, 1qys_0005 and 1qys_0009 are the 3 lowest-scoring structures, which we want to extract. You can sort by any metric you like. To extract them, we prepare a tag file listing these tags.

We then pass this tag file to the executable extract_pdbs with the option -in:file:tagfile:
$> $ROSETTA3/bin/extract_pdbs.default.linuxgccrelease -in:file:silent input_files/1qys_10.o -in:file:tagfile input_files/1qys_top3.tag

Now, you should get 3 PDBs 1qys_0007.pdb, 1qys_0005.pdb and 1qys_0009.pdb in the current working directory.
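The grep, sort and tag-file steps above can also be done in one place. This Python sketch (hypothetical helper names) parses the SCORE: lines of a silent or score file, treating the first one as the header of column names, sorts by a chosen metric and writes the lowest-scoring tags to a file. Writing one tag per line is an assumed tag-file layout; compare it with the tag file shipped with the demo.

```python
def read_score_lines(lines):
    """Parse 'SCORE:' lines (from a silent file or score file) into dicts.

    The first SCORE: line is taken as the header of column names; repeated
    header lines, e.g. from concatenated files, are skipped.
    """
    header = None
    rows = []
    for line in lines:
        if not line.startswith("SCORE:"):
            continue  # silent files also hold sequence and coordinate lines
        fields = line.split()[1:]  # drop the leading "SCORE:" token
        if header is None:
            header = fields
        elif fields != header:
            rows.append(dict(zip(header, fields)))
    return rows

def top_tags(rows, metric="score", n=3):
    """Return the description tags of the n rows with the lowest metric value."""
    ranked = sorted(rows, key=lambda row: float(row[metric]))
    return [row["description"] for row in ranked[:n]]

def write_tagfile(tags, path):
    """Write one tag per line (assumed tag-file layout)."""
    with open(path, "w") as handle:
        handle.write("\n".join(tags) + "\n")
```

For example, with open("input_files/1qys_10.o") as f: top_tags(read_score_lines(f)) should give the same three tags found with sort above.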

Compressed Files

To conserve space in runs that produce a large number of structures, Rosetta can gzip output files automatically. Use the option -out:pdb_gz instead of -out:pdb to produce compressed PDBs, and append .gz to <filename> in -out:file:silent <filename> to produce compressed silent files.
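Compressed output can be read back without decompressing it first. This Python sketch (hypothetical helper names) uses the standard gzip module to open a file in text mode when its name ends in .gz, and counts the ATOM/HETATM records either way.

```python
import gzip
from pathlib import Path

def open_text(path):
    """Open a structure file for reading as text, decompressing it if it ends in .gz."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")  # "rt": read and decode as text
    return path.open("r")

def count_atom_records(path):
    """Count ATOM/HETATM lines in a plain or gzipped PDB file."""
    with open_text(path) as handle:
        return sum(1 for line in handle if line.startswith(("ATOM", "HETATM")))
```

For example, count_atom_records("1qys_0001.pdb.gz") would count the atoms in a file written with -out:pdb_gz (the file name here is hypothetical).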

Score File Formats

Rosetta supports two formats for the score file: text (the default) and json. Throughout these tutorials we will use the text format. To switch to the json format, we can use the option -out:file:scorefile_format json

$> $ROSETTA3/bin/score_jd2.default.linuxgccrelease -in:file:s input_files/1qys.pdb -out:file:scorefile_format json

This will output the score file in JSON format.
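A JSON score file is easy to post-process. This Python sketch assumes the file holds one JSON object per line, and that each object names the structure under "decoy" and stores its energy under "total_score". Both the layout and the key names are assumptions to check against your own output; the helper names are hypothetical.

```python
import json

def read_json_scorefile(text):
    """Parse a score file with one JSON object per line (assumed layout)."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines
            records.append(json.loads(line))
    return records

def best_decoy(records, metric="total_score", name_key="decoy"):
    """Return the name of the record with the lowest value of metric."""
    best = min(records, key=lambda record: float(record[metric]))
    return best[name_key]
```

For example, best_decoy(read_json_scorefile(open(path).read())) returns the name of the lowest-energy structure in the file at path.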


Adding Prefixes and Suffixes to the Output Files

By default, Rosetta names each output structure after the input structure file, appending numerical suffixes like _0001 and _0002. To add a prefix or a suffix to the output file names, use the -out:prefix <string> or -out:suffix <string> options. In this example, we will add the string pre_ as a prefix and the string _suf as a suffix. Running

$> $ROSETTA3/bin/score_jd2.default.linuxgccrelease -in:file:s input_files/1qys.pdb -out:pdb -out:prefix pre_ -out:suffix _suf

will produce the files pre_1qys_suf_0001.pdb and pre_score_suf.sc in your current working directory.
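When a script needs to locate these files, the naming pattern can be predicted. This hypothetical Python helper mirrors the example above (pre_1qys_suf_0001.pdb and pre_score_suf.sc); it assumes the score file's base name is "score", as that example suggests, and ignores other options such as -out:path:all.

```python
from pathlib import Path

def output_names(input_pdb, nstruct=1, prefix="", suffix=""):
    """Predict output structure and score file names for one input structure."""
    stem = Path(input_pdb).stem  # "input_files/1qys.pdb" -> "1qys"
    pdbs = [f"{prefix}{stem}{suffix}_{i:04d}.pdb" for i in range(1, nstruct + 1)]
    score_file = f"{prefix}score{suffix}.sc"
    return pdbs, score_file
```

For example, output_names("input_files/1qys.pdb", prefix="pre_", suffix="_suf") returns (["pre_1qys_suf_0001.pdb"], "pre_score_suf.sc").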

Setting Output Paths

Throughout this tutorial, we have been outputting the score files and the output structures in the current working directory. To change this behaviour we use -out:path:all option. Let us once again run scoring while saving the PDB and the score file in the folder output_files.

$> $ROSETTA3/bin/score_jd2.default.linuxgccrelease -in:file:s input_files/1qys.pdb -out:pdb -out:path:all output_files

You should find 1qys_0001.pdb and the score file in the directory output_files.

To save the score file and the pdb in separate locations use the -out:path:score and the -out:path:pdb options.

Forcing and Suppressing Output of Files

We have already seen how score_jd2 can be forced to output the PDB file it is actually scoring using the flag -out:pdb in the section on outputting PDB files.

If you want to suppress the output files, use the flag -out:nooutput. This is especially useful if you only want to look at the log and do not need the output. Another potential use of this flag is with protocols that directly write non-structure, non-score files.

If you want to suppress only a particular kind of file, say the score file, you can direct it to the UNIX null device, /dev/null. In the following example, we force score_jd2 to output a PDB file, but no score file is written.

$> $ROSETTA3/bin/score_jd2.default.linuxgccrelease -in:file:s input_files/1qys.pdb -out:pdb -out:path:score /dev/null

Your current working directory should now contain 1qys_0001.pdb, but no score file.

Only Output a Score File

If you only want to output a score file, use the option -out:file:score_only <score_file_name>. In the following example, we run relax and keep only its score file:

$> $ROSETTA3/bin/relax.default.linuxgccrelease -in:file:s input_files/1qys.pdb -out:file:score_only output_files/ @flag_input_relax

This only generates the score file in the directory output_files. No output structures are generated.

Adjusting Detail Level in Logs

When you run Rosetta, the log displays quite a lot of information. Sometimes you might want more detail about the process, for example, when you encounter an error or an unexpected result. At other times you may want less. Rosetta lets you adjust the detail level of the log through the option -out:level <integer>, which accepts the following values:

Integer   Level
0         Fatal
100       Error
200       Warning
300       Info
400       Debug
500       Trace
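One way to read this table: each message has a severity level, and raising -out:level admits more verbose messages. The Python sketch below encodes that reading, assuming a message is printed when its numeric level is at or below -out:level; the names LEVELS and is_shown are hypothetical.

```python
# Numeric values of the -out:level settings, from the table above.
LEVELS = {"Fatal": 0, "Error": 100, "Warning": 200, "Info": 300, "Debug": 400, "Trace": 500}

def is_shown(message_level, out_level=300):
    """Assumed rule: a message appears if its level is <= -out:level (default 300)."""
    return LEVELS[message_level] <= out_level
```

Under this reading, the default level 300 shows Info, Warning, Error and Fatal messages, while -out:level 400 also shows Debug messages.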

By default, Rosetta uses level 300. In this example, we will increase the detail level to include information useful for debugging score_jd2.

$> $ROSETTA3/bin/score_jd2.default.linuxgccrelease -in:file:s input_files/1qys.pdb -out:level 400

Here's a snippet of the log file that you should see:

core.chemical: New atom type: aroC C
core.chemical.ElementSet: New element: Pt
core.chemical: Reading patch file: /home/ssrb/Rosetta/main/database/chemical/residue_type_sets/fa_standard/patches/CtermProteinFull.txt
core.pose.util: new fold tree FOLD_TREE  EDGE 1 92 -1  EDGE 1 93 1  EDGE 1 94 2  EDGE 1 95 3  EDGE 1 96 4  EDGE 1 97 5  EDGE 1 98 6  EDGE 1 99 7 

Now you see a lot of information that did not appear before. In the snippet above, we see the atom types and element types that Rosetta recognizes (aroC, the aromatic carbon type, and the element platinum), the patch files it reads (here, the patch that makes a residue C-terminal) and the revised fold tree.

Replicating Output in Rosetta Protocols

Most protocols in Rosetta use Monte Carlo sampling. While this stochastic sampling speeds up the search for an energy minimum, it produces a different trajectory in every run. Rosetta uses a random number seed supplied by the /dev/urandom device of your system to generate the pseudo-random numbers it uses in an application. This seed can be any integer that a C++ int can hold, and the suggested range for its magnitude is 10^6 to 10^9. It is displayed in the log at the start of every run as follows:

core.init: 'RNG device' seed mode, using '/dev/urandom', seed=340573764 seed_offset=0 real_seed=340573764
The seed in this snippet is 340573764.
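To keep a record of seeds across many runs, you can pull them out of the logs automatically. This Python sketch (hypothetical names) extracts the value after "seed=" from a core.init line like the one above; the word boundary keeps it from matching "real_seed=" instead.

```python
import re

# "\b" stops the pattern from matching inside "real_seed=" ('_' is a word character).
SEED_PATTERN = re.compile(r"\bseed=(-?\d+)")

def seed_from_log(log_text):
    """Return the seed reported on Rosetta's core.init line, or None if absent."""
    match = SEED_PATTERN.search(log_text)
    return int(match.group(1)) if match else None
```

Applied to the log line above, seed_from_log returns 340573764, which can then be supplied with -run:jran as shown below.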

All other things being equal, every run of a Rosetta protocol running on the same system with the same seed should have the same trajectory. If you want to replicate a run later, you should store the seed from your run and use it later.

In this example, we will produce the same output using the relax application suggested above by specifying a constant seed. To do this, we need the flag -run:constant_seed which makes sure that the seed is constant. The default constant seed is 1111111, which we will change to 12345678 using the -run:jran option. Every run of the following command should produce the same set of structures and score files when run from the same system.

$> $ROSETTA3/bin/relax.default.linuxgccrelease -in:file:s input_files/1qys.pdb -run:constant_seed -run:jran 12345678 @flag_input_relax

The log file will indicate that you are running using a constant seed.

core.init: Constant seed mode, seed=12345678 seed_offset=0 real_seed=12345678
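The effect of fixing the seed can be illustrated outside Rosetta. This Python sketch runs a toy Metropolis Monte Carlo search on the one-dimensional energy E(x) = x^2 using Python's own random.Random generator, not Rosetta's, and the function name is hypothetical. Two runs with the same seed follow exactly the same trajectory, which is the property -run:constant_seed gives you.

```python
import math
import random

def toy_monte_carlo(seed, steps=100, temperature=1.0):
    """Metropolis Monte Carlo on the toy energy E(x) = x**2; returns visited x values."""
    rng = random.Random(seed)  # independent generator, seeded explicitly
    x = 5.0
    energy = x * x
    trajectory = [x]
    for _ in range(steps):
        trial = x + rng.uniform(-0.5, 0.5)  # random perturbation
        trial_energy = trial * trial
        delta = trial_energy - energy
        # Metropolis criterion: always accept downhill, sometimes accept uphill
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x, energy = trial, trial_energy
        trajectory.append(x)
    return trajectory
```

Calling toy_monte_carlo(12345678) twice gives identical lists; changing the seed generally changes the path.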

Near-realtime Visualization in PyMOL

It is often useful to visualize how Rosetta modifies the biomolecule as the simulation proceeds. You can do so in the molecular visualization package PyMOL. To attach a PyMOL Observer to your run, you first need to create a link between Rosetta and PyMOL. Open PyMOL and, in its command line, run:

run <path_to_Rosetta_directory>/main/source/src/python/bindings/

PyMOL's log will then display:

PyMOL <---> PyRosetta link started!
To show the biomolecule, pass -run:show_simulation_in_pymol <time_in_seconds>. By default, the Observer captures the state every 5 seconds, which we can change as required. To keep a history of the states visited during the run, pass the option -keep_pymol_simulation_history. This can be especially useful for making movies of the run, although it slows the run down slightly.

In this example, we will capture the relax run of the native PDB 1QYS using PyMOL, and record the history of the relax by capturing a snapshot every 4.5 seconds.

Assuming you have established the link between PyMOL and Rosetta, run the following command, and observe PyMOL:

$> $ROSETTA3/bin/relax.default.linuxgccrelease -in:file:s input_files/from_rcsb/1qys.pdb -show_simulation_in_pymol 4.5 -keep_pymol_simulation_history @flag_input_relax

You will observe different parts of the structure undergoing small motions. This is what relax (with constraints) does to the structure until it arrives at a satisfactory structure.

The states displayed in PyMOL represent the states tried by the protocol every n seconds. They may or may not have been accepted during the simulation.

There are other, more specific options, such as updating the state in PyMOL only if the energy score of the structure has changed, or only if the conformation has changed. For those, pass the flags -update_pymol_on_energy_changes_only and -update_pymol_on_conformation_changes_only, respectively.

Overwriting Previously Generated Output

If you have structure files in the same directory named similarly to the output structure files that your simulation will generate, you will see the error:

protocols.jd2.JobDistributor: no jobs were attempted, did you forget to pass -overwrite?
This happens most often when you run the same protocol again without dealing with the output files produced by the first run. If you want to overwrite the files, pass the option -overwrite; otherwise, move the output files to a separate location before running again.
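The error makes sense if you think of the job distributor as skipping every job whose output file already exists. The hypothetical Python helper below models that simplified reading: without -overwrite, existing outputs are skipped, and if all of them exist, no job is attempted.

```python
def jobs_to_run(planned_outputs, existing_files, overwrite=False):
    """Return the planned output files that would be (re)generated.

    Simplified model: without overwrite, outputs that already exist are skipped.
    """
    if overwrite:
        return list(planned_outputs)
    existing = set(existing_files)
    return [name for name in planned_outputs if name not in existing]
```

For example, if 1qys_0001.pdb and 1qys_0002.pdb are both present, jobs_to_run returns an empty list unless overwrite=True.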

Now say you completed the relax simulation in the previous section and then rerun the same command. You will see the error above. Now we will run essentially the same protocol, but with the -overwrite option:

$> $ROSETTA3/bin/relax.default.linuxgccrelease -in:file:s input_files/from_rcsb/1qys.pdb -overwrite @flag_input_relax

You will see that you still have two output structure files, 1qys_0001.pdb and 1qys_0002.pdb, but they have been written more recently, which you can check using ls -l 1qys_000*.pdb.

The -overwrite option does not overwrite the score file: entries are still appended to the existing score file. Score files must be moved or deleted manually before every run.

List of Other Options

A full list of other, specific options is given here.