In the beginning Rosetta was created from the Centroid and the Fragment.
And the Fullatom Pose was without Conformation, and Null; and darkness was upon the potential energy surface. And the Students of Baker moved upon the face of protein structure.
And Rosetta said, Let there be the Metropolis criterion: and there was convergence.
And Rosetta saw the folding funnel, and saw that it was good. Thus the Students of Baker divided the models from the decoys.

Rosetta has a relatively long academic history, and there is a substantial set of papers that are foundational to both the content of the code-base and the accomplishments of its users. We distilled these references into a core canon: the papers we assume each other to have read, the papers we wish we had read, the papers we should have read, and, in lucky cases, the papers we have read.

These are organized by field, in the order in which Rosetta entered each field, and chronologically within each field.

A more complete (but probably not 100% complete) list can be found here.

Protein Structure Prediction


See also Scorefunction History#publications timeline

  • Rohl CA, Strauss CE, Misura KM, Baker D (2004) Protein structure prediction using Rosetta. (pubmed link)
    Methods in Enzymology.
    This paper, often called the Rohl review, offers a window into Rosetta's early score function and remains an excellent reference for the original forms of its terms. It can be hard to find online, but paper photocopies circulate in most Rosetta labs.

  • Kuhlman B, Dantas G, Ireton GC, Varani G, Stoddard BL, Baker D (2003)
    Design of a novel globular protein fold with atomic-level accuracy. (pubmed link)
    Science 302:1364-8
    This paper, often called the Top7 paper, is primarily a design paper (see below), but it is important for scoring because it introduces sequence-related energy terms (reference energies, p_aa_pp, etc.). The supplementary material is the part most relevant to scoring.

  • Leaver-Fay A, O'Meara MJ, Tyka M, Jacak R, Song Y, Kellogg EH, Thompson J, Davis IW, Pache RA, Lyskov S, Gray JJ, Kortemme T, Richardson JS, Havranek JJ, Snoeyink J, Baker D, Kuhlman B (2013)
    Scientific benchmarks for guiding macromolecular energy function improvement. (pubmed link)
    Methods Enzymol 523:109-43
    Leaver-Fay et al. describe OptE, a methodology that uses sequence-recovery and rotamer-recovery benchmarks to improve the weight sets of scoring functions. It was used in a separate paper to generate Talaris2013/4, the general-purpose Rosetta energy functions that preceded REF2015.

  • REF2015: Park H, Bradley P, Greisen P Jr, Liu Y, Mulligan VK, Kim DE, Baker D, DiMaio F (2016)
    Simultaneous optimization of biomolecular energy functions on features from small molecules and macromolecules. (pubmed link)
    J Chem Theory Comput 12(12):6201-12. PMID 27766851

  • Review: Alford RF, Leaver-Fay A, Jeliazko JR, O'Meara MJ, DiMaio FP, Park H, Shapovalov MV, Renfrew PD, Mulligan VK, Kappel K, Labonte JW, Pacella MS, Bonneau R, Bradley P, Dunbrack RL, Das R, Baker D, Kuhlman B, Kortemme T, Gray JJ (2017)
    The Rosetta all-atom energy function for macromolecular modeling and design. (acs link)
    J Chem Theory Comput 13(6):3031-48

Protein Design

  • Kuhlman B, Dantas G, Ireton GC, Varani G, Stoddard BL, Baker D (2003)
    Design of a novel globular protein fold with atomic-level accuracy. (pubmed link)
    Science 302:1364-8
    Kuhlman et al. demonstrate the computational design of a protein with an entirely novel fold. The result was significant because prior design efforts had stabilized or modified existing, known folds; creating a protein de novo was wholly original.

  • Guntas G, Purbeck C, Kuhlman B (2010)
    Engineering a protein-protein interface using a computationally designed library. (pubmed link)
    Proc Natl Acad Sci U S A 107:19296-301
    Guntas et al. used computational design at twenty amino acid positions to bias the composition of two protein libraries. These greatly outperformed a control library in which all residues were allowed: the designed libraries produced multiple mid-nanomolar binders, while the control library yielded no binders tighter than fifty micromolar after four rounds of selection.

Mutational analysis

Ligand docking

Loop modeling


Enzyme Design

Rosetta development

  • Leaver-Fay A, Tyka M, Lewis SM, Lange OF, Thompson J, Jacak R, Kaufman K, Renfrew PD, Smith CA, Sheffler W, Davis IW, Cooper S, Treuille A, Mandell DJ, Richter F, Ban YE, Fleishman SJ, Corn JE, Kim DE, Lyskov S, Berrondo M, Mentzer S, Popović Z, Havranek JJ, Karanicolas J, Das R, Meiler J, Kortemme T, Gray JJ, Kuhlman B, Baker D, Bradley P (2011)
    ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. (pubmed link)
    Methods Enzymol 487:545-74
    This is the Rosetta3 paper. It describes the transition from the monolithic C++ code of Rosetta++ to the object-oriented C++ of Rosetta3, and introduces many of the major modern classes.

  • Cooper S, Khatib F, Treuille A, Barbero J, Lee J, Beenen M, Leaver-Fay A, Baker D, Popović Z, Foldit Players (2010)
    Predicting protein structures with a multiplayer online game. (pubmed link)
    Nature 466:756-60
    Cooper et al.'s development of the Foldit game showed that massively parallel human intuition is a useful tool for, well, folding it (proteins).

  • Chaudhury S, Lyskov S, Gray JJ (2010)
    PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta. (pubmed link)
    Bioinformatics 26:689-91
    Chaudhury et al. developed a Python-based scripting interface to Rosetta functionality (PyRosetta), which lets users compose protocols easily without interacting with the C++ layer.

  • Fleishman SJ, Leaver-Fay A, Corn JE, Strauch EM, Khare SD, Koga N, Ashworth J, Murphy P, Richter F, Lemmon G, Meiler J, Baker D (2011)
    RosettaScripts: A Scripting Language Interface to the Rosetta Macromolecular Modeling Suite (pubmed link)
    PLoS ONE 6(6): e20161
    Fleishman et al. develop an XML-based interface that gives direct access to protocol-level functionality (and therefore requires a compiled version of C++ Rosetta). RosettaScripts lets users avoid writing C++, which in turn enables rapid development of new protocols.


  • Drew K, Renfrew PD, Craven TW, Butterfoss GL, Chou FC, Lyskov S, Bullock BN, Watkins A, Labonte JW, Pacella M, Kilambi KP, Leaver-Fay A, Kuhlman B, Gray JJ, Bradley P, Kirshenbaum K, Arora PS, Das R, Bonneau R (2013)
    Adding diverse noncanonical backbones to rosetta: enabling peptidomimetic design. (pubmed link)
    PLoS One 8:e67051
    Drew et al. create custom modes for backbone sampling to incorporate peptidomimetic scaffolds into Rosetta, enabling the design of amino acid-derived nonnatural foldamers.

  • Renfrew PD, Craven TW, Butterfoss GL, Kirshenbaum K, Bonneau R (2014)
    A rotamer library to enable modeling and design of peptoid foldamers. (pubmed link)
    J Am Chem Soc 136:8772-82
    Renfrew et al. show that peptoid residues (N-alkylated or N-arylated glycines) have sidechain conformations that fall neatly into rotamer bins, much like those of peptides, and present two methods for constructing rotamer libraries to model them.

See Also