Page added 22 October 2018 by Julia Koehler Leman.
Scientific benchmarks are tests that compare Rosetta-generated structure predictions with experimental observations. They are meant to measure the physical realism of the energy function and how well a protocol samples physically realistic structures.
Several tests are located in the `Rosetta/main/tests` directory. The directory structure is the following:
- `Rosetta/main/tests/integration`: contains the integration tests
- `Rosetta/main/tests/benchmark`: contains the files required for the benchmark test server, i.e. the framework that runs the scientific tests; you might have to look them over when debugging
- `Rosetta/main/tests/scientific`: contains the scientific tests
- `Rosetta/main/tests/scientific/tests`: contains the implementations of the tests, one directory per test
- `Rosetta/main/tests/scientific/data`: submodule that contains the input data if it exceeds 5 MB per test
- `Rosetta/main/tests/scientific/tests/_template_`: template directory with all necessary input files

To set up a new scientific test:

1. `cd Rosetta/main/tests/scientific` and run `git submodule update --init --recursive` to get the submodule containing the input data. You will now see a `Rosetta/main/tests/scientific/data` directory.
2. If your input data exceeds 5 MB, add it to the data submodule and commit it there, then go back to the main repository (`cd ..`) and commit your changes again [check `git status`; you will notice when git complains about uncommitted files].
3. `cd Rosetta/main/tests/scientific/tests`
4. `cp -r Rosetta/main/tests/scientific/tests/_template_ <my_awesome_test>`
5. `cd <my_awesome_test>`
Have a look at the files in your new test directory: tests are set up as individual, sequentially numbered steps so that each step can be run on its own without having to rerun the entire pipeline. Don't run anything yet (we'll get to that further below). The places you need to edit are marked with `==> EDIT HERE` tags.
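The snippet below is purely hypothetical (the real `_template_` scripts define their own variables), but it shows roughly how those markers appear inside a step script:

```python
# Hypothetical excerpt for illustration only; the real _template_ scripts
# define their own variables and mark every line you must change like this:

debug   = True                    # debug version of the scientific test (see 1.submit.py below)
nstruct = 2 if debug else 1000    # <== EDIT HERE: number of decoys per target
targets = ["1ubq", "2lzm"]        # <== EDIT HERE: your benchmark targets
```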
The test directory contains the following files:

- `0.compile.py`: this script handles compilation; you will likely not need to edit this file
- `1.submit.py`: this contains the command line and the target proteins you are running. The debug option does not refer to the release vs. debug build of Rosetta but rather to a debug version of the scientific test itself, for faster setup.
- `2.analyze.py`: this script analyzes the results from the score files (or whatever files you are interested in). It reads the cutoffs file, compares this run's results against the cutoffs, and writes the `result.txt` file for easy reading. A few basic functions for data analysis are at the bottom of this script; if you need a specialized function, please add it there (see the sketch below).
- `3.plot.py`: this script plots the results via matplotlib into `plot_results.png`, with one subplot per protein. It draws vertical and horizontal lines for the cutoffs.
- `9.finalize.py`: this script gathers the information from the readme, the results plot and the results, and creates an HTML page `index.html` that displays everything
- `citation` file
- `cutoffs` file: the header line starts with `#`; the file has one protein per row, with the different measures in columns
- `observers` file
- `readme.md`: describe your test in as much detail as you can. Keep in mind that after you have left your lab, someone else in the community should be able to understand, run and maintain your test! This file is automatically linked to the Gollum wiki.

To hook your test into the test server framework, check the `run` function at the bottom of `Rosetta/main/tests/benchmark/tests/scientific/command.py`.
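As an illustration of the `2.analyze.py` step, here is a minimal sketch that parses a cutoffs file of the format described above and flags failing targets. The column names and the pass/fail criterion are hypothetical; the real script's helper functions should be used where they exist.

```python
# Minimal sketch of the kind of comparison 2.analyze.py performs.
# Assumes a whitespace-separated cutoffs file whose header starts with '#'
# and has one protein per row; the column names here are hypothetical.

def read_cutoffs(path):
    """Return {protein: {measure: cutoff_value}}."""
    cutoffs = {}
    with open(path) as f:
        header = f.readline().lstrip("#").split()   # e.g. ['protein', 'rmsd', 'score_gap']
        for line in f:
            fields = line.split()
            cutoffs[fields[0]] = dict(zip(header[1:], map(float, fields[1:])))
    return cutoffs

def failed_targets(best_rmsd_per_target, cutoffs):
    """Hypothetical criterion: fail a target if its best sampled RMSD exceeds the cutoff."""
    return [tag for tag, rmsd in best_rmsd_per_target.items()
            if rmsd > cutoffs[tag]["rmsd"]]

# usage sketch (dummy values):
#   cutoffs  = read_cutoffs("cutoffs")
#   failures = failed_targets({"1ubq": 1.4, "2lzm": 3.7}, cutoffs)
#   with open("result.txt", "w") as out:
#       out.write("FAILED: " + ", ".join(failures) if failures else "All targets passed")
```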
Results are saved in `json` format, but plots are highly encouraged, as digesting data visually is faster and easier for debugging. We use the `json` format because it is easy to read the data directly into Python dictionaries. The framework runs on `python3`; every Python script in the scientific tests needs the `python3` prefix to run properly!
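For example, any of the result files can be inspected directly from Python (the file name here is the one written by `9.finalize.py`, described in the debugging section below):

```python
import json

# load a step's output json into a plain Python dictionary for inspection
with open("9.finalize.output.json") as f:
    results = json.load(f)

print(sorted(results.keys()))
```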
## Setup a run on Multiple Cores

Copy `main/tests/benchmark/benchmark.ini.template` to `benchmark.linux.ini` (or whatever your architecture is). Adjust the settings in this file (i.e. `cpu_count` and `memory`) as appropriate for your environment. If `hpc_driver = MultiCore`, jobs will be submitted on up to `cpu_count` cores without using an HPC job distributor.
## Setup the run

    cd Rosetta/main/tests/benchmark
    python3 benchmark.py --compiler <clang or else> --skip --debug scientific.<my_awesome_test>

- the `--skip` flag skips compilation; this is only recommended if you have an up-to-date version of master compiled in release mode (Sergey advises against skipping)
- the `--debug` flag runs in debug mode, which is highly recommended while debugging (i.e. you create 2 decoys instead of 1000s)
- the run creates the directory `Rosetta/main/tests/benchmark/results/<os>.scientific.<my_awesome_test>`, where it makes softlinks to the files in `Rosetta/main/tests/scientific/tests/<my_awesome_test>`, and then it will likely crash in one way or another
## Start Debugging

`cd Rosetta/main/tests/benchmark/results/<os>.scientific.<my_awesome_test>` and debug each script individually, starting from the lowest number, by running for instance `python3 1.submit.py`. As the steps run, the following appear in this directory:

- `config.json`, which contains the configuration settings
- an `output` directory that contains the subdirectories for each protein
- an `hpc-logs` directory that contains the Rosetta run logs; you might have to check them to debug your run if it crashed in the Rosetta run step
- for each step, a `.json` file that contains the variables you want to carry over into the next step (see the sketch below); `9.finalize.output.json` contains all the variables and results saved
- `plot_results.png` with the results
- `index.html` with the gathered results, failures and details you have written up in the readme. While all the files are accessible on the test server later, this file is the results summary that people will look at.
- `output.results.json`, which will tell you whether the tests passed or failed
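A minimal sketch of that carry-over between steps, assuming the `<step>.output.json` naming seen for `9.finalize.output.json` above; the file name and keys here are hypothetical:

```python
import json

# End of one step (e.g. 2.analyze.py): save the variables the next step needs.
state = {"targets": ["1ubq", "2lzm"], "failures": []}          # hypothetical keys
with open("2.analyze.output.json", "w") as f:
    json.dump(state, f, sort_keys=True, indent=2)

# Start of the next step (e.g. 3.plot.py): pick up where the previous step left off.
with open("2.analyze.output.json") as f:
    state = json.load(f)
```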
- Once you are finished debugging locally, commit all of your changes to your branch. You might want to decrease `nstruct` for debugging your run on the test server; if you do that, don't forget to increase it again once the tests run successfully.
- Once the tests run as you want, merge your branch into master. The `scientific` branch is an extra branch that grabs the latest master version every few weeks to run all scientific tests on. DO NOT MERGE YOUR BRANCH INTO THE SCIENTIFIC BRANCH!!!
- Celebrate! Congrats, you have added a new scientific test and contributed to Rosetta's greatness. :D
Frequently, a scientific test will aim to evaluate the quality of a folding funnel (a plot of Rosetta energy vs. RMSD to a native or designed structure). Many of the simpler ways of doing this suffer from the effects of stochastic changes to the sampling: the motion of a single sample can drastically alter the goodness-of-funnel metric. For example, one common approach is to divide the funnel into a "native" region (with an RMSD below some threshold value) and a "non-native" region (with an RMSD above the threshold), and to ask whether there is a large difference between the lowest energy in the "native" region and the lowest in the "non-native" region. A single low-energy point that drifts across the threshold from the "native" region to the "non-native" region can convert a high-quality funnel into a low one, by this metric.
To this end, the PNear metric was developed. PNear is an estimate of the Boltzmann-weighted probability of finding a system in or near its native state, with "native-ness" being defined fuzzily rather than with a hard cutoff. The expression for PNear is:
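Reconstructing it from the description in the next paragraph (with $E_i$ the energy and $\mathrm{rmsd}_i$ the RMSD of sample $i$, over $N$ samples):

$$P_\mathrm{Near} = \frac{\displaystyle\sum_{i=1}^{N} \exp\!\left(-\frac{\mathrm{rmsd}_i^{2}}{\lambda^{2}}\right)\exp\!\left(-\frac{E_i}{k_B T}\right)}{\displaystyle\sum_{i=1}^{N} \exp\!\left(-\frac{E_i}{k_B T}\right)}$$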
Intuitively, the denominator is the partition function, while the numerator is the sum of the Boltzmann probabilities of the individual samples, each multiplied by a weighting factor for the "native-ness" of that sample that falls off as a Gaussian with RMSD. The expression takes two parameters: lambda (λ), which determines the breadth of the Gaussian for "native-ness" (with higher values allowing a more permissive notion of what is close to native), and kB*T, which determines how energies translate into probabilities (with higher values allowing states separated by small energy gaps to be considered closer in probability). Recommended values are lambda = 2 to 4 and kB*T = 1.0 (for ref2015) or 0.63 (for talaris2013).
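As a concrete illustration, a direct transcription of the formula into Python might look like the following. This is only a sketch (not the `calculate_pnear()` implementation mentioned below), with default parameter values taken from the recommendations above:

```python
import math

def pnear(energies, rmsds, lambda_val=3.0, kbt=1.0):
    """Sketch of PNear: Boltzmann-weighted probability of being at or near
    the native state, with native-ness falling off as a Gaussian in RMSD.

    energies, rmsds -- parallel lists over all samples
    lambda_val      -- breadth of the native-ness Gaussian (recommended 2-4)
    kbt             -- k_B*T (recommended 1.0 for ref2015, 0.63 for talaris2013)
    """
    emin = min(energies)  # shifting energies cancels in the ratio but avoids overflow
    num = den = 0.0
    for e, r in zip(energies, rmsds):
        boltz = math.exp(-(e - emin) / kbt)
        num += math.exp(-(r * r) / (lambda_val * lambda_val)) * boltz
        den += boltz
    return num / den
```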
For more information, see the Methods (online) of Bhardwaj, Mulligan, Bahl et al. (2016). Nature 538(7625):329-35.
Update: As of 10 October 2019, a Python script is available in the `tools` repository (in `tools/analysis`) to compute PNear. Instructions for its use are in the comments at the start of the script. In addition, the function `calculate_pnear()` in `Rosetta/main/tests/benchmark/util/quality_measures.py` can be used to compute PNear.
Please use this template to describe your scientific test in the `readme.md`, as described above. Also check out the `fast_relax` test for ideas of what we are looking for.
## AUTHOR AND DATE
#### Who set up the benchmark? Please add name, email, PI, month and year

## PURPOSE OF THE TEST
#### What does the benchmark test and why?

## BENCHMARK DATASET
#### How many proteins are in the set?
#### What dataset are you using? Is it published? If yes, please add a citation.
#### What are the input files? How were they created?

## PROTOCOL
#### State and briefly describe the protocol.
#### Is there a publication that describes the protocol?
#### How many CPU hours does this benchmark take approximately?

## PERFORMANCE METRICS
#### What are the performance metrics used and why were they chosen?
#### How do you define a pass/fail for this test?
#### How were any cutoffs defined?

## KEY RESULTS
#### What is the baseline to compare things to - experimental data or a previous Rosetta protocol?
#### Describe outliers in the dataset.

## DEFINITIONS AND COMMENTS
#### State anything you think is important for someone else to replicate your results.

## LIMITATIONS
#### What are the limitations of the benchmark? Consider dataset, quality measures, protocol etc.
#### How could the benchmark be improved?
#### What goals should be hit to make this a "good" benchmark?