See Scientific Benchmarks for information about the current set of scientific benchmarks
Scientific Benchmarks are tests that compare Rosetta-generated structure predictions with experimental observations. Assessing the accuracy of Rosetta predictions in this way measures the physical realism of the energy function and how well a protocol samples physically realistic structures.
There are two types of scientific benchmarks:
Biweekly Scientific Benchmarks: These are individual tests of specific prediction protocols, run regularly (roughly every two weeks) on the Testing Server; the steps for adding one are described below.
Cluster Scientific Benchmarks (Energy Function Scientific Benchmarks): These are meant to test the quality of an energy function across a broad set of tests that represent realistic scientific prediction protocols. They are intended to be run by a researcher who wants to evaluate a candidate energy function as a possible new standard energy function in Rosetta.
If your test requires you to create a new application, please commit it to this location: source/src/apps/benchmark/scientific/
1) Create a folder in main/test/scientific/tests/<test_name>.
Each individual test can have the following subdirectories:
2) Collect a small sample of experimental data, such as crystal structures or NMR data, for the new test.
3) Write a script called 'command' that generates structure predictions and/or processes the input structures and experimental data to generate raw results.
4) Write another script to analyze the results from step 3) and generate numerical statistics and/or plots. Have this analysis script write a file, .results.yaml, which contains valid YAML text with at least one boolean field, '_isTestPass'. Here is an example:
{ 'rotamer_recovery_rate_at_all_positions' : .37, '_isTestPass' : True }
With this file, when tests are run on the Testing Server, the results can be processed and reported on the testing web page.
Also have the analysis script write a file, .results.log, containing human-readable, text-based results. The contents of this file will be appended to the Nightly Tests email sent to the Rosetta-logs mailing list.
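As an illustration, here is a minimal sketch in Python of what such an analysis script might look like. The input file name, record layout, and pass threshold are hypothetical placeholders; a real analysis script would compute its statistics from the raw results produced in step 3).

    #!/usr/bin/env python
    # Minimal sketch of a scientific-test analysis script (hypothetical example).
    # It computes one summary statistic from the raw results of step 3), then
    # writes the machine-readable .results.yaml and the human-readable .results.log.
    import json

    def main():
        # Hypothetical raw results: one record per residue, written by the 'command' script.
        with open("raw_results.json") as f:      # placeholder file name
            records = json.load(f)

        recovered = sum(1 for r in records if r["recovered"])
        rate = float(recovered) / len(records)
        passed = rate >= 0.30                    # hypothetical pass threshold

        # Machine-readable summary: valid YAML with a boolean '_isTestPass' field.
        with open(".results.yaml", "w") as f:
            f.write("{ 'rotamer_recovery_rate_at_all_positions' : %.2f, '_isTestPass' : %s }\n"
                    % (rate, passed))

        # Human-readable summary: appended to the Nightly Tests email.
        with open(".results.log", "w") as f:
            f.write("Recovered %d of %d residues (rate %.2f); pass threshold is 0.30\n"
                    % (recovered, len(records), rate))

    if __name__ == "__main__":
        main()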
To run the local scientific benchmarks, go to test/scientific and execute
scientific.py -d path/to/database -j<num_processors>
This will generate, for each test, a directory test/scientific/statistics/<test_name> containing protocol log information, statistics, and plots. Each test should take about an hour to run. Additionally, each test is run regularly on the Testing Server, possibly with more extensive input or a more extensive protocol.
1) Put together a more extensive input dataset or parameters for a more extensive protocol that can be used when the test is run on the Testing Server cluster at JHU.
2) Add input data to mini.data as needed.
3) Tell Sergey to create the symbolic link for the test so it will automatically run on the Testing Server.
4) Monitor the results of the testing server and verify that the new test is running correctly.
Creator Name: Matthew O'Meara
Prediction Task / Description: Rotamer Recovery Benchmark. Given the native conformation of a whole structure except for a single sidechain, predict the conformation of that sidechain.
Experimental Data Used: The input structures are a selection from the Richardson Lab's Top5200 dataset. The Top5200 set was constructed as follows. Daniel Keedy and others in the Richardson Lab clustered all structures in the PDB as of April 5th, 2007 into 70% sequence homology clusters. Each structure with resolution of 2.2 Å or better that was not filtered out by hand was run through MolProbity. Then, from each cluster, the structure with the best average resolution and MolProbity score was selected, provided its resolution was at least 2.0 Å. All structures from the Top5200 having between 50 and 200 residues and resolution better than 1.2 Å were selected for this benchmark. This leaves 152 structures with 17463 residues.
How to interpret the results: Using the ChiDiff comparison, the score for a residue is the maximum over the differences for each of its torsional angles. A residue is considered recovered if all angle differences are less than 20°. The reported scores include:
Significance is determined by a permutation test or bootstrap analysis.
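To make the recovery criterion and the significance test concrete, here is a small illustrative sketch in Python. The function names and data layout are hypothetical, not the benchmark's actual analysis code; the bootstrap shown is one simple way to attach a confidence interval to the recovery rate.

    # Illustrative sketch of the ChiDiff recovery criterion and a bootstrap
    # estimate of the recovery rate (hypothetical data layout, not the real code).
    import random

    def chi_diff(native_chis, predicted_chis):
        """Maximum absolute difference over all chi angles, accounting for periodicity."""
        diffs = []
        for a, b in zip(native_chis, predicted_chis):
            d = abs(a - b) % 360.0
            diffs.append(min(d, 360.0 - d))
        return max(diffs)

    def is_recovered(native_chis, predicted_chis, tolerance=20.0):
        """A residue is recovered if every chi-angle difference is below the tolerance."""
        return chi_diff(native_chis, predicted_chis) < tolerance

    def bootstrap_recovery_rate(recovered_flags, n_resamples=1000, seed=0):
        """Return the observed recovery rate and a 95% bootstrap confidence interval."""
        random.seed(seed)
        n = len(recovered_flags)
        rates = []
        for _ in range(n_resamples):
            sample = [random.choice(recovered_flags) for _ in range(n)]
            rates.append(sum(sample) / float(n))
        rates.sort()
        observed = sum(recovered_flags) / float(n)
        return observed, rates[int(0.025 * n_resamples)], rates[int(0.975 * n_resamples)]

For example, a residue with native chi angles (62, 181) and a predicted rotamer at (65, 175) has a ChiDiff of 6° and counts as recovered at the 20° tolerance.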
Preliminary Results: See Leaver-Fay et al., Methods in Enzymology 2013
Caveats: ...
Approximate computational cost: ...
How to run: ...
Creator Name: This will be a joint effort; Shane O'Connor (Kortemme Lab) will administer the benchmarking system.
Prediction Task / Description: Given experimental DDG measurements in protein stability assays, predict the DDG in Rosetta.
Experimental Data Used: Our set of experimental data mainly comes from the extensive ProTherm database (Kyushu Institute of Technology, Japan). A smaller number of records come from Guerois et al. and from a paper on Rosetta DDG protocols by Elizabeth Kellogg, Andrew Leaver-Fay, and David Baker. At present, we have three curated sets, each taking a different subset (sometimes using average values over multiple experiments with the same protein/mutation) of the total set of DDG values. These curated sets were compiled by Guerois et al., Kellogg et al., and Potapov et al. Other curated sets exist in the literature which we hope to incorporate in time.
We will define our own curated datasets for the shootout. Details will be posted when the sets are concretely defined, although I expect there to be a large overlap with the datasets above. The Guerois set has just under 1,000 different point mutations, the Kellogg set has ~1,200, and the Potapov set has ~2,100. The entire set of point mutations and associated DDG values stored in our database is on the order of 4,000.
How to interpret the results:
Analysis is run using a Python package. At present, we have basic analysis scripts which generate scatter plots of experimental vs. predicted values and compute the Pearson correlation coefficient and mean absolute error. More refined analyses (large-to-small mutations, mutations to X, etc.) as well as additional statistical measures will be added.
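For illustration, the core of the basic analysis could look like the sketch below. The input file and column names are hypothetical, and the sketch computes only the two summary statistics in pure Python; the real package wraps this in an API around the database and calls R to draw the scatter plots.

    # Illustrative sketch of the basic DDG analysis: Pearson correlation
    # coefficient and mean absolute error between experimental and predicted
    # values. (The scatter-plot step is omitted here; the real analysis uses R.)
    import csv
    import math

    def pearson(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        sy = math.sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)

    def mean_absolute_error(xs, ys):
        return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

    with open("ddg_predictions.csv") as f:       # hypothetical input file
        rows = list(csv.DictReader(f))
    exp = [float(r["experimental_ddg"]) for r in rows]   # hypothetical column names
    pred = [float(r["predicted_ddg"]) for r in rows]

    print("Pearson r: %.3f" % pearson(exp, pred))
    print("MAE (kcal/mol): %.3f" % mean_absolute_error(exp, pred))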
Preliminary Results: None as yet.
Caveats: Currently requires an SGE cluster to run (see below).
Approximate computational cost: No useful data at present. We currently run predictions using protocol 16 from the Kellogg et al. paper on a Sun Grid Engine cluster. We measure time from job start in our scheduler to job completion, which includes the cluster wait time as well as computation time; this could be refined. In addition to score function changes, we aim to test different protocols for the shootout, so times may vary.
How to run:
We run the jobs using a Python scheduler which communicates with a MySQL database and submits jobs to the SGE cluster. The scheduler takes care of intermediate steps (file passing and command-line argument values between steps in the protocol). This is not currently portable to labs that do not have access to an SGE cluster, so this backend will need to be abstracted in the future. Analysis is run using a simple Python API to the database, calling R to generate PDF output.
To avoid the necessity of using a MySQL database, one possibility is to dump the relevant data to an SQLite database, which would be easier to set up. However, this would require adding a database abstraction layer to our code, which is not currently present.
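As a rough sketch of what such a dump could look like (the table layout and column names below are purely hypothetical), the Python standard library's sqlite3 module would be enough on the receiving end:

    # Hypothetical sketch of dumping curated DDG records into a standalone SQLite
    # file, so the benchmark data can be used without access to the MySQL server.
    import sqlite3

    def dump_records(records, path="ddg_benchmark.sqlite3"):
        """records: iterable of (pdb_id, chain, mutation, experimental_ddg) tuples."""
        conn = sqlite3.connect(path)
        conn.execute("""CREATE TABLE IF NOT EXISTS mutations (
                            pdb_id TEXT, chain TEXT, mutation TEXT,
                            experimental_ddg REAL)""")
        conn.executemany("INSERT INTO mutations VALUES (?, ?, ?, ?)", records)
        conn.commit()
        conn.close()

    # Example usage with made-up data:
    dump_records([("1ABC", "A", "L50A", -1.2)])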
Fragment sets, or infile sets, live in the BENCH_HOME/fragments directory. Each fragment set has to be in a separate directory. A benchmark-type-specific directory structure is necessary in these fragment set directories; see below for details. Once the directories are set up, the fragment set has to be imported into the database. This is (currently) done each time anybody loads the fragment overview page (accessible from the main menu) - only the fragment set is imported. Edit the newly imported fragment set - it needs a description and a benchmark type. Once the fragment set is set up, import the fragments with the command bench.pl -mode fragment -submode insert_fragmentset -id <id>, where <id> is the fragment set id.
A fragment set, say abinitio_set01 (BENCH_HOME/fragments/abinitio_set01), has a 5-letter subdirectory for each target. In the CVS repository, there is an example, 1a32_.tbz. To set it up, create the directory BENCH_HOME/fragments/abinitio_set01 and untar the file (tar -xvjf 1a32_.tbz) in that directory.