Antibody design analysis consists of two parts: analyze_antibody_design_strategy.py
and compare_antibody_design_strategies.py.
These will eventually be moved to Rosetta/tools/antibody/analysis.
Requirements include:
Clustal Omega
All decoys should be uniquely named
A basic understanding of how to edit your shell profile. If you use zsh, edit $HOME/.zshrc. If you use Bash, the profile is $HOME/.bashrc on Linux and $HOME/.bash_profile on macOS. Remember, no spaces before or after any equal signs. More information on this can be found throughout the web.
Download PyRosetta from https://www.pyrosetta.org or compile it from C++ Rosetta. Add the PyRosetta directory to your path, either by running SetPyRosettaEnvironment.sh in each shell or, preferably, by adding the line source path/to/SetPyRosettaEnvironment.sh to your shell profile (bashrc, zshrc, or equivalent) so that it runs every time you open a new shell. This enables import of PyRosetta and the associated PyRosetta Toolkit modules located in pyrosetta/apps/pyrosetta_toolkit.
Biopython is used to load structures more quickly than PyRosetta when writing the combined FASTA files that Clustal Omega takes as input. PyRosetta is great, but memory and speed are important when you have hundreds or thousands of structures.
To install Biopython, run setup_biopython.sh, located in the directory containing the analysis scripts.
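The combined FASTA step can be pictured with the sketch below. It is not the code in the analysis scripts, which use Biopython. As a stand-in it uses only the Python standard library and reads one residue per CA atom from fixed-column ATOM records. The function names and the header format (decoy name, underscore, chain ID) are assumptions made for illustration.

```python
from pathlib import Path

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_text):
    """Return {chain_id: sequence} built from the CA atoms of ATOM records."""
    sequences = {}
    seen = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        chain = line[21]
        residue_id = (chain, line[22:27])  # residue number plus insertion code
        if residue_id in seen:  # skip alternate locations of the same residue
            continue
        seen.add(residue_id)
        # Unknown residue names are written as X rather than dropped.
        letter = THREE_TO_ONE.get(line[17:20], "X")
        sequences[chain] = sequences.get(chain, "") + letter
    return sequences


def combined_fasta(pdb_paths):
    """Build one FASTA string with a record per chain of every decoy."""
    records = []
    for path in map(Path, pdb_paths):
        for chain, seq in chain_sequences(path.read_text()).items():
            # Hypothetical header format: <decoy file stem>_<chain>
            records.append(f">{path.stem}_{chain}\n{seq}")
    return "\n".join(records) + "\n"
```

Because every header starts with the decoy file name, this kind of output is only unambiguous when all decoys are uniquely named, which is one reason for that requirement.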
Rosetta should be compiled. Rosetta is used to run the FeaturesReporters through RosettaScripts. Add the Rosetta/main/bin directory to your path in your shell profile, and set the Rosetta database so that you do not need to give the database path while running Rosetta: export ROSETTA3_DB=path/to/Rosetta/main/database
The Rosetta Tools repository should be cloned (for developers) or downloaded with the rest of Rosetta.
The path to this repository should be added to your PYTHONPATH in your shell profile. For example, I add this line to my zshrc file: export PYTHONPATH=$PYTHONPATH:/Path/to/Rosetta/tools
You will need to check out the antibody_tools branch until I merge this into git.
The last thing you need to run all this is Clustal Omega, which creates alignments between your top designs and between your strategies. See http://www.clustal.org/omega/ for download instructions. The Clustal Omega binary should be renamed clustal_omega, and the directory in which it resides should be added to your path, for example: export PATH=$PATH:/path/to/Rosetta/bin
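Before running anything, it can help to confirm that the setup above took effect in the current shell. The following is a minimal sketch, not part of the analysis scripts: check_setup is a hypothetical helper that looks for clustal_omega on PATH, checks that ROSETTA3_DB points at a directory, and checks that Biopython (import name Bio) can be found. PyRosetta is left out of the default module list because its import name depends on the PyRosetta version you installed.

```python
import importlib.util
import os
import shutil


def check_setup(environ=None, modules=("Bio",)):
    """Return a list of problems found; an empty list means all checks passed."""
    environ = os.environ if environ is None else environ
    problems = []

    # The Clustal Omega binary is expected on PATH under the name clustal_omega.
    if shutil.which("clustal_omega", path=environ.get("PATH", "")) is None:
        problems.append("clustal_omega was not found on PATH")

    # ROSETTA3_DB should point at Rosetta/main/database.
    database = environ.get("ROSETTA3_DB")
    if not database:
        problems.append("ROSETTA3_DB is not set")
    elif not os.path.isdir(database):
        problems.append("ROSETTA3_DB does not point to a directory: " + database)

    # Python packages must be importable by the interpreter running the scripts.
    for module in modules:
        if importlib.util.find_spec(module) is None:
            problems.append("Python module not importable: " + module)

    return problems
```

Calling check_setup() with no arguments inspects the live environment. Printing the returned list after opening a new shell is a quick way to catch a profile edit that was never sourced.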
This script, analyze_antibody_design_strategy.py, analyzes one strategy at a time. You should run it from the directory it is in. Use ./analyze_antibody_design_strategy.py --help to get an idea of the options available. At present, it should generally be used only to create the FeaturesReporter databases. These hold an analysis of dGs, energies, hbonds, CDR lengths, etc. for each PDB in the strategy. It is useful to run the script from a shell script that covers all of your strategies, since the runs can be backgrounded to analyze multiple strategies simultaneously.
Input can either be a txt file with a list of PDBs, or a directory of PDBs.
--PDBLIST path/to/pdblist
--indir path/to/pdbs
--rosetta_extension linuxclangrelease for example
--outdir
--out_name
--out_db_batch
--score_weights talaris2013
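If you prefer to pass a list file, it can be generated rather than typed. The sketch below is an illustration only: it assumes the list file holds one PDB path per line, that decoys end in .pdb, and that "uniquely named" refers to the file name. The helper names are hypothetical. It refuses to write the list when two decoys share a name.

```python
from collections import defaultdict
from pathlib import Path


def find_duplicate_decoy_names(pdb_paths):
    """Map each file name that occurs more than once to the paths sharing it."""
    by_name = defaultdict(list)
    for path in map(Path, pdb_paths):
        by_name[path.name].append(str(path))
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}


def write_pdblist(indir, list_path):
    """Write every *.pdb under indir to list_path, one path per line.

    Raises ValueError if two decoys share a file name. Returns the number
    of paths written.
    """
    paths = sorted(Path(indir).rglob("*.pdb"))
    duplicates = find_duplicate_decoy_names(paths)
    if duplicates:
        raise ValueError("Decoy names are not unique: " + ", ".join(sorted(duplicates)))
    Path(list_path).write_text("".join(f"{path}\n" for path in paths))
    return len(paths)
```

The resulting file is what you would hand to --PDBLIST.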
You can run all analyses by passing the --do_all option; however, this is not recommended, as rescoring via PyRosetta takes a very long time. During the comparison we will instead use the scores generated and output into the features database, as well as output the top decoys, write PyMOL sessions, etc. It is recommended to run the antibody_features reporters on all of your models. You can run them on only the top-scoring models, but this is not recommended, as you will need to rescore all the models before the run. See --help for more scoring options.
Recommended:
--do_run_antibody_features_all
Not Recommended:
--do_run_antibody_features_top
--do_run_cluster_features_all
--do_run_cluster_features_top
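The advice about backgrounding several strategy runs can also be followed from Python instead of a shell script. This is a rough sketch under assumptions: the flags are the ones listed above, passed as separate space-delimited arguments; --out_name is assumed to take the strategy name; and build_command and run_strategies are hypothetical helpers. Check ./analyze_antibody_design_strategy.py --help for the exact usage before relying on it.

```python
import subprocess


def build_command(indir, out_name, rosetta_extension="linuxclangrelease"):
    """Assemble the recommended command line for one strategy."""
    return [
        "./analyze_antibody_design_strategy.py",
        "--indir", str(indir),
        "--out_name", out_name,  # assumption: the strategy name goes here
        "--rosetta_extension", rosetta_extension,
        "--do_run_antibody_features_all",
    ]


def run_strategies(strategies, runner=subprocess.Popen):
    """Start one process per strategy, then wait for all of them.

    strategies maps a strategy name to its directory of PDBs.
    Returns {strategy name: exit code}.
    """
    # Start every run first so they execute concurrently.
    processes = {
        name: runner(build_command(indir, name))
        for name, indir in strategies.items()
    }
    # Only then block until each one has finished.
    return {name: process.wait() for name, process in processes.items()}
```

Run it from the directory that contains the script, since the command uses a relative path. Each run launches Rosetta, so keep the number of simultaneous strategies within what your machine's memory and cores allow.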
This application compares and outputs alignments, scores, and sessions from pre-analyzed antibody designs. Each strategy should have a features database associated with it.
Launch in the directory where the program exists, optionally passing the full or relative path to the main analysis directory. The program will look for a directory called /databases within that directory. This is created using the analysis script. If no path is given, you can set the path in the File menu.
./compare_antibody_design_strategies.py traztuzimab/strategy_analysis
Double click a strategy on the left to move it to the right box to analyze it. Double click a strategy from the right to remove it from the list.
Select Run Antibody Features to run the set of R scripts that plot all available data and compare strategies. This may take a fairly long time. Cluster Features are used to measure recovery if a reference database is set in the File menu; they are mainly used for benchmarking.
Rosetta/tools/antibody/analysis/antibody_features.json: this file has the list of all the R scripts we will run for the analysis. Remove any hbond scripts you see, especially intra-CDR hbonds. Make sure to be in a different branch if you are a developer. We will add an option to include or exclude hydrogen bond calculations while running the features comparisons shortly.
Here you can set a new analysis directory or add a new strategy. If you have a reference PDB, it is useful to set it here. If you are using a score function other than talaris2013, set it here as well; it is used to search for the database of interest. You can also set the Top N, which is used throughout the program and defaults to 10. Additionally, if you have a reference database (perhaps 100 relaxed models of the native decoy, analyzed into a features database using analyze_antibody_design_strategy.py), it is set here and labeled as ref.
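Returning to antibody_features.json: removing the hbond scripts by hand is simple, but a scripted version makes the change repeatable. The layout of the real file is not described here, so the sketch below makes no assumption about it beyond being valid JSON. It walks whatever structure it finds and drops list entries that are strings containing "hbond". Both helper names are hypothetical, and the result should be inspected before you use it.

```python
import json


def drop_matching(node, needle="hbond"):
    """Return a copy of node without list items that are strings containing needle."""
    if isinstance(node, list):
        return [
            drop_matching(item, needle)
            for item in node
            if not (isinstance(item, str) and needle in item.lower())
        ]
    if isinstance(node, dict):
        return {key: drop_matching(value, needle) for key, value in node.items()}
    return node


def filter_features_json(source_path, destination_path, needle="hbond"):
    """Write a filtered copy of a JSON file and leave the original untouched."""
    with open(source_path) as handle:
        data = json.load(handle)
    with open(destination_path, "w") as handle:
        json.dump(drop_matching(data, needle), handle, indent=2)
```

Writing to a separate destination keeps the original list intact, which fits the advice to work in a different branch.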
Copy All Decoys
This runs Clustal Omega. You can run it on the top decoys of each score type (currently total_score, dG, dSASA) or on ALL of your decoys. You will get a chance to add additional Clustal options when you select the button to run.
Set Output Format
Set Soft Wrap
Eventually, we will output Clustal alignments on only the CDRs of interest and/or the framework.
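For reference, an alignment run outside the GUI might look roughly like the sketch below. The options -i, -o, --outfmt, and --force are standard Clustal Omega options. Whether the program builds its command this way is an assumption, and clustal_command and run_clustal are hypothetical helpers. The binary name clustal_omega follows the renaming described in the requirements.

```python
import subprocess


def clustal_command(fasta_in, alignment_out, outfmt="clu", extra_options=()):
    """Assemble a Clustal Omega command line for one combined FASTA file."""
    return [
        "clustal_omega",
        "-i", str(fasta_in),
        "-o", str(alignment_out),
        f"--outfmt={outfmt}",  # clu selects the Clustal alignment format
        "--force",             # overwrite an existing output file
        *extra_options,        # any additional Clustal options
    ]


def run_clustal(fasta_in, alignment_out, **kwargs):
    """Run Clustal Omega and raise CalledProcessError if it fails."""
    command = clustal_command(fasta_in, alignment_out, **kwargs)
    return subprocess.run(command, check=True)
```

The extra_options argument plays the same role as the additional Clustal options you are offered when you select the button to run.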
Alignments
Enrichment
These are accessed through the File menu. You can set dG, dSASA, and total score filters that will be used across the whole program (except for the Antibody Features R scripts). You can also set custom filters that use sqlite3 syntax. This is useful, for example, to exclude certain CDR lengths or to require a minimum number of antigen contacts.
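To make the idea of sqlite3-syntax filters concrete, here is a small sketch of how score cutoffs, a custom condition, and the Top N could combine into one query. It is illustrative only. The table decoys and its columns are hypothetical and do not reflect the real features database schema, and treating the dG and total score filters as upper bounds and the dSASA filter as a lower bound is an assumption.

```python
import sqlite3


def top_decoys(conn, max_total_score=None, max_dG=None, min_dSASA=None,
               custom_filters=(), top_n=10):
    """Return (name, total_score) for the top_n lowest-scoring decoys that pass."""
    clauses, params = [], []
    for column, operator, value in (
        ("total_score", "<=", max_total_score),
        ("dG", "<=", max_dG),
        ("dSASA", ">=", min_dSASA),
    ):
        if value is not None:
            clauses.append(f"{column} {operator} ?")
            params.append(value)
    # Custom filters are raw SQL conditions, e.g. "l1_length != 13".
    clauses.extend(f"({text})" for text in custom_filters)
    where = " AND ".join(clauses) if clauses else "1"
    sql = (
        "SELECT name, total_score FROM decoys "  # hypothetical table name
        f"WHERE {where} ORDER BY total_score LIMIT ?"
    )
    return conn.execute(sql, params + [top_n]).fetchall()
```

With a connection opened through sqlite3.connect, a call such as top_decoys(conn, max_dG=-30, custom_filters=["l1_length = 11"]) would keep only decoys with a dG of -30 or lower and an L1 length of 11, then return the ten best by total score. Because custom filters are inserted into the query as written, they should come only from a trusted source.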