Authors: Gideon Lapidoth (glapidoth@gmail.com), Sarel Fleishman (sarel.fleishman@weizmann.ac.il)
Corresponding PI: Sarel Fleishman (sarel.fleishman@weizmann.ac.il).
Last edited 10/26/2019 by Gideon Lapidoth (glapidoth@gmail.com)
Methods for antibody structure prediction rely on sequence homology to experimentally determined structures. The resulting models may be accurate, but they are often stereochemically strained, which limits their usefulness in modeling and design workflows. Instead of using sequence homology, AbPredict conducts a Monte Carlo-based search for low-energy combinations of backbone conformations, derived from experimentally solved antibody structures, to yield accurate and unstrained antibody structures.
AbPredict uses a combinatorial backbone optimization algorithm that leverages the large number of experimentally determined antibody structures to construct new antibody models. Briefly, all experimentally determined antibody structures are downloaded from the Protein Data Bank (PDB) and segmented at structurally conserved positions, the disulfide cysteines at the core of the light- and heavy-chain variable domains. This creates four segments, two per chain: CDRs 1 and 2 with the intervening scaffold (VH and VL), and CDR 3 (H3 and L3). The four segments are then recombined combinatorially to produce a conformationally diverse set of novel antibody models. The input sequence is threaded onto this starting set of models, and the models are energetically optimized using Monte Carlo sampling: at each step, a random segment conformation is sampled from a precomputed database (see SpliceOutAntiBody). The final models are ranked by energy.
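The search over segment combinations described above can be illustrated with a minimal Python sketch. It is not the AbPredict implementation: the Metropolis acceptance rule, the `kT` parameter, the `toy_energy` function and the mini-database are all assumptions made for illustration. A real run scores full-atom models with a Rosetta energy function.

```python
import math
import random

SEGMENTS = ("H1_H2", "L1_L2", "H3", "L3")


def toy_energy(model):
    """Placeholder energy: number of distinct source PDB IDs minus one.

    Hypothetical stand-in so the sketch can execute; it has no physical meaning.
    """
    sources = {entry[:4] for entry in model.values()}
    return float(len(sources) - 1)


def monte_carlo_segments(database, energy, n_steps=200, kT=1.0, seed=0):
    """Search segment combinations with a Metropolis criterion (assumed rule)."""
    rng = random.Random(seed)
    # Start from a random combination of one conformation per segment.
    current = {seg: rng.choice(database[seg]) for seg in SEGMENTS}
    current_e = energy(current)
    best, best_e = dict(current), current_e
    for _ in range(n_steps):
        # Swap in a random conformation for one randomly chosen segment.
        seg = rng.choice(SEGMENTS)
        trial = dict(current)
        trial[seg] = rng.choice(database[seg])
        trial_e = energy(trial)
        delta = trial_e - current_e
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = dict(current), current_e
    return best, best_e


# Hypothetical mini-database; entry names reuse identifiers from the examples on this page.
database = {
    "H1_H2": ["1AHWH", "2IBZX"],
    "L1_L2": ["1AHWL", "3DSFL"],
    "H3": ["1AHWH", "3V4UH"],
    "L3": ["1AHWL", "1LO2L"],
}
best, best_e = monte_carlo_segments(database, toy_energy)
print(best, best_e)
```

Tracking the best combination separately from the current one means an accepted uphill move never loses the lowest-energy model seen so far.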
AbPredict is implemented as a RosettaScripts protocol. An example of this protocol, along with other necessary files such as the torsion database files, can be found here: <Rosetta_Directory>/demos/tutorials/AbPredict/
-parser:script_vars sequence=IKMTQSPSSMYASLGERVTITCKASQDIRKYLNWYQQKPWKSPKTLIYYATSLADGVPSRFSGSGSGQDYSLTISSLESDDTATYYCLQHGESPYTFGGGTKLEIQLQQSGAELVRPGALVKLSCKASGFNIKDYYMHWVKQRPEQGLEWIGLIDPENGNTIYDPKFQGKASITADTSSNTAYLQLSSLTSEDTAVYYCARDNSYYFDYWGQGTTLTVS
Note the following rules concerning the input sequence:
There should be exactly 9 amino acids after the conserved H3 Trp (H103 in Chothia numbering)
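The rule above can be checked before submitting jobs. The sketch below is a hypothetical helper, not part of AbPredict. It assumes the heavy chain is at the end of the input sequence, as in the example sequence, and only tests that a Trp sits ten residues from the C-terminus. It cannot confirm that this Trp is the conserved H103.

```python
def has_nine_after_h3_trp(sequence: str) -> bool:
    """Return True if the 10th residue from the end is Trp (W).

    This leaves exactly 9 residues after that Trp. The check is positional
    only and does not perform any antibody numbering.
    """
    seq = sequence.strip().upper()
    return len(seq) >= 10 and seq[-10] == "W"


# C-terminal end of the example sequence used on this page.
example_tail = "YYCARDNSYYFDYWGQGTTLTVS"
print(has_nine_after_h3_trp(example_tail))

# One extra trailing residue breaks the rule.
print(has_nine_after_h3_trp(example_tail + "S"))
```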
\<Rosetta_Directory\>/demos/tutorials/AbPredict/create_run.sh
To run:
./create_run.sh <VL length> <L3 length> <VH length> <H3 length>
You should run the script from within the folder \<Rosetta_Directory\>/demos/tutorials/AbPredict/
-parser:script_vars entry_H1_H2="1AHWH" entry_L1_L2="1AHWL" entry_H3="1AHWH" entry_L3="1AHWL"
Each line in the file corresponds to one modeling process. 500 models are usually sufficient. You can increase the number of output models by running more "ntrials" in each job, or by creating more jobs: change the number of lines from 500 to N in the create_run.sh file.
An example Rosetta modeling job would look like this:
<Rosetta_Directory>/main/source/bin/rosetta_scripts.default.linuxgccrelease @flags -parser:script_vars entry_H1_H2=2IBZX entry_L1_L2=3DSFL entry_H3=3V4UH entry_L3=1LO2L sequence=IKMTQSPSSMYASLGERVTITCKASQDIRKYLNWYQQKPWKSPKTLIYYATSLADGVPSRFSGSGSGQDYSLTISSLESDDTATYYCLQHGESPYTFGGGTKLEIQLQQSGAELVRPGALVKLSCKASGFNIKDYYMHWVKQRPEQGLEWIGLIDPENGNTIYDPKFQGKASITADTSSNTAYLQLSSLTSEDTAVYYCARDNSYYFDYWGQGTTLTVS
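When many jobs are launched from a script, a command line like the one above can be assembled from a dictionary of script variables. This is a minimal sketch. The helper name `build_abpredict_command` is hypothetical, the binary path is shortened, and `YOUR_SEQUENCE` is a placeholder for the real input sequence.

```python
import shlex


def build_abpredict_command(binary, flags_file, script_vars):
    """Build the argument list for one rosetta_scripts modeling job."""
    args = [binary, f"@{flags_file}", "-parser:script_vars"]
    # Each variable is passed as name=value, as in the example job.
    args += [f"{name}={value}" for name, value in script_vars.items()]
    return args


command = build_abpredict_command(
    "rosetta_scripts.default.linuxgccrelease",  # prepend the path to your Rosetta build
    "flags",
    {
        "entry_H1_H2": "2IBZX",
        "entry_L1_L2": "3DSFL",
        "entry_H3": "3V4UH",
        "entry_L3": "1LO2L",
        "sequence": "YOUR_SEQUENCE",  # placeholder, replace with the variable-domain sequence
    },
)
print(shlex.join(command))
```

Returning a list rather than a single string lets the result be passed directly to `subprocess.run` without shell quoting concerns.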
Flags file:
-nodelay
-use_input_sc
-ignore_unrecognized_res
-overwrite
-out:file:fullatom
-s 2BRR.ppk_ideal.pdb
-parser:protocol AbPredict_xsd.xml
-parser:script_vars template_pdb=2BRR.ppk_ideal.pdb
-pdb_comments true
-parser:script_vars H1_H2.db=AB_db_files/H1_H2.db H3.db=AB_db_files/H3.db L3.db=AB_db_files/L3.db L1_L2.db=AB_db_files/L1_L2.db
-out:path:pdb pdb/
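Once the jobs finish, the models are ranked by energy. The sketch below shows one way to do this. It assumes Rosetta wrote a score file in its default text format, with `SCORE:` lines and `total_score` and `description` columns. The flags file above does not set a score file name, so the file location is left to the reader, and the model names in the sample text are hypothetical.

```python
def rank_models(score_text, column="total_score"):
    """Return (score, model name) pairs sorted from lowest to highest energy."""
    header = None
    rows = []
    for line in score_text.splitlines():
        if not line.startswith("SCORE:"):
            continue  # skip SEQUENCE: and any other non-score lines
        fields = line.split()[1:]
        if header is None:
            header = fields  # the first SCORE: line holds the column names
            continue
        if len(fields) != len(header):
            continue  # skip malformed or truncated lines
        record = dict(zip(header, fields))
        rows.append((float(record[column]), record["description"]))
    return sorted(rows)


# Hypothetical score file contents for illustration.
sample = """SEQUENCE:
SCORE: total_score fa_atr description
SCORE: -310.5 -900.1 model_0002
SCORE: -325.2 -910.7 model_0001
"""
ranked = rank_models(sample)
for score, name in ranked:
    print(name, score)
```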