Author: Chris King (dr.chris.king@gmail.com)
Last updated 4/13/2010; P.I. Phil Bradley (pbradley@fhcrc.org)
These applications live in src/apps/public/pepspec/pepspec.cc and src/apps/public/pepspec/pepspec_anchor_dock.cc. The demo lives in demo/pepspec. The integration tests live in test/integration/tests/pepspec_anchor_dock and test/integration/tests/pepspec.
C.A. King and P. Bradley, Structure-based prediction of protein–peptide specificity in Rosetta, Proteins 78 (2010), pp. 3437–3449.
Many cell signalling events and protein-protein interactions are mediated by peptide-binding domains and short, linear peptide motifs. The pepspec application can be used for structure-based prediction of the protein-peptide specificity of peptide-binding proteins. For the primary pepspec application, the user must supply a model of the peptide-binding protein of interest bound to either a peptide or a single peptide residue. If such a model is not available, a single-residue docked conformation can be generated first using the pepspec_anchor_dock application. The pepspec_anchor_dock application must be supplied with a model of the unbound target protein and one or more homologous protein-peptide complex structures, which it uses to estimate an initial conformation for a single peptide "anchor" residue on the surface of the target protein. Pepspec uses this input protein-peptide configuration as a starting point to perform flexible-backbone peptide design on the surface of the protein, generating a large number of putative peptide ligands. These peptides may then be ranked by predicted binding affinity to produce a position-specific scoring matrix for the target protein.
The pepspec application implements an anchored, flexible-backbone peptide docking and design algorithm in which the sequence and structure of the peptide are simultaneously optimized. Rather than performing global peptide docking searches, pepspec requires as input an approximate location for a key "anchor" residue of the peptide; the remainder of the peptide is assembled from fragments as in de novo structure prediction and refined with simultaneous sequence optimization. Backbone flexibility of the protein is optionally incorporated implicitly by docking into a structural ensemble for the protein partner.
This application is NOT for structure prediction of an entire protein. You need to have a model of the peptide-binding protein, although this model may be derived from experiment, homology modeling, or de novo protein folding. This application does NOT move the backbone of the input protein structure. Backbone ensembles can be generated with the backrub or relax applications. This application does NOT support de novo docking of the peptide anchor residue; at a bare minimum, you need a model of a protein-peptide complex homologous to your target protein. To dock a single residue with no knowledge of where the binding pocket might be, you may consider using the docking application.
This application has two major modes: Anchor Docking and Peptide Design.
Anchor Docking: If you already have a structure of the target protein bound to an N-mer peptide, you may not need to do this step. If you need to dock an anchor residue onto your protein, the anchor docking mode allows you to use structures of homologous protein-peptide complexes to predict the position of the anchor residue on your target protein. You provide a single structure of your target protein or an ensemble of structures, along with a set of homologous complexes. The homologues must be aligned to the target protein! The algorithm uses the relative positions of the homologues' anchor residues to dock a new anchor residue to your target protein, and outputs the structures and associated score data for use in the next step.
Peptide Design: In the peptide design phase, putative binding peptides are designed at the surface of the target protein. The algorithm takes as input one or more protein-peptide complexes. The "peptide" may be a single residue docked in the previous phase. The existing peptide is optionally extended from each terminus by a user-defined number of residues, and low-resolution backbone sampling takes place before high-resolution peptide sequence design. The low-resolution step uses a full-atom (not centroid) poly-A or poly-G peptide with a minimal score function that only penalizes atomic clashes and ensures the peptide remains near the surface of the protein. The design phase attempts full combinatorial sequence design, first with soft repulsive atoms and then with full repulsive atoms, followed by minimization. Then, the sequence is diversified using a Monte Carlo plus minimization design phase. In this diversification stage, random point mutations are made to the peptide, the surrounding sidechain conformations are optimized, and the point mutation is accepted or rejected stochastically with a probability based on the change in the peptide's estimated binding score.
The binding score is calculated by subtracting the Rosetta energy of the unbound peptide, with fixed backbone and repacked sidechains, from the total Rosetta energy of the protein-peptide complex. (Note: the total Rosetta energy may be used instead by supplying the flag "-pepspec:binding_score false".) In this way, each peptide backbone generates many different peptide sequences. Sequence-score data is output for post-processing, and protein structures may optionally be saved.
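The binding score and the stochastic acceptance step described above can be illustrated with a minimal Python sketch. This is not the pepspec source: the Metropolis-style acceptance rule and the temperature parameter kT are assumptions made for illustration, since the text only states that acceptance probability depends on the change in binding score. The function names are hypothetical.

```python
import math
import random


def binding_score(complex_energy, unbound_peptide_energy):
    """Complex energy minus the energy of the unbound, repacked peptide."""
    return complex_energy - unbound_peptide_energy


def accept_mutation(old_score, new_score, kT=1.0, rng=random):
    """Assumed Metropolis-style test; kT is a hypothetical temperature.

    Mutations that do not worsen the binding score are always accepted.
    Worse scores are accepted with probability exp(-delta / kT).
    """
    delta = new_score - old_score
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / kT)


# Example with made-up energies (Rosetta energy units).
before = binding_score(complex_energy=-250.0, unbound_peptide_energy=-12.5)
after = binding_score(complex_energy=-253.0, unbound_peptide_energy=-12.0)
keep = accept_mutation(before, after)
```

Lower Rosetta energies are more favorable, so a negative change in binding score corresponds to a predicted improvement in binding.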
<homolog_pdb_filename> <peptide_chain> <peptide_anchor_res>
for each homolog. It is highly recommended that you perform the structural alignment of the homologues to your target structure ahead of time. This is necessary to ensure that the coordinates of the homologue complex peptides are properly superimposed in your target protein's reference frame. As long as peptide backbone coordinates can be gleaned from the homologues, all other aspects of the homologues' PDB model quality are irrelevant. You can optionally have Rosetta attempt a sequence alignment and subsequent structural alignment of the homologue proteins to your target protein (-pepspec:seq_align), but the alignment may not be ideal. I recommend using Cealign.
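As an illustration of the three-field homolog line format shown above, the following Python sketch writes one line per homolog. The helper names, the output file name and the example PDB file names are hypothetical, and the sketch assumes the fields are separated by single spaces.

```python
def format_homolog_line(pdb_filename, peptide_chain, peptide_anchor_res):
    """Return one '<homolog_pdb_filename> <peptide_chain> <peptide_anchor_res>' line."""
    return f"{pdb_filename} {peptide_chain} {peptide_anchor_res}"


def write_homolog_list(path, homologs):
    """Write one formatted line per (pdb, chain, anchor residue) tuple."""
    with open(path, "w") as handle:
        for pdb_filename, peptide_chain, peptide_anchor_res in homologs:
            line = format_homolog_line(pdb_filename, peptide_chain, peptide_anchor_res)
            handle.write(line + "\n")


# Hypothetical, pre-aligned homolog complexes.
example_homologs = [
    ("homolog_1.aligned.pdb", "B", 5),
    ("homolog_2.aligned.pdb", "C", 3),
]
example_lines = [format_homolog_line(*entry) for entry in example_homologs]
```

Calling write_homolog_list("homologs.list", example_homologs) would write the two example lines to a file in the current directory.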
<atom_name> <peptide_position> <x_coord> <y_coord> <z_coord> <0.0> <std_dev> <tolerance>
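A line in the eight-field format above can be assembled with a short Python helper. This is a sketch only: the function name is hypothetical, the meaning of each field is inferred solely from the placeholder names, and the three-decimal coordinate formatting and single-space separators are assumptions.

```python
def format_coordinate_line(atom_name, peptide_position, xyz, std_dev, tolerance):
    """Build '<atom_name> <peptide_position> <x> <y> <z> 0.0 <std_dev> <tolerance>'."""
    x, y, z = xyz
    fields = [
        atom_name,
        str(peptide_position),
        f"{x:.3f}",
        f"{y:.3f}",
        f"{z:.3f}",
        "0.0",  # the sixth field is the literal 0.0 shown in the format line
        str(std_dev),
        str(tolerance),
    ]
    return " ".join(fields)


# Made-up values for illustration.
example_line = format_coordinate_line("CA", 2, (1.0, 2.5, -3.25), 1.0, 0.5)
```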
You will probably only need to use General and Typical Options. These options will make more sense after you read the Tips section below.
-option:name [data_type] - this is a description (default_value)
This application produces protein-peptide structures and scorefiles. The scorefiles may be used to generate sequence-specificity position-weight matrices by using the scripts described below. Note: pepspec will automatically generate folders for the output structures named "<output_tag>.pdbs".
A position-weight matrix (PWM) can be generated from pepspec output using the script gen_pepspec_pwm.py found in (ROSETTA_LOCATION)/analysis/apps. This script will sort all peptide sequences by Rosetta binding score and generate a matrix of peptide positions by residue frequencies. A background PWM can optionally be supplied to normalize the raw pepspec output PWM (see References at the top of this document). Run 'gen_pepspec_pwm.py help' for more information.
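The idea of sorting sequences by binding score and tabulating per-position residue frequencies can be sketched in Python as follows. This is an independent illustration, not the logic of gen_pepspec_pwm.py: the function name, the n_top cutoff, and the input as (sequence, score) pairs are assumptions, and background normalization is left out.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(scored_peptides, n_top=None):
    """Return a list with one {residue: frequency} dict per peptide position.

    scored_peptides: iterable of (sequence, binding_score) pairs.
    n_top: optional number of best-scoring (lowest-score) sequences to keep.
    """
    ranked = sorted(scored_peptides, key=lambda pair: pair[1])
    if n_top is not None:
        ranked = ranked[:n_top]
    sequences = [sequence for sequence, _ in ranked]
    if not sequences:
        raise ValueError("no peptide sequences supplied")
    length = len(sequences[0])
    if any(len(sequence) != length for sequence in sequences):
        raise ValueError("all peptide sequences must have the same length")

    pwm = []
    for position in range(length):
        counts = Counter(sequence[position] for sequence in sequences)
        total = len(sequences)
        pwm.append({aa: counts.get(aa, 0) / total for aa in AMINO_ACIDS})
    return pwm


# Made-up sequences and scores; the two lowest-scoring peptides are kept.
example_pwm = build_pwm([("AK", -5.0), ("AR", -3.0), ("GK", 1.0)], n_top=2)
```

Each row of the result sums to 1 when every residue in the input is one of the 20 canonical amino acids.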
PWMs may also be generated and visualized using third-party software such as WebLogo.
Note: The supplied background PWM is valid for normalizing PWMs generated with the standard.wts Rosetta score weights! Use of different score weights will necessarily perturb residue frequencies in the background PWM.
This is the first release of these applications.