Fragments are used in assembling proteins, whether for structure prediction or design, to cut down the size of the protein-folding search space. Fragment libraries are used by many Rosetta protocols and are a core part of ab initio.
Fragment libraries follow a complex naming scheme:
{series}{pdb}{chain}{size}_{strategy}.{depth}_{version}
Terms in fragment filenames:
series -- The two-character code used to disambiguate Rosetta runs. For fragment libraries this is almost always aa.
pdb -- The four-character code of the PDB the fragments were generated for.
chain -- The one-character code of the protein chain the fragments were generated for. Usually "_" for all.
size -- Fragment size, either a 3-mer or 9-mer. Acceptable values are 03 and 09.
strategy -- Fragment selection strategy used in NNMake. Acceptable values are 04, 05 and 06. See below.
depth -- Number of fragments, in descending order of score, that are kept in the library, usually 200.
version -- Version of NNMake, usually v1_3.
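The naming scheme can be taken apart mechanically. The following is a minimal Python sketch, not part of Rosetta: the pattern and the function name parse_fragment_name are illustrative assumptions, and the field widths (two, four, one and two characters) are taken from the term descriptions above.

```python
import re

# Mirrors {series}{pdb}{chain}{size}_{strategy}.{depth}_{version}.
# Field widths follow the term descriptions; this pattern is an
# illustrative assumption, not something shipped with Rosetta.
FRAGMENT_NAME = re.compile(
    r"^(?P<series>.{2})"      # two-character series code, e.g. "aa"
    r"(?P<pdb>.{4})"          # four-character PDB code
    r"(?P<chain>.)"           # one-character chain, "_" for all
    r"(?P<size>\d{2})"        # 03 or 09
    r"_(?P<strategy>\d{2})"   # NNMake selection strategy
    r"\.(?P<depth>\d+)"       # number of fragments kept
    r"_(?P<version>.+)$"      # NNMake version, e.g. "v1_3"
)


def parse_fragment_name(name):
    """Split a fragment library file name into its named terms."""
    match = FRAGMENT_NAME.match(name)
    if match is None:
        raise ValueError(f"not a fragment library name: {name!r}")
    fields = match.groupdict()
    fields["size"] = int(fields["size"])
    fields["depth"] = int(fields["depth"])
    return fields


example = parse_fragment_name("aa2ptl_03_05.200_v1_3")
print(example)
```

The strategy term is kept as a string so that values such as 05 keep their leading zero.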
Fragment files typically have the following format (referred to as the "rosetta++ format"):
position: 1 neighbors: 200
{fragment data}
position: 2 neighbors: 200
{fragment data}
Where "position" is the pose number of the starting point of the fragment, and "neighbors" is the number of fragments (ignored on reading).
"fragment data" consists of blank-line separated blocks of lines. Each block represents a fragment, and the number of lines in the blocks matches the size of the fragment.
Each line in the fragment data typically looks like the following:
2oqo A 189 I L -140.176 157.939 -179.962 -0.776 5.007 51.121 3 0.000 P 1 F 1
The format is column-based:
Column -- Meaning
1 -- blank
2-5 -- PDB code for the fragment origin
7 -- chain ID for the origin PDB
9-13 -- PDB residue number for the origin PDB
15 -- amino acid identity in the origin PDB
17 -- secondary structure for the origin PDB (Helix, Loop, Extended/beta)
19-27 -- phi
28-36 -- psi
37-45 -- omega
46-54 -- C-alpha x coordinate for origin PDB (optional)
55-63 -- C-alpha y coordinate for origin PDB (optional)
65-73 -- C-alpha z coordinate for origin PDB (optional)
74-79 -- unknown (unused)
80-85 -- unknown (unused)
86 -- Literal "P" (unused)
87-89 -- fragment position number, pose numbered (unused)
91 -- Literal "F" (unused)
92-94 -- fragment number (unused)
Everything after omega is ignored/discarded in modern Rosetta runs, and may not even be present in all versions of the fragment file.
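As an illustration of the layout described above, here is a minimal Python sketch of a reader. It is not Rosetta's own parser. The names FragmentResidue, parse_fragment_line and read_fragments are hypothetical, and the sketch assumes that the fields through omega are separated by whitespace and that the chain ID is never blank; a strict reader would slice the fixed columns instead. Everything after omega is dropped, and the "neighbors" count is ignored.

```python
from dataclasses import dataclass


@dataclass
class FragmentResidue:
    pdb: str      # PDB code of the fragment origin
    chain: str    # chain ID in the origin PDB
    resnum: str   # origin residue number, kept as text
    aa: str       # amino acid identity
    ss: str       # secondary structure (H, L or E)
    phi: float
    psi: float
    omega: float


def parse_fragment_line(line):
    """Parse one residue line; fields after omega are discarded."""
    tokens = line.split()
    if len(tokens) < 8:
        raise ValueError(f"too few fields in fragment line: {line!r}")
    phi, psi, omega = (float(t) for t in tokens[5:8])
    return FragmentResidue(*tokens[:5], phi, psi, omega)


def read_fragments(lines):
    """Return {position: [fragment, ...]}; a fragment is a list of residues."""
    library = {}
    fragments = None  # fragment list of the position being read
    block = []        # residues of the fragment being read
    for raw in lines:
        line = raw.strip()
        if line.startswith("position:"):
            if block:
                fragments.append(block)
                block = []
            # Header: "position: N neighbors: M"; M is ignored.
            fragments = library.setdefault(int(line.split()[1]), [])
        elif not line:
            # A blank line closes the current fragment, if any.
            if block:
                fragments.append(block)
                block = []
        else:
            if fragments is None:
                raise ValueError("residue line before any position header")
            block.append(parse_fragment_line(line))
    if block:
        fragments.append(block)
    return library


# Synthetic input: the example line above repeated to form two 3-mers.
EXAMPLE = (" 2oqo A 189 I L -140.176 157.939 -179.962 "
           "-0.776 5.007 51.121 3 0.000 P 1 F 1")
sample = ["position: 1 neighbors: 200", "",
          EXAMPLE, EXAMPLE, EXAMPLE, "",
          EXAMPLE, EXAMPLE, EXAMPLE]
library = read_fragments(sample)
first = library[1][0][0]
print(first)
```

To read a real library, pass an open file object to read_fragments instead of the sample list.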
Making fragments by yourself:
DATABASES:
nr -- downloadable from ftp://ftp.ncbi.nih.gov/blast/db/
nnmake_database -- included in release
chemshift_database -- included in release
PROGRAMS:
PSI-BLAST
PSIPRED
JUFO
PROFphd
SAM
nnmake -- included in release
chemshift -- included in release
Configure the paths at the top of nnmake/make_fragments.pl to point to these databases and programs. PSI-BLAST must be installed locally. After PSI-BLAST and PSIPRED are installed, refer to the PSIPRED README, or see the quick directions below, on how to create a filtered "NR" sequence data bank called "filtnr", which is also used by make_fragments.pl.
Quick directions for creating filtnr:
tcsh% pfilt nr.fasta > filtnr
tcsh% formatdb -t filtnr -i filtnr
tcsh% cp filtnr.p?? $BLASTDB
Run make_fragments.pl. Invoke without arguments for usage options. Likely the only argument you need to provide is the fasta file.
$> make_fragments.pl -verbose 2ptl_.fasta
If you want to exclude homologous sequences from the fragment search, add the -nohoms argument.
$> make_fragments.pl -verbose -nohoms 2ptl_.fasta
Note that if you want to exclude homologs from the chemical shift/TALOS search, you need to edit the TALOS database. See the README in the chemshift_source directory for instructions. If you do not have a particular type of secondary structure prediction (say, the .jufo file) and you do NOT want make_fragments.pl to try to run the method locally, use the -nojufo option.
$> make_fragments.pl -verbose -nohoms -nojufo 2ptl_.fasta
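When many sequences need fragments, the command lines above can be assembled programmatically. The following Python sketch only builds the argument list and does not run anything. The function name make_fragments_command and its keyword arguments are hypothetical; only the -verbose, -nohoms and -nojufo flags are taken from the text above, and the script is assumed to be reachable under the name make_fragments.pl.

```python
def make_fragments_command(fasta, nohoms=False, nojufo=False,
                           verbose=True, script="make_fragments.pl"):
    """Build the make_fragments.pl argument list for one fasta file."""
    command = [script]
    if verbose:
        command.append("-verbose")
    if nohoms:
        command.append("-nohoms")   # exclude homologous sequences
    if nojufo:
        command.append("-nojufo")   # do not try to run JUFO locally
    command.append(fasta)           # the fasta file goes last
    return command


command = make_fragments_command("2ptl_.fasta", nohoms=True, nojufo=True)
print(" ".join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) once the paths in make_fragments.pl have been configured.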
Two fragment files will be generated with names like aa2ptl_03_05.200_v1_3 and aa2ptl_09_05.200_v1_3. The prefix "aa" can be changed with the -xx option. "2ptl_" is the five-letter base name, which can be specified with the -id option; otherwise it is derived from the name of the fasta file. 03 and 09 indicate the lengths of the fragments.
To generate a loop library in addition to the fragment files, run make_fragments.pl with the -template option, for example (using the five-letter code 2ptl_):
$> make_fragments.pl -template 2ptl_ 2ptl_.fasta
This requires 2ptl_.pdb and 2ptl_.zones to be present in your run directory. The PDB file is a template generated by createTemplate.pl, as described in README.loops. Loops are defined from the zones file, and a library of loop conformations for each defined loop is compiled, based on fragment picking, into a file called "2ptl_.loops_all" (which usually contains 2000 loop conformations). The script trimLoopLibrary.pl is then called automatically to reduce the size of the loop library and write it out as "2ptl_.loops". This file is later used in the Rosetta loop modeling mode to build variable loops onto the template structure. A loop library differs from a fragment library mainly in that geometrical information is considered, so that the "loop" fragments picked have the desired length and can roughly close the gap based on the "take-off" stub positions.
A newer version of the vall database (2006-05-05) is provided in nnmake_database together with the original version (2001-02-02). You can make fragments using either version; just modify make_fragments.pl to point to the version you want to use. Currently, making a loop library only works with the 2001-02-02 version, as some newly developed loop modeling methods no longer need a loop library.
NOTES:
How to Make a vall Without Knowing What You Are Doing (Jack Schonbrun, May 27, 2004): You need to use the Rosetta executable to make the vall data.