The RosettaCarbohydrate Framework was created by Dr. Jason W. Labonte (JWLabonte@jhu.edu), in collaboration with Dr. Jared Adolf-Bryfogle (jadolfbr@gmail.com).
The PIs are Dr. Jeff Gray of JHU (jgray@jhu.edu) and Dr. William Schief of Scripps (schief@scripps.edu).
Jason W. Labonte, Jared Adolf-Bryfogle, William R. Schief, Jeffrey J. Gray. Residue-centric modeling and design of saccharide and glycoconjugate structures. Journal of Computational Chemistry, 11/30/2016. https://doi.org/10.1002/jcc.24679
Jared Adolf-Bryfogle, J. W Labonte, J. C Kraft, M. Shapavolov, S. Raemisch, T. Lutteke, F. Dimaio, C. D Bahl, J. Pallesen, N. P King, J. J Gray, D. W Kulp, W. R Schief. Growing Glycans in Rosetta: Accurate de novo glycan modeling, density fitting, and rational sequon design. bioRxiv 2021.09.27.462000. https://doi.org/10.1101/2021.09.27.462000
Brandon Frenz, Sebastian Rämisch, Andrew J. Borst, Alexandra C. Walls, Jared Adolf-Bryfogle, William R. Schief, David Veesler, Frank DiMaio. Automatically Fixing Errors in Glycoprotein Structures with Rosetta. Structure, 1/2/2019. https://www.sciencedirect.com/science/article/pii/S0969212618303320
All Rosetta runs with carbohydrate-containing structures should use an option to make Rosetta carbohydrate-aware. An error will be thrown if this is not present. This option is also needed if you plan on glycosylating structures.
-include_sugars
PDB files from the RCSB can generally be read in directly. However, the file must contain LINK records. Rosetta uses this connectivity to build the glycans, naming the residues with its internal names.
Reading in most PDB files will require an option to map the non-specific HETNAM IDs to chemically accurate identifiers:
-alternate_3_letter_codes pdb_sugar
To round-trip PDB glycans, use the following option to write PDB codes instead of Rosetta codes:
-write_glycan_pdb_codes
When loading a file from the PDB, the order of HETATM and LINK records normally matters for reading it into Rosetta. Since PDB files are usually not formatted for Rosetta compatibility, connections can instead be determined internally, ignoring the order of records. In that case, atom distances are used to determine protein-sugar and sugar-sugar connections.
-auto_detect_glycan_connections
-maintain_links
The minimum and maximum bond lengths for a connection to be found default to 1.15 and 1.65 Å, respectively. Since many structures are chemically incorrect, these parameters can be changed to detect unphysical bonds, too:
-min_bond_length < Real >
-max_bond_length < Real >
-out::level 999
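To illustrate the idea behind distance-based detection, here is a minimal pure-Python sketch. It is not Rosetta's implementation: the function name `find_connections`, the atom tuple format, and the coordinates are hypothetical, and only the 1.15 and 1.65 Å defaults come from the text above.

```python
import math
from itertools import combinations

# Default distance window from the text above (in angstroms).
MIN_BOND_LENGTH = 1.15
MAX_BOND_LENGTH = 1.65

def find_connections(atoms, min_len=MIN_BOND_LENGTH, max_len=MAX_BOND_LENGTH):
    """Return atom pairs from different residues whose distance lies in [min_len, max_len].

    `atoms` is a list of (residue_id, atom_name, (x, y, z)) tuples (hypothetical format).
    """
    found = []
    for (res_a, name_a, xyz_a), (res_b, name_b, xyz_b) in combinations(atoms, 2):
        if res_a == res_b:
            continue  # only inter-residue connections are of interest
        dist = math.dist(xyz_a, xyz_b)
        if min_len <= dist <= max_len:
            found.append(((res_a, name_a), (res_b, name_b), round(dist, 2)))
    return found

# Made-up coordinates: an Asn ND2 placed 1.45 A from a GlcNAc C1, plus a distant O4.
atoms = [
    ("ASN 10", "ND2", (0.0, 0.0, 0.0)),
    ("NAG 301", "C1", (1.45, 0.0, 0.0)),
    ("NAG 301", "O4", (5.0, 0.0, 0.0)),
]
connections = find_connections(atoms)
print(connections)
```

A real detector would also restrict the search to chemically plausible atom pairs. Widening the window, as with -min_bond_length and -max_bond_length, corresponds to passing different `min_len` and `max_len` values here.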
The -maintain_links option should generally be used when loading PDB files. If you are having issues, try turning it off.
You may also need to add the option -load_PDB_components false.
To load GLYCAM structures, pass the option -glycam_pdb_format.
To write out structures correctly, PDB LINK records must be output. This option is now the default.
-write_pdb_link_records
score.default.macosclangrelease \
-include_sugars \
-alternate_3_letter_codes pdb_sugar \
-load_PDB_components false \
-auto_detect_glycan_connections \
-min_bond_length 1.1 \
-max_bond_length 1.7 \
-ignore_zero_occupancy false \
-ignore_unrecognized_res \
-out:output \
-s 5t3x.pdb
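Long option lists like the one above can also be kept in a flags file and passed to Rosetta as @filename. The following Python sketch writes such a file. The helper name `write_flags_file` and the file name `glycan_flags` are assumptions made for illustration; the options themselves are copied from the command above.

```python
from pathlib import Path
import tempfile

# Options copied from the command above; None marks a flag that takes no value.
GLYCAN_LOAD_OPTIONS = {
    "-include_sugars": None,
    "-alternate_3_letter_codes": "pdb_sugar",
    "-load_PDB_components": "false",
    "-auto_detect_glycan_connections": None,
    "-min_bond_length": "1.1",
    "-max_bond_length": "1.7",
    "-ignore_zero_occupancy": "false",
    "-ignore_unrecognized_res": None,
}

def write_flags_file(path, options):
    """Write one option per line and return the lines that were written."""
    lines = [name if value is None else f"{name} {value}"
             for name, value in options.items()]
    Path(path).write_text("\n".join(lines) + "\n")
    return lines

# Hypothetical location; any writable path works.
flags_path = Path(tempfile.gettempdir()) / "glycan_flags"
written = write_flags_file(flags_path, GLYCAN_LOAD_OPTIONS)
print(flags_path.read_text())
```

With such a file in place, the command above could be shortened along the lines of `score.default.macosclangrelease @glycan_flags -out:output -s 5t3x.pdb`.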
It is recommended to use the Rosetta Common Configurations to easily work with glycan structures in Rosetta. More info can be found here.
Most of the time when we deal with glycans, we use IUPAC names. The glycan 'root', as referred to in Rosetta, is the residue at which the glycan is attached to the protein. Some components, such as the GlycanResidueSelector, use 'glycan positions' to easily specify residues of glycans. These positions run from 1 to N, where 1 is the first glycan residue and N is the last. To find the glycan position of a residue you are interested in, use the GlycanInfo application.
GlycanTreeModeler - Full protocol for modeling carbohydrates. This is preferred.
SmallMover - Make small changes to all of the torsion angles in a random glycosidic bond.
ShearMover - Make a shearing motion, by making opposite small changes to a pair of near-parallel glycosidic torsions.
RingConformationMover - Make a change to a cyclic residue's ring conformation. (Note that this is not normally an energetically favorable thing to do!)
GlycanInfo - Get information on all glycan trees within a pose
GlycanClashCheck - Obtain data on model clashes with and between glycans, or between glycans and other protein chains.
GlycanDock - Dock free glycans onto protein surfaces
GlycanTreeModeler - Model glycan trees using known carbohydrate information. Works for full de novo modeling or refinement.
SimpleGlycosylateMover - Glycosylate poses with glycan trees.
GlycanResidueSelector - Select specific residues of each glycan tree of interest.
Although an app is planned, one can currently use the CreateGlycanSequonMover to design the residues needed around a potential glycosylation site. This effectively creates the Asn-X-Ser/Thr sequence motif. Options are available to design the X position, design around the site, or use other sequence motifs. Run rosetta_scripts.xxxrelease -info CreateGlycanSequonMover for more information and options. The base script below uses the -parser:script_vars option to create a sequon at the given residues and then glycosylate and model the carbohydrate, so that a protein can be scanned for optimal glycosylation sites. It is recommended to create at least 100-1000 models of the carbohydrate at each position.
<ROSETTASCRIPTS>
    <RESIDUE_SELECTORS>
        <Index name="select" resnums="%%glycan_position%%" />
    </RESIDUE_SELECTORS>
    <MOVERS>
        <CreateGlycanSequonMover name="create_motif" residue_selector="select" basic_enhanced_n_sequon="false" design_x_positions="false" pack_neighbors="1"/>
        <SimpleGlycosylateMover name="glycosylate" residue_selector="select" glycosylation="%%glycosylation%%" strip_existing="1" />
        <GlycanRelaxMover name="basic_relax" />
        <GlycanTreeRelax name="tree_relax" quench_mode="false" rounds="1" layer_size="2" window_size="1"/>
    </MOVERS>
    <PROTOCOLS>
        <Add mover_name="create_motif" />
        <Add mover_name="glycosylate" />
        <Add mover_name="tree_relax" />
    </PROTOCOLS>
</ROSETTASCRIPTS>
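As a rough sketch of how such a scan might be driven, the Python snippet below builds one rosetta_scripts command per candidate position, filling in the two script variables used by the script above. The executable name, the script file name glycosylate.xml, the input file, the example positions, and the helper `build_scan_commands` are all assumptions for illustration.

```python
import shlex

def build_scan_commands(pdb, script, positions, glycosylation, nstruct=100,
                        executable="rosetta_scripts.default.linuxgccrelease"):
    """Return one rosetta_scripts command (as an argument list) per candidate position."""
    commands = []
    for position in positions:
        commands.append([
            executable,
            "-include_sugars",
            "-write_pdb_link_records",
            "-s", pdb,
            "-parser:protocol", script,
            # Values substituted for %%glycan_position%% and %%glycosylation%%.
            "-parser:script_vars",
            f"glycan_position={position}",
            f"glycosylation={glycosylation}",
            "-nstruct", str(nstruct),
            "-out:prefix", f"site_{position}_",
        ])
    return commands

# Hypothetical inputs: two candidate sites, each glycosylated with man5.
commands = build_scan_commands("my_pose.pdb", "glycosylate.xml", ["10A", "45A"], "man5")
for command in commands:
    print(shlex.join(command))
```

Each argument list can be handed to `subprocess.run` or written to a job script. The default of 100 models per site follows the lower end of the recommendation above.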
Structures can be glycosylated either through a function accessible to PyRosetta or via RosettaScripts as covered below.
See the SimpleGlycosylateMover documentation
Here is an example of adding a high-mannose glycan (man5) to the pose. This can now be done in two ways within PyRosetta, either via the core function or via the class wrapper.
The following uses a function to glycosylate a pose using the IUPAC name:
/// @brief Glycosylate the Pose at the given sequence position using an IUPAC sequence.
void glycosylate_pose(
Pose & pose,
uint const sequence_position,
std::string const & iupac_sequence,
bool const idealize_linkages = true );
Here is an example of using the function to glycosylate the pose using a man5 glycan, a commonly found glycan in biology.
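# Written for PyRosetta 4 (Python 3).
from pyrosetta import init, pose_from_pdb
from pyrosetta.rosetta.core.pose.carbohydrates import glycosylate_pose

init('-include_sugars -write_pdb_link_records')
p = pose_from_pdb("my_pose.pdb")
# Attach a man5 glycan at sequence position 10 and idealize its linkages.
glycosylate_pose(p, 10, "a-D-Manp-(1->3)-[a-D-Manp-(1->3)-[a-D-Manp-(1->6)]-a-D-Manp-(1->6)]-b-D-Manp-(1->4)-b-D-GlcpNAc-(1->4)-b-D-GlcpNAc-", True)
print(p)
print(p.residue(3))
print(p.chain_sequence(1))  # chain_sequence() takes a chain number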
This mover is accessible both in PyRosetta and RosettaScripts. It was written by Jared Adolf-Bryfogle.
See the SimpleGlycosylateMover documentation for a full description.
Example using a man5:
# Written for PyRosetta 4 (Python 3).
from pyrosetta import init, pose_from_pdb
from pyrosetta.rosetta.protocols.carbohydrates import SimpleGlycosylateMover

init('-include_sugars -write_pdb_link_records')
p = pose_from_pdb("my_pose.pdb")
glycosylator = SimpleGlycosylateMover()  # class name must match the import
glycosylator.set_glycosylation('man5')
glycosylator.set_positions(10)
glycosylator.apply(p)
print(p)
print(p.residue(3))
print(p.chain_sequence(1))  # chain_sequence() takes a chain number
Glycans can be built by themselves (i.e., not attached to a protein) using PyRosetta. There is currently no way to do this in RosettaScripts. Glycans are created using their IUPAC names.
To properly build an oligosaccharide, Rosetta must know the following details about each sugar residue being created, in the following order:
Main-chain connectivity - →2) (->2)), →4) (->4)), →6) (->6)), ↔1) (<->1)), etc.; default value is ->4)-
Anomeric form - α (a or alpha) or β (b or beta); default value is alpha
Enantiomeric form - l (L) or d (D); default value is D
3-letter code - the sentence-case code for the base sugar, e.g., Glc, Gal, or Man
Ring form code - f for a furanose or p for a pyranose
Residues are separated by hyphens. Glycosidic linkages can also be given in full IUPAC notation, e.g., -(1->4)- for “-(1→4)-”. Rosetta will assume -(1-> for aldoses and -(2-> for ketoses. Note that the standard is to write the IUPAC sequence of a saccharide chain in reverse order from how the residues are numbered. Bidirectional linkages use a double-headed arrow, e.g., -(1<->1)- for “-(1↔1)-”.
The following example creates a pose from the IUPAC saccharide name:
# Written for PyRosetta 4 (Python 3).
from pyrosetta import init
from pyrosetta.rosetta.core.pose import pose_from_saccharide_sequence

init('-include_sugars -write_pdb_link_records')
galactose = pose_from_saccharide_sequence('Galp')  # defaults: alpha anomer, D enantiomer
maltotriose = pose_from_saccharide_sequence('a-D-Glcp-' * 3)  # default ->4) connectivity
mannose = pose_from_saccharide_sequence('->3)-a-D-Manp')
lactose = pose_from_saccharide_sequence('b-D-Galp-(1->4)-a-D-Glcp')
print(lactose)
print(lactose.residue(1))
print(lactose.chain_sequence(1))  # chain_sequence() takes a chain number
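The naming rules above can be sketched as a small string builder that does not depend on PyRosetta. The helper names `sugar` and `join_chain` are hypothetical, and the defaults simply mirror the ones listed above (alpha, D). The pyranose default for the ring code is an assumption made here for convenience.

```python
def sugar(code, ring="p", anomer="a", enantiomer="D"):
    """Format one residue name, e.g. sugar('Glc') gives 'a-D-Glcp'."""
    return f"{anomer}-{enantiomer}-{code}{ring}"

def join_chain(residues, linkages):
    """Join residue names in IUPAC order (last-numbered residue first).

    linkages[i] is an (anomeric_carbon, attachment_position) pair describing the
    bond between residues[i] and residues[i + 1], written as '-(1->4)-'.
    """
    if len(linkages) != len(residues) - 1:
        raise ValueError("need exactly one linkage between each pair of residues")
    parts = [residues[0]]
    for (donor, acceptor), residue in zip(linkages, residues[1:]):
        parts.append(f"-({donor}->{acceptor})-{residue}")
    return "".join(parts)

# Reproduces the lactose name used in the example above.
lactose_name = join_chain([sugar("Gal", anomer="b"), sugar("Glc")], [(1, 4)])
print(lactose_name)
```

The resulting string could then be passed to pose_from_saccharide_sequence as in the example above.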