ResidueTypeSet – what it is
core::chemical::ResidueTypeSet is responsible for iterating through the sets of residue types, including, but not
limited to, amino acids, nucleic acids, peptoid residues, and monosaccharides. It first reads through a file that
contains the location of residue types in the database. At the beginning of that file are the atom types, mm atom
types, element sets, and orbital types that will be used. The sets are all for fa_standard. If new atom types are
added for residues, this is where they would be added. Once it assigns the types, it then reads in extra residue
params that are passed through the command line. Finally, patches are applied to all residues added.
The class is being updated in 2015-2016, as described below.
- Avoid aa_map_DO_NOT_USE and similar functions. By asking for everything with an AA (or name3) of the query type, these functions (now tagged with DO_NOT_USE) require instantiation of an exponentially large number of residue_types. We are trying to remove all of these in the code, at which point we can delete these functions.
- Instead, use get_representative_type_with_variant_aa and get_all_types_with_variants_aa, where you supply the AA (e.g., aa_ala) and a list of variants that you want.
- If you know the full name of the ResidueType (e.g., "ALA:NtermProteinFull"), just use name_map.
- ResidueTypes generated at runtime (as opposed to during read-in of the residue_types.txt in the database) can get tricky. The GlobalResidueTypeSets obtained from the ChemicalManager can't (and shouldn't) be modified. Instead, store the ResidueType in the Conformation of the Pose that's going to use it. You can use modifiable_residue_type_set_for_conf() and reset_residue_type_set_for_conf() to obtain and then replace the Conformation's ResidueTypeSet with a modified version.

ResidueTypeSet project
With the original ResidueTypeSet, it was difficult to automatically read in and model interesting PDBs with ligands, modified RNA types, proteins with some atoms virtualized, etc. without having to explicitly specify patches or new residue types. For example, high-accuracy cryoEM-based structures of ribosomes, which resolve nucleotide modifications, could not be read in automatically.
In principle, much of this chemical diversity could be saved in Rosetta's database and instantiated as part of its known universe upon startup. But the original ResidueTypeSet system had a memory footprint and setup time that grew exponentially with the number of patches. This led to quite a few ugly (and imperfect) hacks in my lab where certain patches are only loaded in certain modes, etc., and a reluctance to introduce new ResidueTypes (which each get a large number of possible patch combinations). There are also .slim inventories of 'minimal' residue_types to prevent bloat on some low-memory systems like BlueGene.
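To make the exponential growth concrete, here is a minimal arithmetic sketch. It assumes, as a simplification not stated by the original system, that every patch can be applied independently to every base type, so it gives an upper bound rather than Rosetta's actual count. The function name eager_type_count is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Upper bound on the number of residue types an eager set would build.
// ASSUMPTION: each patch is either applied or not, independently of the
// others, to every base type. Real patches are more selective, so the true
// count is smaller, but it still grows with the number of patch combinations.
std::uint64_t eager_type_count(std::uint64_t n_base_types, unsigned n_patches) {
    assert(n_patches < 64);  // keep the shift well-defined
    return n_base_types * (std::uint64_t{1} << n_patches);
}
```

Under this assumption, 20 base types with 10 patches already give 20 * 2^10 = 20480 candidate types, and each additional patch doubles that number.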
This pull request refactored the ResidueTypeSet system to load up residue_types and patches and only instantiate patched residue types as they are needed ('on the fly').
- Instantiate ResidueTypes only when they are explicitly requested; otherwise store them as compact placeholders.
- extract_pdbs (silent file extraction).
Some more information is available in these slides.
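The on-the-fly idea can be sketched with a small cache. This is a toy illustration, not Rosetta's implementation: ToyResidueType, LazyTypeSet, get, and instantiated_count are all hypothetical names, and a "patched type" here is just a base name plus a set of variant names parsed from a colon-separated string such as "ALA:NtermProteinFull".

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a residue type: a full name plus the variants applied to it.
struct ToyResidueType {
    std::string name;                // e.g. "ALA:NtermProteinFull"
    std::set<std::string> variants;  // e.g. {"NtermProteinFull"}
};
using ToyTypeCOP = std::shared_ptr<ToyResidueType const>;

// Hypothetical lazy set: base types and patch names are loaded up front;
// patched types are built on first request and cached afterwards.
class LazyTypeSet {
public:
    LazyTypeSet(std::vector<std::string> const & base_names,
                std::set<std::string> patch_names)
        : patches_(std::move(patch_names)) {
        for (auto const & n : base_names) {
            bases_.insert(n);
            cache_[n] = std::make_shared<ToyResidueType>(ToyResidueType{n, {}});
        }
    }

    // Look up "BASE:Variant1:Variant2", instantiating it if needed.
    // Variant order is part of the cache key in this toy version.
    ToyTypeCOP get(std::string const & full_name) {
        auto it = cache_.find(full_name);
        if (it != cache_.end()) return it->second;

        // Split the name on ':' into base name and variant names.
        std::vector<std::string> parts;
        std::size_t start = 0;
        while (true) {
            std::size_t colon = full_name.find(':', start);
            if (colon == std::string::npos) {
                parts.push_back(full_name.substr(start));
                break;
            }
            parts.push_back(full_name.substr(start, colon - start));
            start = colon + 1;
        }
        if (bases_.count(parts[0]) == 0) {
            throw std::invalid_argument("unknown base type: " + parts[0]);
        }
        ToyResidueType patched{full_name, {}};
        for (std::size_t i = 1; i < parts.size(); ++i) {
            if (patches_.count(parts[i]) == 0) {
                throw std::invalid_argument("unknown patch: " + parts[i]);
            }
            patched.variants.insert(parts[i]);
        }
        ToyTypeCOP result = std::make_shared<ToyResidueType>(std::move(patched));
        cache_[full_name] = result;
        return result;
    }

    std::size_t instantiated_count() const { return cache_.size(); }

private:
    std::set<std::string> bases_;
    std::set<std::string> patches_;
    std::map<std::string, ToyTypeCOP> cache_;
};
```

With two base types and two patches, the set starts with two instantiated types; requesting "ALA:NtermProteinFull" adds exactly one more, and repeating the request returns the cached object. The cost therefore tracks what is actually requested rather than the number of possible patch combinations.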
Subsequent pull requests are accomplishing the following tasks:
ResidueTypeSet
- name_ and the standard information that applies to all ResidueTypes: AtomTypeSetCOP atom_types_, ElementSetCOP elements_, MMAtomTypeSetCOP mm_atom_types_, orbitals::OrbitalTypeSetCOP orbital_types_.
- ResidueTypes in ResidueTypeCOPs residue_types_. But each of these ResidueTypes starts out as a 'placeholder' object with just name, name3, and variants. The rest of the object (atom names, atom graph, etc.) is instantiated when needed. The way to tell if a ResidueType is a placeholder or instantiated is through the finalized() function. (It should actually be possible/easy to deprecate this or make it a mutable cache of instantiated ResidueTypes, and we may want to do so in the future, as it grows exponentially with the number of patches.)
- aa_map_, interchangeability_group_map_, name3_map_, aas_defined_; but we should remove these when we get rid of all calls to DO_NOT_USE functions throughout the code!
- For lookup of ResidueTypes by the ResidueTypeFinder, we now store lists of all the base_residue_types_ (ResidueTypes without any patches) and the patches_.
- on_the_fly, a flag for whether or not the new placeholder/instantiation scheme is being used in this ResidueTypeSet. It is true by default, unless you are using DNA adducts, which are not yet supported (but could be; see below).

There are still some things to do (Devs: please add to this wishlist, and remove items when done):
- replace_residue_type_in_set_defying_constness, and obeyed inside make_sure_instantiated.
- Orbitals could be applied through a Patch. See ResidueTypeSet.cc for some comments on how to do this.
- Adducts could be applied as Patches. See ResidueTypeSet.cc for some comments on how to do this.
- (-chemical:on_the_fly false to compare).
- ResidueTypeSelector.
- base_residue_types are all instantiated. We may want to be smarter about this. In the future we may want to have 100,000 ligands in Rosetta, which could be kept as bare-bones 'placeholder' ResidueTypes with just name3. These would be instantiated only when recognized in, say, a PDB file or requested explicitly by name3.
- ResidueType::variant_types() currently comes out as a list of std::strings, instead of a VariantType enum. (Rocco was working before @JWLabonte refactor). This will be easy to restore. It should happen after @JWLabonte finishes pull request #56