ProteinMPNNProbabilitiesMetric

Autogenerated Tag Syntax Documentation:

A metric for estimating the probability of an amino acid at a given position, as predicted by ProteinMPNN.

References and author information for the ProteinMPNNProbabilitiesMetric simple metric:

ProteinMPNNProbabilitiesMetric SimpleMetric's author(s): Moritz Ertelt, University of Leipzig [moritz.ertelt@gmail.com]

<ProteinMPNNProbabilitiesMetric name="(&string;)" custom_type="(&string;)"
        write_pssm="(&string;)" residue_selector="(&string;)"
        coord_selector="(&string;)" sequence_mask_selector="(&string;)" >
    <TiedPositions residue_selectors="(&string;)" />
</ProteinMPNNProbabilitiesMetric>

custom_type: Allows multiple configured SimpleMetrics of a single type to be called in a single RunSimpleMetrics and SimpleMetricFeatures. The custom_type name will be added to the data tag in the scorefile or features database.
write_pssm: Output filename for the PSI-BLAST-like position-specific scoring matrix (PSSM), for use with the FavorSequenceProfile mover.
residue_selector: A residue selector specifying which residue or residues to predict on. If no selector is given, all residues are predicted.
coord_selector: Name of a residue selector that selects the residues whose coordinates are passed to ProteinMPNN. If no selector is given, all coordinates are passed.
sequence_mask_selector: Name of a residue selector that selects positions to be masked.

Subtag TiedPositions:

residue_selectors: Comma-separated list of residue selectors to tie together. The first residue of each selector will be tied together, then the second, etc. Each residue selector must select the same number of residues.
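
As an illustration of the tying rule above, the sketch below ties chain A to chain B of a hypothetical symmetric homodimer, so that the first residue of A is tied to the first residue of B, the second to the second, and so on. The selector names chainA and chainB and the metric name tied_prediction are assumptions made for this example, and both chains must contain the same number of residues.

```xml
<RESIDUE_SELECTORS>
    <!-- One selector per chain; each must select the same number of residues -->
    <Chain name="chainA" chains="A"/>
    <Chain name="chainB" chains="B"/>
</RESIDUE_SELECTORS>
<SIMPLE_METRICS>
    <ProteinMPNNProbabilitiesMetric name="tied_prediction">
        <!-- Residue i of chainA is tied to residue i of chainB -->
        <TiedPositions residue_selectors="chainA,chainB"/>
    </ProteinMPNNProbabilitiesMetric>
</SIMPLE_METRICS>
```

This fragment only shows the selector and metric sections; it would sit inside a complete ROSETTASCRIPTS block like the one in the Example section.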

General description

A metric for estimating the probability of an amino acid at a given position, as predicted by the ProteinMPNN model. This metric requires Rosetta to be built with extras=torch; see Building Rosetta with TensorFlow and Torch for the compilation setup.

Note on processor usage

By default, the ProteinMPNNProbabilitiesMetric will use multiple processors during prediction. (The number of processors to use is autodetermined by Torch, based on the number of processors on the machine.)

To limit the number of processors used, set the following environment variables before running Rosetta (the commands assume Bash and restrict Rosetta to one CPU):

export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
export TORCH_NUM_THREADS=1
export TORCH_INTRAOP_NUM_THREADS=1
export TORCH_INTEROP_NUM_THREADS=1

This, of course, will increase the runtime, but may be necessary when running on systems where you explicitly need to control CPU usage.

Example

The example predicts the amino acid identities for chain A using only the coordinates of chain A, while masking the sequence at position 25. It writes the predicted probabilities to mpnn.pssm and uses them to score the sequence with the PseudoPerplexityMetric.

<ROSETTASCRIPTS>
    <RESIDUE_SELECTORS>
        <Chain name="res" chains="A" />
        <Index name="mask" resnums="25"/>
    </RESIDUE_SELECTORS>
    <SIMPLE_METRICS>
        <ProteinMPNNProbabilitiesMetric name="prediction" residue_selector="res" coord_selector="res" sequence_mask_selector="mask" write_pssm="mpnn.pssm"/>
        <PseudoPerplexityMetric name="perplex" metric="prediction"/>
    </SIMPLE_METRICS>
    <FILTERS>
    </FILTERS>
    <MOVERS>
        <RunSimpleMetrics name="run" metrics="perplex"/>
    </MOVERS>
    <PROTOCOLS>
        <Add mover_name="run"/>
    </PROTOCOLS>
</ROSETTASCRIPTS>
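
The example above writes the predicted probabilities to mpnn.pssm. As a rough sketch of how that file might then be used, the script below loads it with the FavorSequenceProfile mover. It is meant for a separate, later run, once the file exists. The names ref_profile and favor_mpnn, the ref2015 weights file, and the res_type_constraint weight of 1.0 are illustrative assumptions rather than settings taken from this page; consult the FavorSequenceProfile documentation for its full attribute list.

```xml
<ROSETTASCRIPTS>
    <SCOREFXNS>
        <!-- res_type_constraint needs a non-zero weight for the profile to influence scoring -->
        <ScoreFunction name="ref_profile" weights="ref2015">
            <Reweight scoretype="res_type_constraint" weight="1.0"/>
        </ScoreFunction>
    </SCOREFXNS>
    <MOVERS>
        <!-- pssm points at the file produced by write_pssm in the example above -->
        <FavorSequenceProfile name="favor_mpnn" pssm="mpnn.pssm"/>
    </MOVERS>
    <PROTOCOLS>
        <Add mover_name="favor_mpnn"/>
        <!-- A design mover that scores with ref_profile would follow here -->
    </PROTOCOLS>
</ROSETTASCRIPTS>
```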

Reference

@article{dauparas2022robust,
  title={Robust deep learning--based protein sequence design using ProteinMPNN},
  author={Dauparas, Justas and Anishchenko, Ivan and Bennett, Nathaniel and Bai, Hua and Ragotte, Robert J and Milles, Lukas F and Wicky, Basile IM and Courbet, Alexis and de Haas, Rob J and Bethel, Neville and others},
  journal={Science},
  volume={378},
  number={6615},
  pages={49--56},
  year={2022},
  publisher={American Association for the Advancement of Science}
}
