Rosetta
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Properties Friends Macros Pages
maxsub.cc
Go to the documentation of this file.
1 // -*- mode:c++;tab-width:2;indent-tabs-mode:t;show-trailing-whitespace:t;rm-trailing-spaces:t -*-
2 // vi: set ts=2 noet:
3 //
4 // (c) Copyright Rosetta Commons Member Institutions.
5 // (c) This file is part of the Rosetta software suite and is made available under license.
6 // (c) The Rosetta software is developed by the contributing members of the Rosetta Commons.
7 // (c) For more information, see http://www.rosettacommons.org. Questions about this can be
8 // (c) addressed to University of Washington UW TechTransfer, email: license@u.washington.edu.
9 
10 /// @file numeric/model_quality/maxsub.cc
11 /// @brief ab-initio fragment assembly protocol for proteins
12 /// @details Routines for calculating maxsub-based structural quality scores. Based on code originally
13 /// written by Charlie Strauss for rosetta++, ported over by James Thompson.
14 ///
15 /// @author James Thompson
16 
20 
21 // ObjexxFCL Headers
22 #include <ObjexxFCL/FArray1.hh> // for FArray1
23 #include <ObjexxFCL/FArray1D.fwd.hh> // for FArray1D_double, FArray1...
24 #include <ObjexxFCL/FArray1D.hh> // for FArray1D
25 #include <ObjexxFCL/FArray2.hh> // for FArray2
26 #include <ObjexxFCL/Star.hh> // for star
27 #include <numeric/numeric.functions.hh> // for square
28 #include <ObjexxFCL/FArray1A.hh>
29 #include <ObjexxFCL/FArray2A.hh>
30 //#include <ObjexxFCL/format.hh>
31 
32 // C++ Headers
33 #include <cmath>
34 #include <cstdlib>
35 
36 namespace numeric {
37 namespace model_quality {
38 
39 ////////////////////////////////////////////////////////////////////////////////
40 ///
41 /// @brief
42 ///
43 /// @details
44 /// cems 2001.
45 /// this is the main rosetta entry point for this function.
46 /// it is a wrapper for max sub, converting the input arrays to
47 /// double precision and reducing them
48 /// to just calphas
49 /// it does max sub comparing the claphas passed into thosw of the
50 /// native and returns
51 /// the number of aligned residues, the rms of these and the log eval c
52 /// of the comparison
53 ///
54 /// @param x - [in/out]? -
55 /// @param nali - [in/out]? -
56 /// @param rms - [in/out]? -
57 /// @param logeval - [in/out]? -
58 ///
59 /// @global_read
60 ///
61 /// @global_write
62 ///
63 /// @remarks
64 ///
65 /// @references
66 ///
67 /// @author
68 ///
69 /////////////////////////////////////////////////////////////////////////////////
70 // void
71 // maxsub_native(
72 // FArray3_float const & x,
73 // int & nali,
74 // float & rms,
75 // float & logeval
76 // )
77 // {
78 // using namespace misc;
79 // using namespace native;
80 //
81 // // local
82 // FArray2D_double xp( 3, total_residue );
83 // FArray2D_double xe( 3, total_residue );
84 // double mxrms,mxpsi,mxzscore,mxscore,mxeval;
85 //
86 // if ( !get_native_exists() || ( files_paths::multi_chain && !design::dna_interface ) ) {
87 // rms = 0.0;
88 // logeval = 0.0;
89 // nali = 0;
90 // return;
91 // }
92 //
93 // int n_points = 0;
94 // for ( int i = 1; i <= total_residue; ++i ) {
95 // if ( native_occupancy( 2, i ) <= 0.0 ) continue;
96 // n_points++;
97 // for ( int k = 1; k <= 3; ++k ) {
98 // xe(k,n_points) = native_ca(k,i);
99 // xp(k,n_points) = x(k,2,i); // calphas
100 // }
101 // }
102 // maxsub(n_points,xe,xp,mxrms,mxpsi,nali,mxzscore,mxeval,mxscore);
103 //
104 // rms = mxrms; // double to float conversion
105 // logeval = std::log(mxeval);
106 //
107 // }
108 
109 ////////////////////////////////////////////////////////////////////////////////
110 ///
111 /// @brief identify the largest subset of CA atoms of a model that superimposes
112 /// "well" (under certain rms cutoff) over the experimental structure
113 ///
114 /// @details
115 /// this function was adapted and improved by cem strauss from an
116 /// original template provided by angel ortiz.
117 ///
118 /// Here applies a modification of Dani Fischer's heuristic algorithm
119 /// for finding the largest subset of residues for superimposition within
120 /// a threshold. The part that restraints the secondary structure
121 /// matching needs to be changed.
122 ///
123 /// At this point, the algorithm works as follows: first, the residue
124 /// assignment is done on the basis of the global secondary structure
125 /// similarity. Then, with this assignment for residue pairs, the
126 /// heuristic procedure of Fisher is used.
127 ///
128 /// A seed segment of 7 CA atoms was first superimposed onto the reference structure,
129 /// then the neighbor radius is gradually increased (up to "distance_tolerance"):to
130 /// include more CA atoms within the
131 /// range "t" to do a new superimposition, if the aligned rms is below the rms cutoff "rmstol"
132 /// the new alignment is accepted. This proceduare is repeated for every 7-residue seed segment
133 /// along the sequence until the largest subset of atoms which can yield an aligned rms below is
134 /// found."nsup" is the total number of CA atoms in the model, xe and xp are coordinates of native
135 /// and model respectively. "rms" is the final aligned rms based on the identified subset of atoms,
136 /// "nali" is the number of aligned atoms in the subset and "psi" is the ratio of nali/nsup; The rest
137 /// scores are metrics for quality of the alignment. Note that if no suitable alignment can be found
138 /// given the input rms cutoff, the rms of overall structure alignment is returned and "nali" is set to
139 /// zero. -- chu 2009/10
140 ///
141 /// @param nsup - [in/out]? -
142 /// @param xe - [in/out]? -
143 /// @param xp - [in/out]? -
144 /// @param rms - [in/out]? -
145 /// @param psi - [in/out]? -
146 /// @param nali - [in/out]? -
147 /// @param zscore - [in/out]? -
148 /// @param evalue - [in/out]? -
149 /// @param score - [in/out]? -
150 ///
151 /// @global_read
152 ///
153 /// @global_write
154 ///
155 /// @remarks
156 ///
157 /// @references
158 ///
159 /// @author
160 ///
161 /////////////////////////////////////////////////////////////////////////////////
162 void
164  int & nsup, // total number of residues to superimpose
167  double & rms,
168  double & psi,
169  int & nali, // number of aligned residues
170  double & zscore,
171  double & evalue,
172  double & score,
173  double rmstol, //default = 4.0
174  double distance_tolerance //default = 7.0
175 )
176 {
177  using namespace ObjexxFCL;
178 
179  xe.dimension( 3*nsup );
180  xp.dimension( 3*nsup );
181 
182 
183  // --- Vector alignment
184 
185  //int const maxres = { 3000 };
186  //int const maxlen = { 2 * maxres };
187  //int const maxfrag = { 100 };
188  //double const angmax = { 60.0 };
189 
190  if ( distance_tolerance == -1.0 ) {
191  distance_tolerance = ( rmstol * 7.0 ) / 4.0; // first guess, extrapolating from 4 -> 7
192  }
193 
194  double distance_increment = distance_tolerance / 7.0; // maintain same number of cycles
195 
196  double znew;
197  double am,as;
198 
199  FArray1D_double xp0( 3*nsup );
200  FArray1D_double xe0( 3*nsup );
201  FArray1D_double wmax( nsup );
202 
203  FArray1D_bool logical_w( nsup );
204 
205  numeric::model_quality::RmsData* rmsdata = RmsData::instance(); // get a pointer to the singleton class
206 
207  //------------------------------------------------------------------------------
208  // First selects the residues allowed to be superimposed
209  //------------------------------------------------------------------------------
210  // need to pass in xp and xe, nsup = total_res
211 
212  int nr = nsup;
213  int l = 7; // length of peptide for Fisher's maxsub algorithm.
214 
215  for ( int i = 1; i <= nsup*3; ++i ) {
216  xe0(i) = xe(i);
217  xp0(i) = xp(i);
218  }
219  //------------------------------------------------------------------------------
220  // Now apply Fishers's maxsub algorithm. An heptapeptide is used.
221  // The algorithm is modified in that only pairs of residues with
222  // similar local secondary structures are allowed to be used as
223  // seed residues to superimpose.
224  //------------------------------------------------------------------------------
225  int smax = 0;
226  for ( int i = 1; i <= nsup; ++i ) { // should this not be <=3*nsup?
227  //w(i) = 0.0;
228  wmax(i) = 0.0;
229  }
230  double rmsmax = 1000.0;
231 
232 
233  for ( int i = 1; i <= nsup-l+1; ++i ) {
234 
235  rmsdata->clear_rms();
236  // if ( matched(i)} ) { // this line is Angel's variation on Danni Fischer's method.
237  // only seed at points with matched SS.
238 
239  // build up a seed segment of length l
240  int lmax = 0;
241  for ( int j = 1; j <= nsup; ++j ) {
242  if ( j >= i && j <= i+l-1 ) { // could do this without if statement
243  rmsdata->add_rms(j,xp0,xe0);
244  logical_w(j) = true; // w(j) = 1.0
245  ++lmax;
246  } else {
247  logical_w(j) = false; // w(j) = 0.0
248  }
249  }
250 
251  // find initial alignment using seed
252  // rmsfitca3 rotates all of the residues but the rotation only aligns the
253  // residues pushed into add_rms.
254  rmsfitca3(nsup,xp0,xp,xe0,xe,rms);
255  // chu 2009/10
256  // accept the inital seed alignment in case the first distance incremental
257  // step fails to find a better alignment
258  if ( (lmax > smax) && (rms <= rmstol) ) {
259  smax = lmax;
260  for ( int n = 1; n <= nsup; ++n ) {
261  if ( logical_w(n) ) {
262  wmax(n) = 1.0;
263  } else {
264  wmax(n) = 0.0;
265  }
266  // wmax(n) = w(n);
267  }
268  rmsmax = rms;
269  } else if ( (lmax == smax) && (rms < rmsmax) ) {
270  smax = lmax;
271  for ( int n = 1; n <= nsup; ++n ) {
272  if ( logical_w(n) ) {
273  wmax(n) = 1.0;
274  } else {
275  wmax(n) = 0.0;
276  }
277  //wmax(n) = w(n);
278  }
279  rmsmax = rms;
280  }
281  // next we iterate the following algorithm
282  // 1) using current alignment find all atoms within a threshold t of being superimposed
283  // 2) add these close ones to set to be aligned
284  // 3) aling using the current set, then reorient all atoms.
285  // 4) increment t by a small amount.
286  // 5) repeat this until theshold = 7 angstroms.
287  double t = 0.0;
288  int const last = lmax;
289  double min_d2 = square ( distance_tolerance + 1.0 );
290  // std::cout << min_d2 << std::endl;
291  while ( t < distance_tolerance ) { // for ( m = 1; m <= 7; ++m ) {
292  t += distance_increment;
293  // std::cout << "maxsub: t = "<< t << " distance_tolerance = " << distance_tolerance << "\n";
294  // increment threshold by one angstrom. (note this is ties to int(min_d))
295  double t2 = t*t; // t squared
296 
297 
298  // t = float(m); // *7.0/float(7); // huh? must be a relic???
299  for ( int n = 1; n <= nsup; ++n ) {
300  if ( !logical_w(n) ) { // if ( w(n) == 0.0} ) {
301  int k = 3*(n-1);
302  double const xpek1 = xp(k+1) - xe(k+1);
303  double const xpek2 = xp(k+2) - xe(k+2);
304  double const xpek3 = xp(k+3) - xe(k+3);
305  double const d2 =
306  ( xpek1 * xpek1 ) + ( xpek2 * xpek2 ) + ( xpek3 * xpek3 );
307  // squared distancne
308  if ( d2 <= t2 ) { // is this atom within threshold?
309  rmsdata->add_rms(n,xp0,xe0); // if so, add to list
310  logical_w(n) = true; //w(n) = 1.0 // set membership flag
311  ++lmax; // keep a count of members
312  } else {
313  // if not below threshold, then find the closest atom
314  if ( d2 <= min_d2 ) min_d2 = d2; //min_d = min(d,min_d)
315  }
316  }
317  }
318  // check if we added any residues on this iteration.
319  // if not then 1) we dont need to refit the calphas 2) we can advance threshold level
320  if ( lmax != last ) {
321  rmsfitca3(nsup,xp0,xp,xe0,xe,rms);
322  } else {
323  t = std::sqrt( min_d2 ); //advance the threshold
324  // t = static_cast< int >(std::sqrt(min_d2)); // advance the threshold
325 
326  // std::cout << i << " skipping " << t << ' ' <<
327  // static_cast< int >(min_d2) << ' ' << lmax << ' ' << min_d2 << " new t = " << t<< std::endl;
328 
329  }
330 
331 
332  // huh? logic here is confusing.
333  // chu 2009/10
334  // after each distance incremental step to include more atoms into the superimpostion,
335  // check whether the new rms is still under the rmstol cutoff.
336  // If more atoms can be aligned under the rmstol cutoff, accept this new alignment;
337  // If in a different set (of the same size) of atoms can be aligned yielding lower rms, aceept this new alignment
338  // Otherwise, ignore this new alignment and keep searching until done.
339  if ( (lmax > smax) && (rms <= rmstol) ) {
340  smax = lmax;
341  for ( int n = 1; n <= nsup; ++n ) {
342  if ( logical_w(n) ) {
343  wmax(n) = 1.0;
344  } else {
345  wmax(n) = 0.0;
346  }
347  // wmax(n) = w(n);
348  }
349  rmsmax = rms;
350  } else if ( (lmax == smax) && (rms < rmsmax) ) {
351  smax = lmax;
352  for ( int n = 1; n <= nsup; ++n ) {
353  if ( logical_w(n) ) {
354  wmax(n) = 1.0;
355  } else {
356  wmax(n) = 0.0;
357  }
358  //wmax(n) = w(n);
359  }
360  rmsmax = rms;
361  }
362  } // while( t< distance_tolerance) -- chu 2009/10 bugfix
363  }
364 
365  //------------------------------------------------------------------------------
366  // --- Confirm final superimposition.
367  // --- first, compile regions without indels. Report rms
368  //------------------------------------------------------------------------------
369  // std::cout << "maxsub: next step " << std::endl;
370  if ( smax > 1 ) {
371  rmsfitca2(nr,xp,xe,wmax,smax,rms);
372  // side effect sets xpc,zpc etc... via common
373  } else {
374  // if smax is less than 2 then basically we failed to find an alignement
375  // to make the best of a bad situation we simply revert to aligning all of the residue
376  for ( int i = 1; i <= nr; ++i ) {
377  wmax(i) = 1.0;
378  }
379  rmsfitca2(nr,xp,xe,wmax,nr,rms);
380  // side effect sets xpc,zpc etc... via common
381  }
382 
383  // std::cout << std::endl << "RMS = " << F( 7, 3, rms ) <<
384  // " SMAX = " << I( 5, smax ) << std::endl << std::endl;
385  psi = ( static_cast< double >( smax ) / nsup ) * 100.0;
386 
387  //------------------------------------------------------------------------------
388  // --- Transform PSI to Levitt & Gerstein Sstr
389  // --- NEED TO BE REMOVED
390  //------------------------------------------------------------------------------
391 
392  // compute score without gaps
393 
394  score = 0.0;
395  nali = 0;
396 
397  for ( int i = 1; i <= nsup; ++i ) {
398  if ( wmax(i) == 1.0 ) {
399  double d = 0.0;
400  int k = 3*(i-1);
401  for ( int j = 1; j <= 3; ++j ) {
402  double const xpekj = xp(k+j) - xe(k+j);
403  d += xpekj * xpekj;
404  }
405  d = std::sqrt(d) / rmstol;
406  score += 1.0 / ( 1.0 + ( d * d ) );
407  ++nali;
408  }
409  }
410  //chu 2009/10 if no alignment is found, return nali as zero
411  if ( smax <=1 ) nali=smax;
412  //------------------------------------------------------------------------------
413  // --- All done. Report match statistics
414  //------------------------------------------------------------------------------
415 
416  //------------------------------------------------------------------------------
417  // --- Report probabilities for random matches.
418  // --- These are preliminary values for the average and standard deviation
419  // --- of PSI as a function of NORM. These values are:
420  // --- m(L) = 759.31 * L**(-0.7545)
421  // --- s(L) = 393.32 * L**(-0.9009)
422  //------------------------------------------------------------------------------
423 
424  // am = 759.31 * std::pow( norm, -0.7545 );
425  // as = 393.32 * std::pow( norm, -0.9009 );
426  // am = 695.41 * std::pow( norm, -0.7278 );
427  // as = 340.00 * std::pow( norm, -0.9045 );
428  // EV fitting, using N > 70
429  am = 747.29 * std::pow( static_cast< double >( nr ), -0.7971 );
430  as = 124.99 * std::pow( static_cast< double >( nr ), -0.6882 );
431  zscore = (psi-am)/as;
432  // this is the gaussian approach. Actually, extreme-value is more adequate
433  evalue = 0.5 * erfcc(zscore/std::sqrt(2.0));
434  // here it is the EV approach
435  znew = 0.730*((1.2825755*zscore)+0.5772);
436  evalue = 1.0-std::exp(-std::exp(-znew));
437  // due to numerical errors, the e-value is cutoff at 2.650E-14
438  if ( evalue < 2.650E-14 ) evalue = 2.650E-14;
439 
440 }
441 
442 //////////////////////////////////////////////////////////////////////////////
443 ///
444 /// @brief
445 ///
446 /// @details
447 ///
448 /// @param x - [in/out]? -
449 ///
450 /// @return
451 ///
452 /// @global_read
453 ///
454 /// @global_write
455 ///
456 /// @remarks
457 ///
458 /// @references
459 /// (C) Copr. 1986-92 Numerical Recipes Software
460 ///
461 /// @author
462 ///
463 ////////////////////////////////////////////////////////////////////////////////
464 double
465 erfcc( double x )
466 {
467 
468  double t,z;
469  z = std::abs(x);
470  t = 1.0/(1.0+0.5*z);
471  double erfcc = t*std::exp(-z*z-1.26551223+t*(1.00002368+t*(.37409196+
472  t*(.09678418+t*(-.18628806+t*(.27886807+t*(-1.13520398+t*(1.48851587+
473  t*(-.82215223+t*.17087277)))))))));
474  if ( x < 0.0 ) erfcc = 2.0 - erfcc;
475  return erfcc;
476 }
477 
478 ////////////////////////////////////////////////////////////////////////////////
479 ///
480 /// @brief
481 // Calculate the center of geometry for the selected atoms ---
482 ///
483 /// @details
484 ///
485 /// @param C - [in/out]? -
486 /// @param WT - [in/out]? -
487 /// @param NAT - [in/out]? -
488 /// @param XC - [in/out]? -
489 /// @param YC - [in/out]? -
490 /// @param ZC - [in/out]? -
491 ///
492 /// @global_read
493 ///
494 /// @global_write
495 ///
496 /// @remarks
497 ///
498 /// @references
499 ///
500 /// @author
501 ///
502 /////////////////////////////////////////////////////////////////////////////////
503 void
505  ObjexxFCL::FArray1A_double C, // coordinates for each atom (must be of length NAT)
506  ObjexxFCL::FArray1A_double WT, // weights on each atom (must be of length NAT)
507  int NAT, // number of atoms provided
508  double & XC, // x-coordinate for center of mass
509  double & YC, // y-coordinate for center of mass
510  double & ZC // z-coordinate for center of mass
511 )
512 {
513  using namespace ObjexxFCL;
514 
515  C.dimension( star );
516  WT.dimension( star );
517 
518  // local
519  static double const ZERO = { 0.0 };
520 
521  double SUMX = ZERO;
522  double SUMY = ZERO;
523  double SUMZ = ZERO;
524  double SUM = ZERO;
525  int i3 = 0;
526 
527  for ( int i = 1; i <= NAT; ++i ) {
528  double const WT_i = WT(i);
529 
530  SUMX += C(i3+1) * WT_i;
531  SUMY += C(i3+2) * WT_i;
532  SUMZ += C(i3+3) * WT_i;
533  SUM += WT_i;
534  i3 += 3;
535  }
536 
537  SUM = 1.0/SUM;
538 
539  XC = SUMX*SUM;
540 
541  YC = SUMY*SUM;
542 
543  ZC = SUMZ*SUM;
544 
545 
546  i3 = 0;
547 
548  for ( int i = 1; i <= NAT; ++i ) {
549  C(i3+1) -= XC;
550  C(i3+2) -= YC;
551  C(i3+3) -= ZC;
552  i3 += 3;
553  }
554 
555  // std::cout << "CENTER OF MASS:" << space( 19 ) <<
556  // F( 10, 4, XC ) << F( 10, 4, YC ) << F( 10, 4, ZC ) << std::endl;
557 } // COMAS
558 
559 } // namespace model_quality
560 } // namespace numeric
RMS functions imported from rosetta++.
FArray1A & dimension(IR const &I_a)
Dimension by IndexRanges.
Definition: FArray1A.hh:680
def x
FArray1A: Fortran-Compatible 1D Argument Array.
Definition: FArray1.hh:32
std::string E(int const w, int const d, float const &t)
Exponential Format: float.
Definition: format.cc:312
Star const star
Definition: Star.cc:21
#define C(a, b)
Definition: functions.cc:27
def z
T square(T const &x)
square( x ) == x^2
static RmsData * instance()
Definition: RmsData.cc:26
T abs(T const &x)
std::abs( x ) == | x |
Definition: Fmath.hh:295
RmsData is a class intended to replace the global rms_obj namespace from rosetta++. Initial implementation is with a singleton design pattern to mimic a global namespace from rosetta++.
Definition: RmsData.hh:31
tuple rms
Definition: loops_kic.py:130
void rmsfitca2(int npoints, ObjexxFCL::FArray2A< double > xx, ObjexxFCL::FArray2A< double > yy, ObjexxFCL::FArray1A< double > ww, int natsel, double &esq)
computes the rms between two weighted point vectors.
Definition: rms.cc:627
DimensionExpressionPow pow(Dimension const &dim1, Dimension const &dim2)
pow( Dimension, Dimension )
double erfcc(double x)
Definition: maxsub.cc:465
Numeric functions.
void maxsub(int &nsup, ObjexxFCL::FArray1A_double xe, ObjexxFCL::FArray1A_double xp, double &rms, double &psi, int &nali, double &zscore, double &evalue, double &score, double rmstol, double distance_tolerance)
identify the largest subset of CA atoms of a model that superimposes "well" (under certain rms cutoff...
Definition: maxsub.cc:163
void rmsfitca3(int npoints, ObjexxFCL::FArray2A< double > xx0, ObjexxFCL::FArray2A< double > xx, ObjexxFCL::FArray2A< double > yy0, ObjexxFCL::FArray2A< double > yy, double &esq)
Definition: rms.cc:784
void COMAS(ObjexxFCL::FArray1A_double C, ObjexxFCL::FArray1A_double WT, int NAT, double &XC, double &YC, double &ZC)
Definition: maxsub.cc:504