   

Technical Highlights in Information/Knowledge Management

 

A Fully Automated Peak Picking and Integration Algorithm for Mass Spectral Data

 
A numerical algorithm is described that accurately locates peaks in real mass spectral data and calculates the area beneath them using only reproducible mathematical operations and no user-selected parameters. Such a fully automated algorithm was required for rapid and repeatable processing of mass spectral data containing hundreds of peaks. By working without any user input, it both saves operator time and eliminates operator bias. The first benefit is desirable when processing large amounts of data (for example, in proteomics research). The second is necessary to the Polymers Division's goal of creating an absolute molecular mass distribution synthetic polymer Standard Reference Material, where operator bias in the data analysis cannot be tolerated.
 
William E. Wallace and Anthony J. Kearsley

 
A unified collection of algorithms has been developed that accurately locates peaks and calculates their area using only reproducible mathematical operations and no user-selected parameters. As shown in Figure 1, the method consists of three steps: 1) statistical characterization of the data set and an analyte-free background spectrum; 2) data set segmentation to determine "strategic points"; and 3) deflation of the number of strategic points guided by the statistical properties of the data sets. The final deflated set of strategic points consists of groups of three points that define the beginning, center, and end of each peak in the data. For closely spaced peaks the strategic point that defines the end of one peak may also define the beginning of the next. Finally, a polygonal fitting routine is used to calculate relative peak area.
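
The final area step can be illustrated with a minimal sketch. The text does not specify which polygon the fitting routine uses, so this example assumes the simplest reading: each peak is the triangle formed by its three strategic points (beginning, center, end), and its area is computed with the shoelace formula. The function names `polygon_area` and `relative_peak_areas` and the sample coordinates are hypothetical.

```python
def polygon_area(vertices):
    """Area of a simple polygon from its (x, y) vertices (shoelace formula)."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def relative_peak_areas(peaks):
    """Normalize peak areas so they sum to 1.

    peaks: list of (begin, center, end) strategic-point triples, each an
    (x, y) pair. Treating each triple as a triangle is an assumption of
    this sketch, not a detail taken from the text.
    """
    areas = [polygon_area(list(triple)) for triple in peaks]
    total = sum(areas)
    return [a / total for a in areas]


# Two closely spaced peaks: the end of the first is the beginning of the second.
peaks = [
    ((0.0, 0.0), (1.0, 4.0), (2.0, 0.0)),
    ((2.0, 0.0), (3.0, 2.0), (4.0, 0.0)),
]
print(relative_peak_areas(peaks))
```

Because the shoelace formula works on vertex coordinates directly, it places no requirement on the spacing of the x-values.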
 
The time-series segmentation algorithm at the heart of the method consists of two parts. The first part (2a) selects the strategic points. These points are selected by an iterative procedure that identifies the point whose orthogonal distance from the line segment connecting the current end-points is greatest. Once such a point has been identified, it joins the collection of strategic points and, in turn, becomes an end-point for two new line segments, on each of which a point with greatest orthogonal distance is again found. This numerical scheme is repeated until the greatest orthogonal distance to any end-point connecting line segment drops beneath a prescribed threshold value. This threshold value is calculated from the statistical properties of the data set. The selection of these points does not require equally spaced data. The second part (2b) requires the solution of an optimization problem: locating the strategic point heights (that is, adjusting the y-axis values associated with the strategic x-axis values) that minimize the sum of orthogonal distances from the raw data. This is a nonlinear (and non-quadratic) optimization problem that can be solved quickly using a modern nonlinear programming algorithm. Parts 2a and 2b are collectively called the Kearsley-Wallace method, which is an extension of the earlier Douglas-Peucker method.
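
The point-selection idea in part 2a can be sketched as a Douglas-Peucker style subdivision. This is a minimal illustration only, not the Kearsley-Wallace implementation. It assumes the first and last data points serve as the initial end-points, takes the threshold as a plain argument rather than deriving it from the statistics of the data, and omits the height optimization of part 2b. The names `orthogonal_distance` and `strategic_points` are hypothetical.

```python
import math


def orthogonal_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (x, y), (xa, ya), (xb, yb) = p, a, b
    dx, dy = xb - xa, yb - ya
    length = math.hypot(dx, dy)
    if length == 0.0:
        # Degenerate segment: fall back to point-to-point distance.
        return math.hypot(x - xa, y - ya)
    return abs(dy * (x - xa) - dx * (y - ya)) / length


def strategic_points(points, threshold):
    """Return sorted indices of the selected strategic points.

    points: list of (x, y) pairs ordered by x; equal spacing is not needed.
    threshold: placeholder for the statistically derived value in the text.
    """
    keep = {0, len(points) - 1}        # assumed initial end-points
    stack = [(0, len(points) - 1)]     # segments still to be examined
    while stack:
        lo, hi = stack.pop()
        best_d, best_i = 0.0, None
        for i in range(lo + 1, hi):
            d = orthogonal_distance(points[i], points[lo], points[hi])
            if d > best_d:
                best_d, best_i = d, i
        if best_i is not None and best_d > threshold:
            keep.add(best_i)           # farthest point becomes a new end-point
            stack.append((lo, best_i))
            stack.append((best_i, hi))
    return sorted(keep)


# Unequally spaced toy data with one sharp peak at x = 3.
data = [(0, 0), (1, 0.1), (2.5, 0), (3, 5), (3.5, 0.05), (5, 0), (7, 0.1)]
print(strategic_points(data, threshold=0.5))
```

For this toy data, the selected indices include 2, 3, and 4, which bracket the peak as its beginning, center, and end. An explicit stack is used instead of recursion so that long spectra do not run into Python's recursion limit.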
 
Figure 1 Method Flow Chart
 
Consider the polystyrene matrix-assisted laser desorption ionization time-of-flight mass spectrum shown in Figure 2 (black) and its complementary matrix-only background spectrum (red). The resultant strategic points (green) defining peak beginning, center, and end, and the relative peak areas (blue) are also shown. Note that ion intensity is on a logarithmic scale; thus the small peaks are significantly smaller than the main series of peaks. The analysis of this spectrum was performed without operator intervention of any sort. The only inputs provided were the spectrum to be analyzed and an analyte-free spectrum to determine inherent instrument noise. The noise has both chemical (e.g., improperly time-focused ions) and electronic (e.g., detector dark current) components. These noise elements span a wide frequency range and cannot simply be smoothed out of the data without distorting peak shape (and, therefore, peak area). Our experience shows that the power spectrum of the noise cannot be predicted solely from the experimental conditions; therefore, blind application of smoothing and/or filtering algorithms will unintentionally remove information from the data.
 
Figure 2. Sample polystyrene MALDI TOF mass spectrum (black) and its complementary matrix-only background spectrum (red).
 
Additional strengths of this method are that it requires no knowledge of peak shape and, furthermore, no preprocessing of the data, i.e., no smoothing or baseline correction with their resultant distortion of peak area. Lastly, the method does not require equal spacing of data points (e.g., time-of-flight data can be processed in mass-space, where the points have a square-root spacing). The one significant weakness is that the method is more successful and efficient if a blank (analyte-free) spectrum is used to calibrate instrument background noise. (However, such a background spectrum is not strictly required.)
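
The square-root spacing mentioned above can be seen in a small numerical sketch. It assumes the standard quadratic time-of-flight calibration, m = ((t - t0) / a)^2, with purely illustrative constants; `time_to_mass`, `a`, and `t0` are hypothetical names and values, not taken from the instrument described here.

```python
def time_to_mass(t, a=1.0, t0=0.0):
    """Quadratic TOF calibration m = ((t - t0) / a) ** 2.

    a and t0 are illustrative calibration constants (assumptions).
    """
    return ((t - t0) / a) ** 2


# Samples that are equally spaced in flight time...
times = [10.0 + 0.5 * i for i in range(6)]
masses = [time_to_mass(t) for t in times]

# ...are unequally spaced in mass: the gap grows roughly as sqrt(m).
spacings = [m1 - m0 for m0, m1 in zip(masses, masses[1:])]
for m, dm in zip(masses, spacings):
    print(f"m = {m:8.2f}  gap to next point = {dm:6.2f}  gap / sqrt(m) = {dm / m ** 0.5:.3f}")
```

The ratio of gap to sqrt(m) stays nearly constant while the gap itself grows, which is why a method that tolerates unequally spaced points can work directly in mass-space without resampling.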
 
Future plans include the creation of a publicly accessible, secure Web-server application for online, real-time application of the algorithm. We will also relay the method to other standards-setting organizations for comment and to commercial software vendors for implementation in their products. Lastly, we have begun to tackle the much more subtle problem of automated, operator-independent baseline compensation.
 
 

For More Information on this Topic


W.E. Wallace and C.M. Guttman (Polymers Division, NIST); A.J. Kearsley and J. Bernal (Mathematical and Computational Sciences Division, NIST)
 
 
 
 
 
 
 
NIST Materials Science & Engineering Laboratory - Polymers Division