[Area vs height] [Historical methods] [perpendicular drop] [Spreadsheets] [Matlab/Octave codes] [Automatic peak
detection] [Baseline
correction] [Broadening and
asymmetry] [Interactive
tools]

The symbolic integration of functions and the calculation of
definite integrals are topics that are introduced in elementary
Calculus courses. The numerical integration of digitized signals
finds application in analytical signal processing mainly as a method
for measuring the areas under the curves of peak-type signals. Peak area measurements are very important in chromatography, a class of chemical measurement techniques in which a mixture of components is made to flow through a chemically-prepared tube or layer that allows some of the components in the mixture to travel faster than others, followed by a device called a

Peak area measurements are occasionally used also in spectroscopy, for example in flow injection analysis and in graphite furnace atomic absorption (reference 87). In that application, calibration curves based on area measurements are more linear than peak height measurements because most of the area of a peak is measured when the transient absorbance is less than maximum and where Beer's Law is more strictly obeyed.

On the other hand, peak height measurements are simpler to make and are less prone to interference by neighboring, overlapping peaks. And a further disadvantage of peak area measurement is that the peak start and stop points must be determined, which may be difficult especially if the multiple peaks overlap. In principle, curve fitting can measure the areas of peaks even then they overlap, but that requires that the shapes of the peaks be known at least approximately (however, see PeakShapeAnalyticalCurve.m described in the Appendix).

Chromatographic peaks are often described as a Gaussian function or as a

Before computers, there were several methods used to compute peak areas that sound strange by today's standards:

(But now that computing power is built into or connected to every measuring instrument, more accurate and convenient digital methods can be employed. However it is measured, thea) plot the signal on a paper chart, cut out the peak with scissors, then weigh the cut out piece on a micro-balance compared to a square section of known area;

(b) count the grid squares under a curve recorded on gridded graph paper,

(c) use a mechanical ball-and-disk integrator,

(d) use geometry to compute the area under a triangle constructed with its sides tangent to the sides of the peak, or

(e) compute the cumulative sum of the signal magnitude and measure the heights of the resulting steps (see figure below).

The best method for calculating the area under a peak depends whether the peak is isolated or overlapped with other peaks or superimposed on a non-zero baseline or not. For an isolated peak, Yuri Kalambet (reference 71) has shown that the trapezoidal rule area is efficient estimate of full peak area with extraordinary low error, whereas Simpson's rule is less efficient in full area integration. For a Gaussian peak, the trapezoidal rule requires 0.62 points per standard deviation (or 2.5 points within the 4*sigma basewidth) to achieve an integration error of only 0.1%. This is demonstrated by a digital simulation of the effect of sampling rate (data density) on the accuracy of peak area measurements for single isolated sparsely sampled Gaussian peaks (below left).

The
classical way to handle the overlapping peak problem is to draw
two vertical lines from the left and right bounds of the peak down
to the x-axis and then to measure the total area bounded by the
signal curve, the x-axis (y=0 line), and the two vertical lines,
shown the the shaded area in the figure on the left, below. This
is often called the *perpendicular drop* method; it's an
easy task for a computer, although tedious to do by hand. The left
and right bounds of the peak are usually taken as the valleys
(minima) between the peaks or as the point half-way between the
peak center and the centers of the peaks to the left and right.
The basic assumption is that the area missed by cutting off the
feet of one peak is made up for by including the feet of the
adjacent peak. This is accurate only of the peaks are symmetrical,
not too overlapped, and equal in height and in width. In addition,
the baseline must be zero; any extraneous background signal must
be subtracted before measurement. Using this method it is possible
to estimate the area of the second peak in the example below to an
accuracy of about 0.3%, but the last two peaks give errors greater
than 4%. As a rough rule, the valley between the peaks must be
quite low, perhaps a quarter or a fifth of the adjacent peak
height, for this method to be acceptable. Even so, this method is
widely used because there is no simple alternative. If
there is no valley between the peaks you need to measure, it's
possible to apply peak
sharpening techniques to narrow the peaks and deepen the
valley before the perpendicular drop measurement; see PeakSharpeningAreaMeasurementDemo.xlsm
(screen image).
Moreover, *asymmetrical *peaks that are the result of exponential
broadening can be narrowed and symmetricalized
by the weighted addition of its first derivative, making the
perpendicular drop area measurements much more
accurate. In both cases, it may be necessary to set the
strength of sharpening higher than previously recommended, if it
that is the only way to form a valley between peaks whose areas
you want to measure.

*Peak
area measurement for overlapping peaks, using**
the perpendicular drop method (left, shaded area)*

In the case where a single peak is superimposed on a straight or
broadly curved baseline, you might use the *tangent skim method*,
which measures the area between the curve and a linear baseline
drawn across the bottom of the peak (e.g. the *shaded area*
in the figure on the right, above). In general, the hardest part
of the problem and the greatest source of uncertainty is
determining the shape of the baseline under the peaks and
determining when each peaks begins and ends. Once those are
determined, you subtract the baseline from each point between the
start and end points, add them up, and multiply by the x-axis
interval. Incidentally, smoothing a noisy signal does not change
the areas under the peaks, but it may make the peak start and stop
points easier to determine. The downside of smoothing is that
increases peak width and the overlap between adjacent peaks.
Numerical methods of peak sharpening, for example derivative sharpening and
Fourier deconvolution, can help
with the problem of peak overlap, and both of these techniques
have the useful property that they do not change the total area
under the peaks.

If the *shape *of peaks is known, the most general way to
measure the areas of overlapping peaks is to use some type of
least-squares curve fitting, as is discussed in the three
following sections (A, B, C).
If the peak positions, widths, and amplitudes are unknown,
and only the fundamental peak shapes are known, then the iterative least-squares method can
be employed. In many cases, even the background can be accounted
for by curve fitting.

For
gas chromatography and mass spectrometry specifically, **Philip
Wenig's OpenChrom** is an *open source*
data system that can import binary and textual chromatographic
data files directly. It includes methods to detect baselines and
to measure peak areas in a chromatogram. Extensive documentation
is available. It is available for Windows, Linux, Solaris and
Mac OS X. A screen shot is shown on the left (click to
enlarge). The program and its documentation is regularly updated
by the author.

Another freely-available open-source program for mass
spectroscopy is "Skyline"
from MacCoss Lab Software,
which is specifically aimed at reaction monitoring. Tutorials and
videos are available.

Peak area measurement using spreadsheets.

EffectOfDx.xlsx (screen image) demonstrates that the simple equation sum(y)*dx accurately measures the peak area of an isolated Gaussian peak if there are at least 4 or 5 points visibly above the baseline (or 2.5 points within the 4*sigma basewidth) and as long as you include the points out to plus and minus at least 2 or 3 standard deviations of the Gaussian. It also shows that an exponentially broadened Gaussian needs to include more points on the tailing (right-hand, in this case) side to achieve the best accuracy. EffectOfNoiseAndBaseline.xlsx (screen image) demonstrates the effect of random noise and non-zero baseline, showing that the area is more sensitive to non-zero baseline that the same amount of random noise.

CumulativeSum.xls (screen image) illustrates integration of a peak-type signal by normalized cumulative sum; you can paste your own data into columns A and B. CumulativeSumExample.xls is an example with data . The

The symmetrization of exponentially broadened peaks by the weighted addition of the first derivative is performed by the template PeakSymmetrizationTemplate.xlsm (graphic); PeakSymmetrizationExample.xlsm is an example application with sample data already typed in. The procedure here is first to adjust k1 to get the most symmetrical peak shapes (judged by equal but opposite slopes on the leading and trailing edges), then enter the start time, valley time, and end time from the graph for the pair of peaks you want to measure into cells B4, B5, and B6, and finally (optionally) adjust the second derivative sharpening factor k2. The perpendicular drop areas of those two peaks are reported in the table in columns F and G. These spreadsheets have Active-X clickable buttons to adjust the first derivative weighting factor (k1) in cell J4 and the second derivative sharpening factor k2 (cell J5). There is also a demo version that allows you to determine the accuracy of perpendicular drop peak areas under different conditions by internally generating overlapping peaks of known peak areas, with specified asymmetry (B6), relative peak height (B3), width (B4), and noise (B5): PeakSymmetrizationDemo.xlsm (graphic). For peaks that have a more complex broadening behavior, the template PeakDoubleSymmetrizationExample.xlsm allows symmetrization of

Peak area measurement using Matlab and Octave.

Matlab and Octave have built-in commands for the sum of elements ("sum", and the cumulative sum "cumsum") and the trapezoidal numerical integration ("trapz"). For example, these three Matlab commands

These lines accurately compute the area
under the curve of x,y (in this case an isolated Gaussian,
whose area is theoretically known to be the square
root of pi, sqrt(pi), which is 1.7725. If the interval between x
values, dx, is *constant*, then the area is
simply yi=sum(y).*dx. Alternatively,
the signal can be integrated using yi=cumsum(y).*dx, then the
area of the peak will be equal to the height of the
resulting step, max(yi)-min(yi)=1.7725.

For a peak of

But the peaks in real signals have some complications:

(a) their shapes might not be known;These must be taken into account to measure accurate areas

(b) they may be superimposed on a baseline; and

(c) they may be overlapped with other peaks.

**Overlapping peaks**. The following Matlab/Octave code uses
the perpendicular drop (PD) method to measure the areas of two
overlapping symmetrical peaks in the data vectors x,y by the
perpendicular drop method. Variables "m1" and "m2" are the
estimated positions of the two peaks. The "val2ind"
function returns the index number of the value in a vector that
value matches the specified value.

The third
line finds the half-way point between the two peaks. The last
two lines use the trapz function to measure the areas before and
after the valley point.

`index1=val2ind(x,m1);`

`index2=val2ind(x,m2);`

`valleyindex`=val2ind(x,(m1+m2)/2),

`PDMeasArea1=trapz(x(1:valleyindex),y(1:valleyindex));`

`PDMeasArea2=trapz(x(valleyindex:length(x)),y(valleyindex:length(x));
`

Although the perpendicular drop method remains popular, there are other geometrical methods that can work better in many cases. The "equalization" method, illustrated in the figure on the left, uses another method of locating the perpendicular drop point. A line of three straight-line segments is constructed that touches the estimated maxima of the two peaks, shown by the dotted red line called the "cline" in the figure on the left. The quotient of the original signal, in blue, divided by this line, results in a temporarily normalized signal (the yellow line) that has more nearly equal peak heights. The effect of this treatment is to

The "reflection/subtraction" method, shown on the right, is simpler. As before, the original signal is shown in blue. An estimate of the isolated first (larger) peak is constructed by reflecting its left half and using it to replace the right half, resulting in the red dotted line in the figure, assuming that the first peak is symmetrical. Then that peak is simply subtracted from the entire signal to reveal the isolated second peak (dotted yellow line). The two areas are then separately calculated by the "trapz" function. This process is also easily automated, given only the peak position of the first peak. It works perfectly only if the first peak is the larger of the two peak and is symmetrical and if the peak separation is sufficient so that the left-hand tail of the smaller peak does not significantly increase the height of the first peak.

So, how do these new methods compare to the traditional perpendicular drop method? Matlab/Octave code for all of these methods is contained in the script "OverlapAreaComparison.m" which compares the accuracy of several methods of peak area measurement for two simulated partially overlapped Gaussian peaks with variable height ratios and resolutions. (There is also an interactive Live Script version of this script.) For the case of Gaussian peak with a resolution of 0.7 and a height ratio of 1 to 0.5, the relative percent error of the peak areas are:

Peak 1Peak 2Perpendicular drop, valley point: -6.44% 12.89%

Perpendicular drop, half-way point: 3.91% -7.83%Equalizationmethod: 1.27% -2.54%

Subtractionmethod:-2.12% 4.25%

You can change the parameters in lines 5 through 10 to test with other peak separations and relative peak heights. The equalization method is often, but not always, the most accurate method. (Note: the script requires the downloadable functions val2ind.m, halfwidth.m, ExpBroaden.m, and plotit.m functions be in the path).

A more through investigation of these methods demonstrates the effect of changing the peak resolution, shown on the left (script, graphic) and of changing the the height of the smaller peak, shown on the right (script, graphic). These scripts include the effect of

Things are much easier and more forgiving in quantitative analysis, using a calibration curve, because in that case absolute area accuracy is not really necessary. Rather, it is really the reproducibility of the areas that is key.

All of these methods produce significant errors if the peaks are highly overlapped or asymmetrical. However, asymmetry that is the result of

Automatic multiple peak detection

[M,A]=autopeaks.m is basically a combination or autofindpeaks.m and measurepeaks.m. It has similar syntax to measurepeaks.m, except that the peak detection parameters (SlopeThreshold, AmpThreshold, smoothwidth peakgroup, and smoothtype) can be omitted and the function will calculate trial values in the manner of autofindpeaks.m. Using the simple syntax [M,A]=autopeaks(x, y) works well in some cases, but if not try [M,A]=autopeaks(x, y,

The script testautopeaks.m runs all the examples in the autopeaks help file, with a 1-second pause between each one, printing out results in the command window and additionally plotting and numbering the peaks (Figure window 1) and each individual peak (Figure window 2); it requires gaussian.m and fastsmooth.m in the path.

SharpenedOverlapCalibrationCurve.m is a script that simulates the construction and use of calibration curves of three overlapping Gaussian peaks (the blue lines in the signal plots on the left) . Even-derivative sharpening (the red line in the signal plots) is used to improve the resolution of the peaks to allow perpendicular drop area measurement. A straight line is fit to the calibration curve and the R2 is calculated, in order to demonstrate (1) the linearity of the response, and (2) in independence of the overlapping adjacent peaks. You can easily change:

1. The resolution, Rs, by changing the peak width

2. The peak ratios, by changing the minimum and maximum peaks in lines 21 and 22. Default is 0.2 and 1.0. (1:5 ratio range)

3. The number of standards in line 24. Larger numbers give a more reliable calibration curve.

4. The number of simulated samples, line 25. Larger numbers give more reliable average errors.

SymmetrizedOverlapCalibrationCurve.m is the same thing for symmetrization of overlapping exponentially modified Gaussian peaks by first-derivative addition. The critical variable is "factor" in line 27, which for best results should match or slightly exceed "tau", the exponential time constant in line 19. To compare to using the original signal, set "factor" to 0.1. You must have gaussian.m, derivxy.m, autopeaks.m, val2ind.m, halfwidth.m, fastsmooth.m, and plotit.m in the path for either of these two scripts.

The Matlab/Octave automatic peak-finding function

iSignal (shown on the left) is a downloadable interactive multipurpose signal processing Matlab function that includes various signal processing functions described in this tutorial, including measurement of peak area using Simpson's Rule and the perpendicular drop method. Click to view or right-click > Save link as... here, or you can download the ZIP file with sample data for testing. It is shown on the left applying the perpendicular drop method to a series of four peaks of equal area. (Look at the bottom panel to see how the measurement intervals, marked by the vertical dotted magenta lines, are positioned at the valley

Here's a bit of Matlab/Octave code that creates four computer-synthesized Gaussian peaks, similar to this figure, that

x=[0:.01:18];

y=exp(-(x-4).^2) + exp(-(x-9).^2) + exp(-(x-12).^2) + exp(-(x-13.7).^2);

isignal(x,y);

To use

Peak # Position Height Width Area

1 4.00 1.00 1.661 1.7725

2 9.001 1.0003 1.6673 1.77

3 12.16 1.068 2.3 1.78

4 13.55 1.0685 2.21 1.79

The area results are reasonably accurate in this example only because the perpendicular drop method roughly compensates for partial overlap between peaks, but

Peak area measurements by multiple
methods is a part of the Live Script Peak detection tool **PeakDetection.mlx**
described here
and illustrated on the left. This interactive tool allows optional
first-derivative
symmetrization of skewed peaks, as well as symmetrical
sharpening by Fourier
self-deconvolution, to enhance the resolution of overlapping
peaks and improve the accuracy of peak are measurement.

For example, using the

>> peakfit([x;y],9,18,4,1,0,10,0,0,0)

Peak # Position Height Width Area

1 4 1 1.6651 1.7725

2 9 1 1.6651 1.7725

3 12 1 1.6651 1.7725

4 13.7 1 1.6651 1.7725

>> ipeak([x,y],10)

Peak # Position Height Width Area

1 4 1 1.6651 1.7727

2 9.0005 1.0001 1.6674 1.7754

3 12.16 1.0684 2.2546 2.5644

4 13.54 1.0684 2.2521 2.5615

Peaks 1 and 2 are measured accurately by

Fitting Error 0.0002165%

Peak# Position Height Width Area

1 4 1 1.6651 1.7724

2 9 1 1.6651 1.7725

3 12 1 1.6651 1.7725

4 13.7 0.99999 1.6651 1.7724

Correction for background/baseline. The presence of a baseline or background signal, on which the peaks are superimposed, will greatly influence the measured peak area if not corrected or compensated.

The script AreasOfIsolatedPeaks2.m demonstrates the use of peakfit.m for a simulated experimental signal consisting of several isolated peaks on a straight tilted baseline. In this example, the positions of the peaks are assumed to have been previously measured by a peak detector algorithm and to be reproducible enough that a pre-determined set of measurement segments can be used to fit each peak separately and determine its position, height, width, and area.

Here's a Matlab/Octave experiment that compares several

iSignal, using perpendicular drop in baseline mode 1, seriously underestimates both peak areas (168.6 and 81.78).

An automated tangent skim measurement by measurepeaks is not accurate in this case because the peaks do not go all the way down to the baseline at the edges of the signal and because of the slight overlap:

An attempt to use curve fitting with

1 500 2.0001 90.005

2 700 0.99999 89.998

3 5740.2 8.7115e-007 1 1200.1

AsymmetricalAreaTest.m is a Matlab/Octave script that compares the accuracy of peak area measurement methods for a single noisy asymmetrical peak measured by different methods: (A) Gaussian estimation,(B) triangulation, (C) perpendicular drop method, and curve fitting by (D) exponentially broadened Gaussian, and (E) two overlapping Gaussians. AsymmetricalAreaTest2.m is similar except that it compares the precision (standard deviation) of the areas. For a single peak with zero baseline, the perpendicular drop and curve fitting methods work equally well, both considerably better than Gaussian estimation or triangulation. The advantage of the curve fitting methods is that they can deal more accurately with peaks that overlap or that are superimposed on a baseline.

Here's a Matlab/Octave experiment that simulates a signal containing five Gaussian peaks with the

>> x=5:.1:65;

>> y=modelpeaks2(x, [1 5 5 5 5], [1 1 1 1 1], [10 20 30 40 50], [3 3 3 3 3], [0 -5 -10 -15 -20]);

>> plot(x,y)

The theoretical area under these Gaussians is

As the broadening is increased from left to right, the peak height

2

3

4

5

The triangle construction method (using

2

3

4

5

The automated function measurepeaks.m gives better results using the perpendicular drop method (5th column of table).

Peak Position PeakMax Peak-val.UsingPerp dropTan skim

1 10 1 .990473.18713.11232 20.4 .94018 .928973.18393.0905

330.709 .83756 .818053.15972.97944 40.93 .74379 .707623.11882.76345 51.095 .66748 .610433.08352.5151

But we can obtain a more accurate automated measurement of all five peaks by iterative curve fitting, using

The results in this case are disappointing; the fitting error is not much better than the simple Gaussian fit.

Better results can be had using preliminary position and width results obtained from the findpeaks function or by curve fitting with a simple Gaussian fit and using those results as the "start" vector:

Even more accurate results for area are obtained using peakfit with one Gaussian and four

The latter approach works because, although the

The

Alternatively, if the objective is only to measure the peak

Next, we simulate an

>> [FitResults,FittingError]=peakfit([x;y],30,54,5, [1 8 8 8 8], [0 -5 -10 -15 -20] ,20, [20 3.5 25 3.5 31 3.5 36 3.5 41 3.5],0)

FitResults =

1 19.999 1.0015 2.9978 3.1958

2 25.001 1.9942 3.0165 6.4034

3 30 3.0056 2.9851 9.5507

4 34.997 3.9918 3.0076 12.78

5 40.001 4.9965 3.0021 15.966

FittingError =

0.2755

The measured areas in this case (last column) are very close to to the theoretical values, whereas all the other methods give substantially poorer accuracy. The more overlap between peaks, and the more unequal are the peak heights, the poorer the accuracy of the perpendicular drop and triangle construction methods. If the peaks are so overlapped that separate maxima are not visible, both methods fail completely, whereas curve fitting can often retrieve a reasonable result, but

SymmetizedOverlapDemo.m, illustrated on the left, demonstrates the optimization of the first derivative symmetrization for the measurement of the areas of two overlapping exponentially broadened Gaussians. It plots and compares the original (blue) and sharpened peaks (red), then tries first-derivative weighting factors from +10% to -10% of the correct tau value in line 14) plots absolute peak area errors vs factor values. You can change the resolution by changing either the peak positions in lines 17 and 18 or the peak width in line 13. Change height in line 16. Must have derivxy.m, autopeaks.m, and halfwidth.m in the path. This method also easily deals with

This page is part of "

Unique visits since May 17, 2008: