
Experimental measurements are never perfect, even with
sophisticated modern instruments. Two main types of measurement
errors are recognized: (a) and

If you are lucky enough to have a sample and an instrument that are completely stable (

But what if the measurements are not that reproducible, or you have only one recording of that spectrum and no other data? In that case, you could try to estimate the noise in that single recording, based on the

It's important to appreciate that the standard deviations calculated from a small set of measurements can be much higher or much lower than the actual standard deviation of a larger number of measurements. For example, the Matlab/Octave function randn(1,

A quick but approximate way to estimate the amplitude of noise visually is the

In addition to the

The quality
of a signal is often expressed quantitatively as the
*signal-to-noise ratio* (S/N ratio), which is the ratio of
the true underlying signal amplitude (e.g. the average amplitude
or the peak height) to the standard deviation of the noise. Thus
the S/N ratio of the spectrum in Figure 1 is about 0.08/0.001 =
80, and the signal in Figure 3 has a S/N ratio of 1.0/0.2
= 5. So we would say that the quality of the signal in
Figure 1 is better than that in Figure 3 because it has a
greater S/N ratio. Measuring the S/N ratio is much easier
if the noise can be measured separately, in the absence of
signal. Depending on the type of experiment, it may be possible
to acquire readings of the noise alone, for example on a segment
of the baseline before or after the occurrence of the signal.
However, if the magnitude of the noise depends on the level of
the signal, then the experimenter must try to produce a constant
signal level to allow measurement of the noise on the signal. In
some cases, where you can model the shape of the signal
accurately by means of a mathematical function (such as a polynomial or the weighted sum
of a number of peak shape
functions), the noise may be isolated by subtracting the model
from the unsmoothed experimental signal, for example by looking
at the residuals in least-squares curve fitting, as in this example. If
possible, it's usually better to determine the standard
deviation of repeated measurements of the thing that you want to
measure (e.g. the peak heights or areas), rather than trying to
estimate the noise from a single recording of the data.
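
As a rough illustration of measuring the noise on a signal-free segment of the baseline, here is a minimal Matlab/Octave sketch. It uses simulated data rather than any of the figures above; the peak shape, the noise level (0.2), the segment limits, and all the variable names are assumptions chosen to give an S/N ratio near 5.

```octave
% Sketch: estimate the S/N ratio from a baseline segment (simulated data).
% All names and numbers here are illustrative assumptions.
x = 1:1000;
truePeak = exp(-((x-500)./60).^2);        % noiseless peak, height 1.0, centered at x=500
y = truePeak + 0.2.*randn(size(x));       % add white noise, standard deviation 0.2

baseline = y(1:200);                      % region far from the peak: noise only
noiseSD = std(baseline);                  % noise = standard deviation of the baseline
signalAmp = mean(y(490:510)) - mean(baseline);  % height above baseline, averaged over 21 points near the top
SNR = signalAmp/noiseSD;

disp(['Noise SD = ' num2str(noiseSD) ', signal = ' num2str(signalAmp) ', S/N = ' num2str(SNR)])
```

Because the noise is random, the numbers change a little every time you run it, which is itself a useful reminder that a noise estimate from a single short baseline segment is only approximate.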

**Detection limit**. The
"detection limit" is defined as the smallest signal that you can
reliably detect in the presence of noise. In quantitative
analysis, it is usually
defined as the concentration that produces the smallest
detectable signal (Reference
90). A signal below the detection limit cannot be reliably
detected, that is, if the measurement is repeated, the signal
will often be "lost in the noise" and reported as zero. A signal
above the detection limit will be reliably detected and will
seldom or never be reported as zero. The signal-to-noise ratio most
commonly used to define the detection limit is 3. This is
illustrated in the figure on the left (created by the
Matlab/Octave script SNRdemo.m). This
shows a noisy signal in the form of a rectangular pulse. We
define the "signal" as the average signal magnitude during the
pulse, indicated by the red line, which is 3 ("signal" in line 3
of the script, which you can change). We define the "noise" as
the standard deviation of the random noise on the baseline
before and after the pulse, which is about 1.0 (roughly 1/5 of
the peak-to-peak baseline noise indicated by the two black
horizontal lines). So the signal-to-noise ratio (SNR) in this
case is about 3, which is the most common definition of
detection limit. This means that this is the lowest signal that
can be reliably detected and that signals lower than this should
be reported as "undetectable".

But there is a problem. This signal is clearly detectable by
eye; in fact, it would be possible to visually detect lower
signals than this. How can this be? The answer is "averaging".
When you look at this signal, you are *unconsciously
estimating the average of the data points* on the signal
pulse and on the baseline, and your visual detection ability is
enhanced by this averaging. Without that averaging, looking only
at *individual* data points in the signal, only about
half those individual points would meet the SNR=3 criterion. You
can see in the graphic above that several points on the signal
peak are actually *lower* than some of the data points on
the baseline. But this is not a problem in practice, because any
properly written software will include averaging that duplicates
the visual averaging that we all do.

In the script SNRdemo.m, the number of
points averaged is controlled by the variable "AveragePoints" in
line 7. If you set that to 5, the resulting graphic (below on
the left) shows that all of the signal
points are above the highest baseline points. This graphic more
accurately represents what we judge when we look at a signal
like that in the previous graphic: a clear separation of signal
and baseline. The SNR of the peak has improved from 3.1 to 7.7
and *the detection limit will be correspondingly reduced*.
As a rule of thumb, the noise decreases by roughly the
square root of the number of points averaged (sqrt(5)=2.2).
Higher values will further improve the SNR and reduce the
relative standard deviation of the average signal, but the
response time - which is the time it takes for the signal to
reach the average value - will become slower and slower as the
number of points averaged increases. This is shown by this graphic with 100 points
averaged. With a much lower signal, where the SNR is only 1.0,
the raw signal is barely detectable
visually, but with a 100-point average, the signal precision is good.
Digital averaging beats visual averaging in this case.
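
The square-root rule of thumb is easy to check numerically. The following is only a sketch of the idea, not the SNRdemo.m script itself: it assumes a rectangular pulse of height 3 with white noise of standard deviation 1, uses a simple sliding (boxcar) average computed with conv, and borrows the variable name AveragePoints from the text; everything else is hypothetical.

```octave
% Sketch: effect of a sliding average on the SNR of a rectangular pulse.
% Not the author's SNRdemo.m; the values below are assumptions echoing the text.
signal = 3;            % average signal magnitude during the pulse
noiseSD = 1;           % standard deviation of the white baseline noise
AveragePoints = 5;     % number of adjacent points averaged
n = 20000;             % points in each segment (baseline, pulse, baseline)

y = [zeros(1,n) signal.*ones(1,n) zeros(1,n)] + noiseSD.*randn(1,3*n);
ya = conv(y, ones(1,AveragePoints)./AveragePoints, 'same');  % boxcar average

base = 101:n-100;          % baseline region, away from the edges
pulse = n+101:2*n-100;     % pulse region, away from the rise and fall
rawSNR = mean(y(pulse))/std(y(base));
avgSNR = mean(ya(pulse))/std(ya(base));
improvement = avgSNR/rawSNR;   % expected to be near sqrt(AveragePoints)

disp([rawSNR avgSNR improvement sqrt(AveragePoints)])
```

Try changing AveragePoints to 25 or 100; the improvement should follow the square root as long as the noise is white, but the rise and fall of the averaged pulse get correspondingly slower.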

In SNRdemo.m, the noise is constant
and independent of the signal amplitude. In the variant SNRdemoHetero.m, the noise in the
signal is directly proportional to the signal level, and as a
result the detection limit depends on the constant baseline
noise (graphic). In the variant
SNRdemoArea.m, it is the peak *area*
that is measured rather than the peak height, and as a
result the SNR is improved by the square root of the width of
the peak (graphic).
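
To see why the baseline noise sets the detection limit when the rest of the noise grows with the signal, here is a small sketch. It is not SNRdemoHetero.m; it assumes a simple noise model (a constant baseline noise plus an independent noise proportional to the signal level), and the two noise parameters and the signal levels are made-up values.

```octave
% Sketch: constant baseline noise plus noise proportional to the signal.
% Not the author's SNRdemoHetero.m; the noise model and values are assumptions.
baselineSD = 0.1;            % constant noise, present even at zero signal
proportionalFactor = 0.2;    % extra noise SD as a fraction of the signal level
level = [0 1 3 10];          % signal levels to test
n = 20000;                   % readings taken at each level

measuredSD = zeros(size(level));
for k = 1:length(level)
    y = level(k) + baselineSD.*randn(1,n) ...
        + proportionalFactor.*level(k).*randn(1,n);
    measuredSD(k) = std(y);
end
expectedSD = sqrt(baselineSD.^2 + (proportionalFactor.*level).^2);  % independent noises add in quadrature
SNR = level./measuredSD;     % approaches 1/proportionalFactor at high levels

disp([level; measuredSD; expectedSD; SNR])
```

In this model the S/N ratio levels off near 1/proportionalFactor (5 here) no matter how large the signal becomes, while near zero signal only the baseline noise remains, so that is what determines the smallest detectable signal.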

An example of a practical application of
a signal like this would be to turn on a warning light or buzzer
if the signal ever exceeds a threshold value of 1.5 volts, for
the signal illustrated in the figures above. This would not work
if you used the raw unaveraged signal in the first figure; there
is no threshold value that would never be exceeded by the
baseline but always exceeded by the signal. Only the
*averaged* signal would reliably turn on the alarm above
the threshold and never activate it below the threshold.
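
The alarm example can be tried out on simulated data. This sketch assumes the same kind of rectangular pulse (height 3, noise standard deviation 1) and the 1.5 threshold mentioned above; the 25-point averaging width and all variable names are my own hypothetical choices, picked so that errors in the averaged signal become extremely unlikely.

```octave
% Sketch: threshold alarm on raw versus averaged data (simulated).
% Values other than the 1.5 threshold are assumptions, not taken from the author's scripts.
signal = 3; noiseSD = 1; threshold = 1.5;
AveragePoints = 25;    % hypothetical; enough averaging to make errors very unlikely
n = 2000;              % points in each segment

y = [zeros(1,n) signal.*ones(1,n) zeros(1,n)] + noiseSD.*randn(1,3*n);
ya = conv(y, ones(1,AveragePoints)./AveragePoints, 'same');  % sliding average

base = [51:n-50, 2*n+51:3*n-50];   % baseline points, away from the transitions
pulse = n+51:2*n-50;               % pulse points, away from the transitions

falseAlarmsRaw = sum(y(base) > threshold);    % baseline points that trigger the alarm
missesRaw      = sum(y(pulse) < threshold);   % signal points that fail to trigger it
falseAlarmsAvg = sum(ya(base) > threshold);
missesAvg      = sum(ya(pulse) < threshold);

disp([falseAlarmsRaw missesRaw; falseAlarmsAvg missesAvg])
```

The first row of the output (raw data) typically shows hundreds of false alarms and misses; the second row (averaged data) should show none. The price is a delay of roughly half the averaging width before the alarm responds to a real change.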

You will also hear the term "Limit of determination", which
is the lowest signal or concentration that achieves a minimum
acceptable precision, defined as the relative standard deviation
of the signal amplitude. That is defined at a much higher
signal-to-noise ratio, say 10 or 20, depending on the
requirements of your application.

Averaging such as done here is the simplest form of
"smoothing", which is covered in the next chapter.

**Ensemble averaging**.
One key thing that really distinguishes signal from noise is
that random noise is not the same from one measurement of the
signal to the next, whereas the genuine signal is at least
partially reproducible. So if the signal can be measured more
than once, use can be made of this fact by measuring the signal
over and over again, as fast as is practical, and adding up all the
measurements point-by-point, then dividing by the number of
signals averaged. This is called *ensemble averaging*,
and it is one of the most
powerful methods for improving signals, when it can be applied.
For this to work properly, the noise must be random and the
signal must occur at the same time in each repeat.

*Window 1 (left) is a single measurement of a very noisy
signal. There is actually a broad peak near the center of this
signal, but it is difficult to measure its position, width,
and height accurately because the S/N ratio is very poor.
Window 2 (right) is the average of 9 repeated measurements of
this signal, clearly showing the peak emerging from the noise.
The expected improvement in S/N ratio is 3 (the square root of
9). Often it is possible to average hundreds
of measurements, resulting in much more substantial
improvement. The S/N ratio in the resulting average signal in
this example is about 5.*
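
The add-up-and-divide procedure described above takes only a few lines in Matlab/Octave. This is a sketch on simulated data, not the code that produced the figure; the peak shape, the noise level, and the variable names are assumptions, with 9 repeats chosen to match the caption.

```octave
% Sketch: ensemble averaging of repeated measurements (simulated data).
% Peak shape, noise level and names are illustrative assumptions.
x = 1:1000;
trueSignal = exp(-((x-500)./150).^2);      % broad peak near the center, height 1
noiseSD = 0.6;                             % white noise added to each measurement
nRepeats = 9;                              % number of repeated measurements

% Each row of the matrix is one noisy measurement of the same signal
signals = repmat(trueSignal, nRepeats, 1) + noiseSD.*randn(nRepeats, length(x));
ensembleAvg = sum(signals, 1)./nRepeats;   % add point-by-point, divide by the number of signals

noiseSingle = std(signals(1,:) - trueSignal);   % noise in one measurement
noiseAvg    = std(ensembleAvg - trueSignal);    % noise remaining in the average
improvement = noiseSingle/noiseAvg;             % expected to be near sqrt(9) = 3

disp([noiseSingle noiseAvg improvement])
```

Subtracting the known trueSignal to isolate the noise is only possible in a simulation, of course; with real data you would have to estimate the noise in one of the ways described earlier.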

Noise that has a more low-frequency-weighted character, that is, that has more power at low frequencies than at high frequencies, is often called "pink noise". In the acoustical domain, pink noise sounds more like a

Conversely, noise that has more power at high frequencies is called "blue" noise. This type of noise is less commonly encountered in experimental work, but it can occur in processed signals that have been subject to some sort of differentiation process or that have been deconvoluted from some blurring process. Blue noise is

Often, there is a mix of noises with different behaviors; in optical spectroscopy, three fundamental types of noise are recognized, based on their origin and on how they vary with light intensity: photon noise, detector noise, and flicker (fluctuation) noise. Photon noise (often the limiting noise in instruments that use photo-multiplier detectors) is white and is proportional to the

Only in a very few special cases is it possible to eliminate noise completely, so usually you must be satisfied with increasing the S/N ratio as much as possible. The key in any experimental system is to understand the possible sources of noise, break down the system into its parts and measure the noise generated by each part separately, then seek to reduce or compensate for as much of each noise source as possible. For example, in optical spectroscopy, source flicker noise can often be reduced or eliminated by using feedback stabilization, choosing a better light source, using an internal standard, or specialized instrument designs such as double-beam, dual wavelength, derivative, and wavelength modulation. The effect of photon noise and detector noise can be reduced by increasing the light intensity at the detector or increasing the spectrometer slit width, and electronics noise can sometimes be reduced by cooling or upgrading the detector and/or electronics. Fixed pattern noise in array detectors can be corrected in software. Only

This is easily demonstrated by a little simulation. In the example on the left, we start with a set of 100,000 uniformly distributed random numbers that have an equal chance of having any value between certain limits - between 0 and +1 in this case (like the "rand" function in most spreadsheets and Matlab/Octave). The graph in the upper left of the figure shows the probability distribution, called a "histogram",

Remarkably, the distributions of the individual events hardly matter at all. You could modify the individual distributions in this simulation by including additional functions, such as sqrt(rand), sin(rand), rand^2, log(rand), etc, to obtain other radically non-normal individual distributions. It seems that no matter what the distribution of the single random variable might be, by the time you combine even as few as four of them, the resulting distribution is already visually close to normal. Real world macroscopic observations are often the result of thousands or millions of individual microscopic events, so whatever the probability distributions of the individual events, the combined macroscopic observations approach a normal distribution essentially perfectly. It is on this common adherence to normal distributions that the common statistical procedures are based; the use of the mean, standard deviation σ, least-squares fits, confidence limits, etc, are all based on the assumption of a normal distribution. Even so, experimental errors and noise are not always normal; sometimes there are very large errors that fall well beyond the "normal" range. They are called "outliers" and they can have a very large effect on the standard deviation σ. In such cases it's common to use the "interquartile range" (IQR), defined as the difference between the upper and lower quartiles, instead of the standard deviation, because the interquartile range is not affected by a few outliers. For a normal distribution, the interquartile range is equal to 1.34896 times the standard deviation.
A quick way to check the distribution of a large set of random numbers is to compute both the standard deviation and the interquartile range divided by 1.34896; if they are roughly equal, the distribution is probably normal; if the standard deviation is much larger, the data set probably contains outliers and the standard deviation without the outliers can be better estimated by dividing the interquartile range by 1.34896.
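
Here is a minimal sketch of that check, applied to the sum of four uniformly distributed random variables and then to the same data with a few large outliers added. The quartiles are estimated crudely by sorting (so that no toolbox is needed), and the outlier size and fraction are arbitrary assumptions; the downloadable IQrange.m and stdiqr.m functions mentioned later may compute things differently.

```octave
% Sketch: standard deviation versus interquartile range, with and without outliers.
% Quartiles are estimated crudely by sorting; the outlier values are arbitrary.
n = 100000;
y = rand(1,n) + rand(1,n) + rand(1,n) + rand(1,n);   % sum of four uniform random variables

s = sort(y);
iqRange = s(round(0.75*n)) - s(round(0.25*n));  % upper quartile minus lower quartile
sd = std(y);
sdFromIQR = iqRange/1.34896;                    % close to sd if the distribution is near normal

% Add a few large outliers (1% of the points) and repeat
yOut = y;
yOut(1:1000) = yOut(1:1000) + 50;
sOut = sort(yOut);
sdOut = std(yOut);                              % strongly inflated by the outliers
sdFromIQROut = (sOut(round(0.75*n)) - sOut(round(0.25*n)))/1.34896;  % barely changed

disp([sd sdFromIQR; sdOut sdFromIQROut])
```

In the first row the two estimates agree to within a few percent, even though only four uniform variables were combined. In the second row the ordinary standard deviation is many times larger, while the estimate based on the interquartile range hardly moves.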

It is important to understand that the three characteristics of noise just discussed in the paragraphs above - the frequency distribution, the amplitude distribution, and the signal dependence - are mutually independent; a noise may in principle have any combination of those properties.

The role of simulation and modeling.

A simulation is an imitation of the operation of a real-world process or system over time. Simulations require the use of

**Visual animation of ensemble averaging.** This
17-second video (EnsembleAverage1.wmv)
demonstrates the ensemble averaging of 1000 repeats of a signal
with a very poor S/N ratio. The signal itself consists of three
peaks located at x = 50, 100, and 150, with peak heights 1, 2,
and 3 units. These signal peaks are buried in random noise whose
standard deviation is 10. Thus the S/N ratio of the smallest
peak is 0.1, which is far too low to even see a signal, much less
measure it. The video shows the accumulating average signal as
1000 measurements of the signal are performed. At the end, the
noise is reduced (on average) by the square root of 1000 (about
32), so that the S/N ratio of the smallest peak ends up being
about 3, just enough to detect the presence of a peak reliably.
Click here to download
the video (2 MBytes) in WMV format. (This demonstration was
created in Matlab 6.5. If you have access to that software, you
may download the original m-file, EnsembleAverage.zip).

Popular spreadsheets, such as Excel or Open Office Calc, have built-in functions that can be used for calculating, measuring and plotting signals and noise. For example, the cell formula for one point on a

Most spreadsheets have only a

The interquartile range (IQR) can be calculated in a spreadsheet by subtracting the first quartile from the third (e.g.

Matlab and Octave have built-in functions that can be used for calculating, measuring and plotting signals and noise, including mean, max, min, std, kurtosis, skewness, plot, hist, histfit, rand, and randn. Just type "help" and the function name at the command >> prompt, e.g. "help mean". Most of these functions apply to vectors and matrices as well as scalar variables. For example, if you have a series of results in a vector variable 'y', mean(y) returns the average and std(y) returns the standard deviation of all the values in y. For vectors, std(y) computes sqrt(sum((y-mean(y)).^2)/(length(y)-1)). You can subtract a scalar number from a vector (for example, v = v-min(v) sets the lowest value of vector v to zero). If you have a set of signals in the rows of a matrix

As an example of the "randn" function in Matlab/Octave, it is used here to generate 100 normally-distributed random numbers, then the "hist" function computes the "histogram" (probability distribution) of those random numbers, then the downloadable function peakfit.m fits a Gaussian function (plotted with a red line) to that distribution:

>> [N,X]=hist(randn(1,100));
>> peakfit([X;N]);

If you change the 100 to 1000 or a higher number, the distribution becomes closer and closer to a perfect Gaussian and its peak falls closer to 0.00. The "randn" function is useful in signal processing for predicting the uncertainty of measurements in the presence of random noise, for example by using the Monte Carlo or the bootstrap methods that will be described in a later section. (You can copy and paste, or drag and drop, these two lines of code into the Matlab or Octave editor or into the command line and press

Here is an MP4 animation that demonstrates the gradual emergence of a Gaussian normal distribution as the number of samples increases from 2 to 1000. Note how many samples it takes before the normal distribution is well-formed.
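
If you would rather see numbers than an animation, the following sketch compares the histogram of randn samples with the ideal normal curve for several sample sizes. It is my own illustration, not the code behind the animation; the bin centers, the sample sizes, and the use of the root-mean-square difference as a measure of "well-formed" are all assumptions.

```octave
% Sketch: how closely the histogram of randn samples follows the normal curve.
% Bin centers, sample sizes and the rms measure are illustrative choices.
centers = -3:0.5:3;                       % histogram bin centers
binWidth = 0.5;
gaussPDF = exp(-centers.^2./2)./sqrt(2*pi);   % ideal normal probability density

sampleSizes = [100 1000 10000 100000];
rmsDeviation = zeros(size(sampleSizes));
for k = 1:length(sampleSizes)
    y = randn(1,sampleSizes(k));
    N = hist(y,centers);                          % counts in each bin
    density = N./(sampleSizes(k)*binWidth);       % convert counts to probability density
    rmsDeviation(k) = sqrt(mean((density-gaussPDF).^2));
end

disp([sampleSizes; rmsDeviation])
```

The deviation generally shrinks as the sample size grows, though not all the way to zero in this sketch, because the finite bin width and the catch-all end bins introduce a small systematic difference of their own.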

**The difference between scripts and functions**.

Scripts and functions are just simple text files saved with the ".m" file extension to the file name. The difference between a script and a function is that a function definition begins with the word 'function'; a script is just any list of Matlab commands and statements. For a

[output variables] = FunctionName(input variables)

That means that functions are a great way to package chunks of code that perform useful operations in a form that can be used as components in

function relstddev=rsd(x)

% Relative standard deviation of vector x

relstddev=std(x)./mean(x);

If you run one of my scripts and get an error message that says, "Undefined function...", you need to download the specified function from http://tinyurl.com/cey8rwh and place it in the Matlab/Octave path. Note: in Matlab R2016b or later, you CAN include functions within scripts; just place them at the end of the script and add an additional "end" statement to each (see https://www.mathworks.com/help/matlab/matlab_prog/local-functions-in-scripts.html).

For writing or editing scripts and functions, Matlab and the latest version of Octave have an internal editor. For an explanation of a function and a simple worked example, type "help function" at the command prompt. When you are writing your own functions or scripts, you should always add lots of "comment lines" (beginning with the character %) that explain what is going on.

Here's a very handy helper: when you type a
function name into the Matlab editor, if you *pause for a
moment* after typing the open parenthesis immediately after
the function name, Matlab will display a pop-up listing all the
possible input arguments as a reminder. *This works even for
downloaded functions and for any new functions that you
yourself create*. It's especially handy when there are so
many possible input arguments that it's hard to remember all of
them. The popup *stays on the screen as you type*,
highlighting each argument in turn:

This feature is easily overlooked, but it's very handy. Clicking
on __More Help...__ on the right displays the help for
that function in a separate window.

**Some examples** of my Matlab/Octave user-defined functions
related to signals and noise that you can download and use are:
stdev.m, a standard deviation function
that works in both Matlab and in Octave; rsd.m,
the relative standard deviation; halfwidth.m for measuring
the full width at half maximum of smooth peaks; plotit.m, an easy-to-use
function for plotting and fitting x,y data in matrices
or in separate vectors; functions
for peak shapes commonly encountered in analytical
chemistry (such as Gaussian, Lorentzian, lognormal, Pearson 5, exponentially-broadened Gaussian, exponentially-broadened Lorentzian, exponential
pulse, sigmoid, Gaussian/Lorentzian blend, bifurcated Gaussian, bifurcated Lorentzian, Voigt profile, and triangular); and peakfunction.m, a function
that generates any of those peak types specified by
number. ShapeDemo demonstrates the 12
basic peak shapes graphically,
showing the variable-shape peaks as multiple lines.
There are functions for different types of random
noise (white noise, pink noise, blue noise, proportional noise, and square root noise), a
function that applies exponential broadening (ExpBroaden.m),
a function that computes the interquartile range (IQrange.m),
a function that estimates the standard deviation of a
distribution with outliers by computing the
interquartile range and dividing it by 1.34896 (stdiqr.m); a function
that removes "not-a-number" entries from vectors (rmnan.m), and a
function that returns the index and the value of the
element of vector x that is closest to a particular
value (val2ind.m). These
functions can be useful in modeling and simulating
analytical signals and testing measurement techniques.
You can click or ctrl-click on these links to inspect the code or you can
right-click and select "Save link as..."
to download them to your computer.
Once you have downloaded those functions and placed them in the
"path", you can use them just like any other built-in function.
For example, you can plot a simulated Gaussian peak with white
noise by typing: x=[1:256];
y=gaussian(x,128,64) + whitenoise(x); plot(x,y). The
script plotting.m uses the gaussian.m function to demonstrate the
distinction between the *height*, *position*, and *width*
of a Gaussian curve. The script SignalGenerator.m
calls several of these downloadable functions to create and plot
a realistic computer-generated signal with multiple peaks on a
variable baseline plus variable random noise; you might try to
modify the variables in the indicated places to make it look
like your type of data. These functions have been developed and
tested in Matlab 7.8 (R2009a), 8.1 (R2013a), 9.3 (R2017b home
version), R2018b Student version, and in R2020b update 3. Almost
all of these functions will work in the latest version of Octave without change.
For a complete list of downloadable functions and
scripts developed for this project, see functions.html.
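
If you want to experiment before downloading anything, the sketch below imitates the one-line example above using two hypothetical stand-ins, gaussStandIn and noiseStandIn, defined as anonymous functions. They are not the downloadable gaussian.m and whitenoise.m; in particular, I am assuming that the width argument is the full width at half maximum and that the noise has unit standard deviation. The sketch then measures the height, position, and width of the noiseless curve.

```octave
% Sketch with hypothetical stand-ins for the downloadable gaussian.m and whitenoise.m,
% so that it runs without any downloads. Assumption: the width argument is the FWHM.
gaussStandIn = @(x,pos,wid) exp(-((x-pos)./(wid./(2*sqrt(log(2))))).^2);
noiseStandIn = @(x) randn(size(x));          % white noise, unit standard deviation

x = 1:256;
g = gaussStandIn(x,128,64);                  % noiseless Gaussian: position 128, width 64
y = g + 0.1.*noiseStandIn(x);                % same peak with added white noise (SD 0.1)

[height, idx] = max(g);                      % height = maximum value
position = x(idx);                           % position = x value at the maximum
above = x(g >= height/2 - 1e-9);             % x values at or above half of the height
fwhm = above(end) - above(1);                % full width at half maximum

disp([height position fwhm])
% plot(x,y,x,g) shows the noisy and noiseless curves together
```

Applying the same three measurements to the noisy vector y instead of g gives noticeably poorer results, especially for the width, which is one reason that curve fitting is usually preferred for noisy peaks.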

The Matlab/Octave script EnsembleAverageDemo.m
demonstrates ensemble averaging to improve the S/N ratio of a
very noisy signal. Click for
graphic. The script requires the "gaussian.m"
function to be downloaded and placed in the Matlab/Octave path,
or you can use another peak shape function, such as lorentzian.m or rectanglepulse.m.

The Matlab/Octave function noisetest.m demonstrates
the appearance and effect of different noise types. It
plots Gaussian peaks with four different types of
added noise: constant white noise, constant pink (1/f) noise,
proportional white noise, and square-root white noise, then fits
a Gaussian to each noisy data set and computes the average and
the standard deviation of the peak height, position, width and
area for each noise type. Type "help noisetest" at the command
prompt. The Matlab/Octave script SubtractTwoMeasurements.m
demonstrates the technique of subtracting two separate
measurements of a waveform to extract the random noise (but
it works only if the signal is stable, except for the
noise). Graphic.
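
The idea behind subtracting two measurements can be sketched in a few lines. This is not SubtractTwoMeasurements.m itself, just an illustration on simulated data; the waveform, the noise level, and the variable names are assumptions. The one piece of theory it relies on is that the difference between two independent noises of equal standard deviation has a standard deviation that is larger by a factor of sqrt(2).

```octave
% Sketch: extracting the noise by subtracting two measurements of a stable signal.
% Not the author's SubtractTwoMeasurements.m; waveform and noise level are assumptions.
x = 1:2000;
trueSignal = sin(x./50) + 2;               % any stable waveform will do
noiseSD = 0.3;                             % white noise in each measurement

m1 = trueSignal + noiseSD.*randn(size(x)); % first measurement
m2 = trueSignal + noiseSD.*randn(size(x)); % second measurement, same signal, new noise

d = m1 - m2;                               % the signal cancels; only noise remains
estimatedNoise = std(d)/sqrt(2);           % difference of two independent noises has sqrt(2) times the SD

disp([noiseSD estimatedNoise])
```

If the signal drifts between the two measurements, the drift does not cancel and shows up in the difference, inflating the noise estimate; that is why the method requires a stable signal.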

This example shows four types of interactive controllers. Line 1 shows a button that opens a

Live Scripts produce graphic output in small windows on the right side of the Live editor window, where you can copy, pan and zoom and export to png files as usual using the mouse. You can also convert any Live Script graphic into a standard figure window by clicking its upper right corner; the standard figure window can then be exported to other graphic formats, expanded to full screen, printed, etc.

Practical examples of Live Scripts on this site include a versatile data smoothing tool, a self-deconvolution script, a peak-fitting tool, and a peak detection tool.

Live scripts are surprisingly easy to create by modifying a conventional script. In Matlab, you can simply open a conventional (.m) script in the Live Editor and insert the interface devices directly into the script where the numbers in assignment statements would have gone. When you save it, it becomes a .mlx file. See AppendixAF.html

Others in this group of interactive functions include

For signals that contain repetitive waveform patterns occurring in one continuous signal, with nominally the same shape except for noise, the interactive peak detector function iPeak has an ensemble averaging function (

See Appendix S: Measuring the Signal-to-Noise Ratio of Complex Signals for more examples of S/N ratio in Matlab/Octave.

This page is part of "