Bayesian inference results in equally or more accurate diagnostic classification, especially with complex and noisy data sets such as those found when developing IVDs.
The statistical analysis methods that have primarily been taught in schools follow the frequentist approach. However, another approach, Bayesian statistics, is becoming more widespread. With today’s noisy data sets, Bayesian techniques allow scientists to extract the most information from an experimental data set, which can lead to the development of more-accurate diagnostic instruments.
FDA has recently recognized the value of Bayesian analysis compared with frequentist methods in conducting clinical trials for medical devices. Bayesian statistics provide a coherent way to learn from data as they accumulate and a formal mathematical technique for combining data with prior information. This can reduce the amount of data needed in clinical trials for devices. The same argument holds in applying Bayesian techniques to the development of processing algorithms for diagnostic instruments. Devices that produce data well suited to Bayesian analysis include microarrays, flow cytometry, proteomics, and time-of-flight spectroscopy.
Frequentists and Bayesians stridently advocate their statistical approach, bordering on religious fervor. This article compares Bayesian and frequentist methods and illustrates applications in which the lesser-used Bayesian methods have significant advantages, in particular in developing diagnostic products. Because Bayesian methods ask and answer the right questions, they can lead to the development of more reliable, effective, and low-cost diagnostic solutions.
Description of Bayesian and Frequentist Methods
Bayesian techniques have become more widely used in real-world applications only during the last 30 years due to advances in the computing power required in the initial algorithm development and validation. However, once created, the algorithms can be embedded on low-cost digital signal processing chips. Although applying Bayesian inference to real-world situations requires an experienced user, the benefits that this technique offers far outweigh the initial investment.
Bayesian and frequentist methods differ in how the correspondence is constructed between mathematical objects and real-world ideas. The frequentist approach regards probabilities as measures of frequency, while the Bayesian approach regards them as degrees of belief.
In Bayesian statistics, what is known before collecting the data is the prior information, and what is known after collecting the data is the posterior information. In mathematical terms, the Bayesian technique calculates P(θ|D), the posterior probability distribution of the unknown variable θ given the data D. This is what the user actually wants to know, but calculating it requires the prior probability distribution of θ. In comparison, the frequentist analysis calculates P(D ∈ C|H0), the probability that the data fall in some critical region C given some null hypothesis H0 about θ.
The difference between these two is pivotal. While the latter probability can be calculated without knowing the prior distribution of θ, the result is not the answer that a user actually needs. It is similar to answering a different question than what is asked in an exam because of a lack of knowledge on how to answer the original question.
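As a concrete sketch of the Bayesian calculation, the following Python fragment computes P(θ|D) on a grid for a hypothetical example (all numbers here are illustrative, not taken from the article): θ is the unknown mean of a Gaussian variable with known standard deviation, and a broad Gaussian prior is combined with the likelihood of the observed data.

```python
import numpy as np

# Hypothetical example: theta is the unknown mean of a Gaussian with
# known standard deviation sigma; all numbers are illustrative.
rng = np.random.default_rng(0)
sigma = 1.0
data = rng.normal(loc=2.0, scale=sigma, size=20)          # the data D

theta = np.linspace(-5.0, 5.0, 2001)                      # grid of candidate theta values
dtheta = theta[1] - theta[0]
prior = np.exp(-theta**2 / (2 * 3.0**2))                  # broad Gaussian prior, sd 3

# Log-likelihood of the data at each grid point, summed over observations.
log_like = (-0.5 * ((data[:, None] - theta[None, :]) / sigma) ** 2).sum(axis=0)
posterior = prior * np.exp(log_like - log_like.max())     # unnormalized posterior
posterior /= posterior.sum() * dtheta                     # normalize to a density

post_mean = np.sum(theta * posterior) * dtheta            # one summary of the posterior
```

The result is the whole posterior density over θ, from which any summary (mean, credible interval) can be read off, rather than a single point estimate.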
Nonstatisticians often want to ask such questions as, “Does this set of samples come from a normally (also referred to as Gaussian) distributed variable?” To give meaning to such a question, a prior distribution on the distributions considered possible for the variable is essential. For example, if someone asks, “Is Martin 1.5 m tall?,” the answer depends on whether the word “exactly” is part of the question. If it is, then the answer is no with 100% probability. But if “approximately” is part of the question, then the answer depends on the exact meaning of “approximately.”
The same applies to questions about distributions. Very few experimentally observable variables are exactly Gaussian distributed, even if they are approximately so. In that case, the whole question turns on what exactly is meant by “approximately,” and without that, the question is meaningless. Unfortunately, specifying what is meant by “approximately” in this context is much more difficult than when measuring somebody’s height.
Figure 1. (click to enlarge) The prior distribution (left), which represents views on the probability of an ordinary coin coming down heads, and the posterior distribution given 7 out of 8 heads.
Consider the coin example of Figure 1: an ordinary coin is tossed 8 times and comes down heads 7 times. Because experience says ordinary coins are very nearly fair, the Bayesian prior concentrates probability near 0.5, and the posterior remains close to 0.5 even after this run of heads. The frequentist, meanwhile, defines an unbiased estimator to be one whose expectation is always equal to the true value. The frequentist maximum-likelihood estimate, which is unbiased in this technical sense, comes in at 7/8 = 0.875. But this is unreasonable as an estimate of the probability of the coin coming down heads, because it ignores the prior information about coins. The combined effect of these choices is that users are not encouraged to think clearly about the problems they face. They get answers to the wrong questions, answers that are biased by ignoring information other than the data and by concentrating on local areas of high probability density rather than on the behavior of the whole probability mass.
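The coin example can be sketched numerically. The Beta prior parameters below are an illustrative choice, not taken from the figure; a Beta prior is conjugate to the binomial likelihood, so the posterior is again a Beta distribution and can be updated by simple addition.

```python
# The coin example: 7 heads in 8 tosses of an ordinary coin.
heads, tosses = 7, 8

# Frequentist maximum-likelihood estimate: ignores what we know about coins.
mle = heads / tosses                          # 0.875

# Bayesian sketch: a Beta(a, b) prior concentrated near 0.5 (the prior
# parameters are an illustrative choice, not taken from the article).
# Conjugacy makes the posterior Beta(a + heads, b + tails).
a, b = 50, 50                                 # strong prior belief that the coin is fair
post_a, post_b = a + heads, b + (tosses - heads)
posterior_mean = post_a / (post_a + post_b)   # 57 / 108, about 0.53
```

The posterior mean stays close to 0.5, as common sense about ordinary coins demands, while the maximum-likelihood estimate is dragged to 0.875 by a single short run of data.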
The following are two more complex examples that illustrate the types of signal processing problems that could be encountered in a diagnostic instrument. These examples discuss how the choice between Bayesian and frequentist methods could influence diagnostic instrument designers. Depending on the application, using Bayesian inference may make the results more accurate, indicate precisely what uncertainties there are, or make the project possible at all rather than impossible. (There are many different frequentist methods. This article only mentions a subset because of space constraints.)
Fluorescence Lifetime Measurements
Extracting quantities of fluors of different lifetimes present in a mixture provides an example of how Bayesian techniques can generate better results than traditional frequentist methods. Fluorescence measurements are used in a variety of IVDs, from lateral-flow tests to high-throughput screening. This example will consider the situation of two fluors of known lifetimes present in unknown amounts.
The traditional method for solving this problem is time-correlated single-photon counting (TCSPC). A sample is illuminated with pulses of laser light that are deliberately so dim that only one fluorescent photon per hundred excitation pulses is expected to come back. The dimness is essential: if the laser were brighter, the approach would fail, because only the first photon returning after each excitation is recorded, so the recorded arrival times would be biased toward earlier values.
After each excitation, the arrival time of only the first of any photons received is recorded. After many tens of thousands of excitation cycles, a histogram of the arrival times of the fluorescent photons seen is made, and least-squares curve fitting of a family of exponential curves is done. This method amounts to using maximum-likelihood estimation under the assumption that the errors are due to Gaussian noise of constant amplitude.
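A minimal simulation of this frequentist pipeline, with illustrative lifetimes and an idealized photon source (the first-photon selection bias is ignored here), might look as follows. The amplitudes enter the model linearly, so ordinary linear least squares suffices for the curve fit.

```python
import numpy as np

rng = np.random.default_rng(1)
tau1, tau2 = 2.0, 8.0            # known lifetimes (illustrative units)
a1_true, a2_true = 0.7, 0.3      # true mixture fractions (unknown to the fitter)

# Simulate photon arrival times from the two-fluor mixture.
n = 50_000
which = rng.random(n) < a1_true
t = np.where(which, rng.exponential(tau1, n), rng.exponential(tau2, n))

# Histogram the arrival times, then least-squares fit of the
# two-exponential family, as in the traditional TCSPC analysis.
edges = np.linspace(0.0, 40.0, 81)
counts, _ = np.histogram(t, bins=edges)
centers = 0.5 * (edges[:-1] + edges[1:])
width = edges[1] - edges[0]
design = np.column_stack([
    n * width * np.exp(-centers / tau1) / tau1,   # expected counts, fluor 1
    n * width * np.exp(-centers / tau2) / tau2,   # expected counts, fluor 2
])
amps, *_ = np.linalg.lstsq(design, counts, rcond=None)
```

Note that the unweighted least-squares step implicitly assumes constant-amplitude Gaussian noise on every bin, even though the bin counts are actually Poisson distributed.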
In contrast, the Bayesian method of solving this problem sets the laser to its full brightness (which we will assume is sufficient to get back on average one photon per excitation, but which will give even better results if a greater number of photons is received), records the arrival times not just of the first but of any and all photons received, and applies Bayesian inference to the resulting data.
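A sketch of the Bayesian alternative follows, under the simplifying assumption (ours, for brevity) that the two lifetimes are known and only the mixture fraction is unknown. The posterior over the fraction is computed on a grid directly from the raw arrival times, with no histogramming and no Gaussian-noise assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
tau1, tau2 = 2.0, 8.0            # known lifetimes (illustrative units)
a_true = 0.7                     # true fraction of fluor 1
n = 2_000                        # far fewer photons than the histogram method

which = rng.random(n) < a_true
t = np.where(which, rng.exponential(tau1, n), rng.exponential(tau2, n))

# Posterior over the mixture fraction a on a grid, with a flat prior.
a = np.linspace(1e-3, 1 - 1e-3, 999)
dens = (a[None, :] * np.exp(-t[:, None] / tau1) / tau1
        + (1 - a[None, :]) * np.exp(-t[:, None] / tau2) / tau2)
log_post = np.log(dens).sum(axis=0)
post = np.exp(log_post - log_post.max())
post /= post.sum() * (a[1] - a[0])            # normalize to a density

post_mean = np.sum(a * post) * (a[1] - a[0])  # posterior mean of the fraction
```

Because the whole posterior density is retained, the width of `post` shows directly how much uncertainty remains for any given number of photons.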
In order to evaluate the two algorithms, the figures compare the Bayesian algorithm after n excitations and the traditional frequentist algorithm after 100n excitations, so that the average total number of photons emitted is the same for the two algorithms (although the collection time for the Bayesian algorithm is 100 times shorter). The Bayesian solution is shown by the background color plotting of the posterior, the correct result by an x, and the maximum likelihood point by a plus sign.
Figure 2. (click to enlarge) Results after 10 excitations for the Bayesian algorithm and 1000 excitations for the traditional method. The truth is shown by an X in a circle, the Bayesian answer is the colored area, while the traditional method’s answer is shown by a + in a circle.
Figure 3. (click to enlarge) Results after 10,000 excitations for the Bayesian algorithm and 1 million excitations for the traditional method. The truth is shown by an X in a circle, the Bayesian answer is the colored area, while the traditional method’s answer is shown by a + in a circle well to the left of the true point.
So in this example, the Bayesian method finds the true answer (by defining a range in which the truth lies) whereas the frequentist method gives the wrong answer and gives no indication that this is what it is doing. This has serious implications for any diagnostic device.
Despite using 100 times fewer excitations (with the same average total number of photons), the Bayesian method produces better results than the traditional method for several reasons. The Bayesian approach avoids the following problems:
- Biasing the photon arrival times by rejecting all but the first in any excitation cycle.
- Assuming Gaussianity inherent in least-squares curve fitting.
- Looking for the maximum of anything; rather, it returns information on the whole posterior probability mass.
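The first of these problems, the early-arrival bias from keeping only the first photon per excitation cycle, is easy to demonstrate numerically (lifetime and counts here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
tau = 5.0                                # single lifetime (illustrative units)
cycles, photons_per_cycle = 20_000, 3    # bright laser: several photons per cycle

arrivals = rng.exponential(tau, size=(cycles, photons_per_cycle))
first_only = arrivals.min(axis=1)        # what first-photon recording keeps

mean_all = arrivals.mean()               # close to tau
mean_first = first_only.mean()           # close to tau / photons_per_cycle: biased early
```

The minimum of several exponential arrival times is itself exponential with a proportionally shorter mean, so recording only the first photon systematically understates the lifetime once more than one photon per cycle comes back.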
Bayesian inference has enabled the use of 100 times fewer excitations while achieving a result that is genuinely unbiased compared with the traditional TCSPC method. The Bayesian method has also provided a precise view of the uncertainty remaining in a solution for any number of excitations, even as few as 10. Relating this to a diagnostic instrument, the Bayesian posterior distribution will encompass the correct result, while processing the same data with the frequentist method will lead to inaccurate results.
Automated Multivariate Diagnostics
Electronic noses illustrate the signal processing needed for nonspecific multivariate assay tests. An electronic nose is an array of nonspecific sensors of volatile compounds. While each sensor in the array can respond to a wide range of different compounds by giving an electrical signal, each sensor’s pattern of response to various compounds is different. The task for the signal processing block is to identify patterns of responses that can be used for diagnostic purposes.
A similar signal-processing philosophy can be applied to multigene tests that a number of different diagnostics companies are currently developing as cancer prognostics.1 This concept of an electronic nose will be used to compare the quality of results obtained with Bayesian and frequentist methods.
The developers of any statistical signal processing algorithm require an initial set of data, or training data, to determine the algorithm parameters. In this electronic nose example, we will suppose the nose distinguishes two groups: sick and well. In order to present and compare the results graphically, the number of sensors is limited to two (a value of 32 might be more typical).
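As an illustrative sketch of such a classifier (the numbers and the Gaussian class-conditional model below are our assumptions, not the article’s actual algorithm), the following trains on synthetic two-sensor data and returns the posterior probability of the sick class:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic two-sensor training data for the two groups (illustrative numbers).
n_per_class = 400
sick = rng.multivariate_normal([2.0, 1.0], [[1.0, 0.6], [0.6, 1.0]], n_per_class)
well = rng.multivariate_normal([0.0, 0.0], [[1.0, -0.3], [-0.3, 1.0]], n_per_class)

def fit_gaussian(x):
    """Mean and covariance of one class's sensor responses."""
    return x.mean(axis=0), np.cov(x, rowvar=False)

def log_density(x, mean, cov):
    """Log of the multivariate Gaussian density at point x."""
    d = x - mean
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ inv @ d + logdet + len(x) * np.log(2 * np.pi))

params = {"sick": fit_gaussian(sick), "well": fit_gaussian(well)}

def p_sick(x, prior_sick=0.5):
    """Posterior probability of 'sick' for one sensor reading."""
    ls = log_density(x, *params["sick"]) + np.log(prior_sick)
    lw = log_density(x, *params["well"]) + np.log(1 - prior_sick)
    m = max(ls, lw)
    return np.exp(ls - m) / (np.exp(ls - m) + np.exp(lw - m))
```

The output is a probability rather than a hard label, so a device can report its confidence in a diagnosis and defer borderline cases, rather than silently forcing every reading into one of the two groups.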
Figure 4. (click to enlarge) The best possible classification certainty using information only available to the person who synthesized the data, and the 800 training data points.
Figure 5. (click to enlarge) The result of a Bayesian algorithm and the previously unseen points to be classified.
Figure 6. (click to enlarge) The result of applying maximum-likelihood model fitting.
Figure 7. (click to enlarge) The result of applying linear discriminant analysis (LDA).
The magnitude of the differences between the Bayesian solution and the other frequentist techniques can only be fully appreciated in more dimensions (i.e., with more component sensors in the electronic nose). This is analogous to analyzing a larger number of genes when looking for a gene signature as a method for diagnosing cancer.
Figure 8. (click to enlarge) Performance of Bayesian and various frequentist algorithms in 32 dimensions. Fraction of points wrongly classified of an unseen set of 200,000 points. MaxLike = maximum likelihood; LDA = linear discriminant analysis; NN = neural network; LR = logistic regression; PCA = principal components analysis.
Figure 8 also demonstrates how Bayesian inference has a 100-fold-lower error rate than other statistical analysis methods. In terms of a diagnostic instrument, implementing a Bayesian algorithm would better enable an electronic nose to make a correct diagnosis, or a gene test to indicate a predisposition to cancer.
Frequentist methods have been the predominant statistical approach for the larger part of the last century, to the point where their adherents consider them the best way to analyze data and often defend them religiously. However, this article illustrates how Bayesian inference provides more-accurate diagnostic classification, particularly with complex and noisy data sets such as those found when developing diagnostic products. Another benefit of Bayesian approaches is that they sometimes allow cost savings in the sensor hardware. For example, 100-fold savings in memory usage and 10-fold savings in central processing unit cycles needed per second, along with a large improvement in performance, have been obtained.
In addition to automated multivariate diagnostics and fluorescence applications, Bayesian techniques can be applied to other diagnostic product categories that produce large-scale data, including microarrays, flow cytometry, proteomics, and time-of-flight spectroscopy.
Another important feature of Bayesian techniques is that the user is told the maximum amount of information that can be extracted from the available data. This means that when conducting the analysis, the user knows how good the results are, whereas with frequentist methods, the user is left wondering how much better the results could be. This feature has not been illustrated in this article due to space constraints, but it is a significant advantage for device developers.
However, there are many situations in which Bayesian techniques are not needed for adequate performance, or in which an approximation to Bayesian inference will provide adequate performance. Consequently, when optimal performance is not necessary, Bayesian inference is not necessarily the technique of choice. In addition, there is no doubt that because most people have been taught frequentist statistical methods at school, more effort is required to switch over and adopt the simpler Bayesian mind-set.
Nonetheless, Bayesian inference is the generally applicable technique for extracting the most information possible from difficult, messy data sets. This article has shown why further acceptance of Bayesian methodologies could lead to significant improvements in diagnostic devices.
Roger F. Sewell, DM, is a senior consultant at Cambridge Consultants Ltd. (Cambridge, UK). He can be reached at rfs@
Amanda J. Fuller, PhD, is group leader of the diagnostic products group at Cambridge Consultants Ltd. (Cambridge, UK). She can be reached at amanda.fuller@
1. S McKee, “FDA Approves First IVDMIA,” IVD Technology 13, no. 3 (2007): 15.