
IVD Technology Magazine

Measurement traceability in clinical chemical analyses

The concept of measurement traceability that has been established in general chemical metrology is now also being introduced to the field of clinical chemical analysis. Traceability is probably the most important strategy for achieving standardization in laboratory medicine, that is, for obtaining comparable measurement results regardless of the method or measurement procedure (test kit) used or of the laboratory in which the analyses are performed.

This article provides an overview of the development of the traceability concept, concluding with a historical and analytical discussion of the problems that once hampered the performance quality of clinical diagnostic tests and of the promise that the global adoption of reference systems grounded in credentialed traceability holds for dramatically improved measurement accuracy and consistency.

The Traceability Chain

According to the International Vocabulary of Basic and General Terms in Metrology and the Guide to the Expression of Uncertainty in Measurement, measurement traceability is the "property of the result of a measurement or the value of a standard whereby it can be related to stated references, usually national or international standards, through an unbroken chain of comparisons all having stated uncertainties."1,2

Traceability of a value assigned to a routine sample, a calibrator, or a control material is established through a series of comparative measurements, using measurement procedures and reference materials arranged in a chain of decreasing hierarchical order, as shown in Figure 2. Because each link in the traceability chain contributes to the uncertainty of the result, the chain should contain as few steps as possible. In metrological terms, the ideal would be to omit all intermediate steps and to measure each routine sample directly with a primary reference procedure. This, of course, is not feasible.
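A short calculation shows why chain length matters. The sketch below assumes that each link adds an independent standard uncertainty. Under that assumption the contributions combine as a root sum of squares, which is the GUM rule for uncorrelated input quantities. The link names loosely follow the levels of Figure 2, and the percentages are hypothetical, chosen only for illustration.

```python
import math


def cumulative_uncertainty(step_uncertainties):
    """Combined standard uncertainty after each link of a traceability chain.

    Assumes the links contribute independently, so their standard
    uncertainties combine as a root sum of squares (GUM, uncorrelated inputs).
    """
    total_sq = 0.0
    combined = []
    for u in step_uncertainties:
        total_sq += u ** 2
        combined.append(math.sqrt(total_sq))
    return combined


# Hypothetical relative standard uncertainties (%) added by each link,
# ordered from the top of the chain down to the routine result.
CHAIN = [
    ("primary calibrator", 0.2),
    ("primary reference procedure", 0.5),
    ("secondary calibrator", 0.8),
    ("manufacturer's selected procedure", 1.0),
    ("manufacturer's working calibrator", 1.2),
    ("product calibrator", 1.5),
    ("routine procedure", 2.0),
]

totals = cumulative_uncertainty([u for _, u in CHAIN])
for (name, _), u_c in zip(CHAIN, totals):
    print(f"{name:<35} u_c = {u_c:.2f} %")
```

Stopping after the first two entries gives the uncertainty that a direct measurement by the primary reference procedure would carry. Every further link can only increase the combined value.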

The complete traceability chain is valid only for measurable quantities whose values can be expressed in SI units (units traceable to the Système International d'Unités). When primary or secondary calibrators are not available, the traceability chain for many measurands in laboratory medicine ends at a lower level, for example, at the manufacturer's selected measurement procedure. When a manufacturer discovers a new diagnostic marker and defines the measurable quantity by establishing a measurement procedure for it, that procedure forms the top of the traceability chain. Even in this simple situation, however, the principles of traceability apply.

An indispensable precondition for establishing results traceable to calibrators and control materials is that the measurement procedures applied be specific. Results cannot be traceable when the procedure also detects, in part, components that do not fall within the definition of the measurand.

Traceability is not really a new concept for in vitro diagnostics. Many years before traceability was mentioned in connection with general chemical metrology, reference measurement procedures and reference materials had been established in clinical chemistry. Early developments in this field in the United States, particularly those reported by scientists at the National Institute of Standards and Technology (NIST) and other independent scientists, as well as the relevant standards generated by the National Committee for Clinical Laboratory Standards probably had an important influence on the development of the concept of traceability in general chemical metrology. Some basic experimental work toward the development of reference measurement procedures and reference materials had already been undertaken in Europe.

Credentialing

In 1970, long before traceability was a popular concept, the technique of isotope dilution mass spectrometry (IDMS) was developed in a clinical chemical reference laboratory and applied as a reference procedure for the measurement of estrogens in human body fluids. This technique has ever since provided one of the most powerful tools for establishing reference method values for many substrates and metabolites in calibrators, controls, and reference materials.
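IDMS qualifies as a primary method because the result follows from measured isotope ratios and the known amount of an isotopically enriched spike. The sketch below uses the textbook single-IDMS equation for an idealized two-isotope system. It assumes no mass bias, no blank, and exactly known isotopic compositions. Reference procedures for organic analytes such as cholesterol typically use an isotopically labelled form of the analyte as the spike and calibrate against blends prepared from pure reference material; those refinements are not shown. The function names and example values are hypothetical.

```python
def blend_ratio(n_x, r_x, n_y, r_y):
    """Heavy/light isotope ratio of a blend of sample (x) and spike (y).

    Two-isotope model: n_x, n_y are total analyte amounts; r_x, r_y are the
    heavy/light amount ratios of sample and spike.
    """
    light = n_x / (1 + r_x) + n_y / (1 + r_y)
    heavy = n_x * r_x / (1 + r_x) + n_y * r_y / (1 + r_y)
    return heavy / light


def idms_amount(n_y, r_x, r_y, r_b):
    """Single-IDMS equation: analyte amount in the sample from the blend ratio r_b."""
    if not min(r_x, r_y) < r_b < max(r_x, r_y):
        raise ValueError("blend ratio must lie between sample and spike ratios")
    return n_y * (r_y - r_b) / (r_b - r_x) * (1 + r_x) / (1 + r_y)


# Hypothetical example: 1.0 µmol of enriched spike added to a sample containing 2.5 µmol.
r_sample, r_spike, n_spike = 0.011, 50.0, 1.0
r_measured = blend_ratio(2.5, r_sample, n_spike, r_spike)
print(f"blend ratio = {r_measured:.4f}")
print(f"recovered amount = {idms_amount(n_spike, r_sample, r_spike, r_measured):.6f} µmol")
```

The round trip through `blend_ratio` and `idms_amount` recovers the amount put in. In practice the blend ratio is the measured quantity, and its uncertainty dominates the result.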

The process of credentialing the traceability concept, its implementation, and its acceptance is not only a regional (European) or national task but also a global one. Traceability is a goal that concerns all members of the scientific community involved in the field of clinical chemical analyses, including:

  • Legislative bodies and government agencies that issue regulations or directives concerning measurements in laboratory medicine.
  • National metrology institutes responsible for dissemination of the SI units and for establishing correctness of measurement according to national legislation.
  • International organizations that issue reference materials.
  • International and national standardization organizations.
  • International and national scientific societies.
  • External quality assessment organizations.
  • Reference laboratories.
  • Diagnostic kit manufacturers.
  • Clinical chemical laboratories when applying commercial or homemade diagnostic tests.
  • Physicians basing diagnosis and therapy on laboratory results.

Credentialing traceability in clinical chemistry means demonstrating that the concept is applicable and useful, which is the basis of its credibility. In practice, this calls for the establishment of reference systems consisting of reference measurement procedures, reference materials, and reference laboratories, the latter preferably accredited and organized within a network.

Reference Systems

The introduction of reference systems for clinical chemistry has been proposed for about 30 years now. However, neither the international scientific community nor any national or international body has addressed the question of which agency or authority should be responsible for formal authorization of these reference systems or their component materials, procedures, and laboratories.

For reference materials, it may not be too difficult to solve the problem. Materials that fulfill the requirement for higher-metrological-order standards are now provided by NIST and the Institute for Reference Materials and Measurements (IRMM). Additional useful materials, although usually of lower metrological order, are issued by the World Health Organization. Authorization of these materials is a question of mutual acceptance, which, in view of the "Implementing Arrangement for Cooperation in the Fields of Metrology and Measurement Standards" signed by the directors of NIST and the EU Research Directorate, has almost been achieved.

The situation regarding authorization of reference procedures and reference laboratories is somewhat more complicated, and it raises the question of how the concept of traceability should be implemented. There is no simple answer, and probably no rule that applies to all situations. The new international standards dealing with traceability (prEN ISO 17511) and with the requirements for reference laboratories (ISO DIS 15195), developed by CEN TC 140 and ISO TC 212, respectively, can at least give some guidance on the credentialing process for reference laboratories and reference procedures.3,4

In fact, the strategy for establishing reference systems depends on the nature of the analyte.

For low-molecular-weight substances (electrolytes, organic substrates, and metabolites such as cholesterol, creatinine, or steroid hormones) and for many drugs, the only meaningful goal is results that are traceable to SI units. For these measurands, where traceability to SI units is achievable, the national metrology institutes, as legal custodians of units and measurements, hold the highest authority (see Figure 1 and Figure 2). The metrology institutes are linked globally through the key comparisons of the CCQM (Consultative Committee for Amount of Substance). In these ring trials they demonstrate their ability to perform measurements at the highest available metrological level, most often using so-called primary methods. Such key comparisons have been carried out for cholesterol, creatinine, and glucose. A group of European metrology institutes has now started a global initiative to address traceability in clinical chemical measurements in collaboration with NIST and the metrology institutes of Australia and Japan.
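Key comparison results are commonly summarized as degrees of equivalence. Each institute's degree of equivalence is its deviation from a key comparison reference value (KCRV), together with the uncertainty of that deviation. The sketch below assumes that the KCRV is the inverse-variance weighted mean of all participants. This is only one possible choice, since the KCRV is selected comparison by comparison. The cholesterol values are invented.

```python
import math


def weighted_mean(values, uncertainties):
    """Inverse-variance weighted mean and its standard uncertainty."""
    weights = [1.0 / u ** 2 for u in uncertainties]
    mean = sum(w * x for w, x in zip(weights, values)) / sum(weights)
    return mean, math.sqrt(1.0 / sum(weights))


def degrees_of_equivalence(values, uncertainties, k=2.0):
    """Return (d_i, U(d_i)) for each participant against a weighted-mean KCRV.

    d_i = x_i - x_ref. Because every x_i contributed to x_ref, the two are
    correlated and u(d_i)^2 = u_i^2 - u_ref^2 (not the sum of the squares).
    """
    x_ref, u_ref = weighted_mean(values, uncertainties)
    return [
        (x - x_ref, k * math.sqrt(u ** 2 - u_ref ** 2))
        for x, u in zip(values, uncertainties)
    ]


# Hypothetical cholesterol results (mmol/L) and standard uncertainties from four institutes.
results = [5.012, 4.998, 5.005, 4.990]
std_unc = [0.010, 0.008, 0.012, 0.015]
for i, (d, U) in enumerate(degrees_of_equivalence(results, std_unc), start=1):
    print(f"institute {i}: d = {d:+.4f} mmol/L, U(d) = {U:.4f} mmol/L")
```

An institute whose |d| is smaller than U(d) is usually regarded as consistent with the reference value.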

Figure 1. Hierarchy of laboratories.


Figure 2. Calibration hierarchy and traceability to SI in the metrological traceability chain.

In addition to the national metrology institutes there exist a number of highly specialized reference laboratories, most of which are situated in university hospitals or at manufacturers' sites. These laboratories usually have developed their own reference procedures. Some of them have long experience and perform measurements at a high metrological level.

According to the ISO standard on reference laboratories in laboratory medicine (ISO DIS 15195), their competence may be approved by the national metrology institutes, for example, by accreditation in accordance with ISO/IEC 17025. The German metrology institute Physikalisch-Technische Bundesanstalt has so far accredited two reference laboratories. One mainly serves the Reference Institute of Bioanalysis, the proficiency testing organization of the German Society of Clinical Chemistry (DGKC), and also establishes target values for calibrators and control materials of commercial diagnostic kits. The second belongs to an industrial diagnostic kit manufacturer. Further accreditations will follow in due course.

The competence of reference laboratories with respect to environmental, staff, and management performance may be approved by accreditation, but this still leaves a question often asked: When is a "proposed" reference method a reference method? The beginning of the answer is that the competence of a reference laboratory should be evaluated not only according to the quality of its management as laid down in a quality manual and monitored by regular inspections of the laboratory, but also on the basis of documented reference procedures and–most importantly–on the results of parallel comparative measurements. In this way the accrediting body also approves the measurement procedures and their performance. Accreditation thus is not valid for all measurements produced by the laboratory but only for particular measurands for which agreement of results with those of the laboratory of the accrediting body, and thereby traceability to the SI, has been demonstrated in comparative measurements.

Accreditation of a reference laboratory for a particular measurable quantity carries with it an approval of the measurement procedure, which includes the measurement principle (such as IDMS as a primary method for the measurement of cholesterol), the complete standard operating procedure (SOP), and the uncertainty of results. This may serve as an answer to the question of when a proposed reference method is a reference method.
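The parallel comparative measurements described above need an agreement criterion. A common choice in proficiency testing is the E_n number. It is the difference between two results divided by the root sum of squares of their expanded uncertainties, and |E_n| ≤ 1 is read as agreement. Whether an accrediting body uses this particular statistic is an assumption made here for illustration. The creatinine values are hypothetical.

```python
import math


def en_score(x_lab, U_lab, x_ref, U_ref):
    """E_n number: difference over the root sum of squares of expanded uncertainties."""
    return (x_lab - x_ref) / math.sqrt(U_lab ** 2 + U_ref ** 2)


def agrees(x_lab, U_lab, x_ref, U_ref):
    """Conventional criterion: |E_n| <= 1 counts as agreement."""
    return abs(en_score(x_lab, U_lab, x_ref, U_ref)) <= 1.0


# Hypothetical split-sample creatinine results (µmol/L) with expanded uncertainties (k = 2):
# (candidate laboratory value, U, accrediting laboratory value, U)
pairs = [
    (88.4, 1.2, 87.9, 1.0),
    (152.0, 2.0, 150.6, 1.6),
    (411.0, 5.0, 402.0, 4.0),
]
for x, U, ref, U_ref in pairs:
    print(f"{x:7.1f} vs {ref:7.1f}: E_n = {en_score(x, U, ref, U_ref):+.2f}, "
          f"agrees = {agrees(x, U, ref, U_ref)}")
```

Here the third pair would fail. That fits the principle above: approval covers only those measurands, and concentration ranges, for which agreement has actually been demonstrated.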

The concept of traceability has been promoted in Europe during the past 15 years by the organizers of external quality assessment schemes. In the German proficiency testing system in particular, legislation has prescribed the use of reference measurement procedures for several measurands since 1988. Consequently, the Reference Institute of Bioanalysis has established reference measurement procedures for electrolytes, metabolites and substrates, enzymes, hormones, and drugs (see Table I). Reference methods for 13 of the 30 analytes listed in the table have been developed in the reference laboratories of the DGKC using IDMS as the primary method. These include creatinine, urea, cholesterol, total glycerol, uric acid, and glucose, as well as the steroid hormones and thyroxine. The reference methods are now applied regularly to assign target values to the control samples of internal and external quality assessment and to certify matrix reference materials of the IRMM.

Drugs
Digitoxin Theophylline
Digoxin  
Electrolytes
Calcium Magnesium
Chloride Potassium
Lithium Sodium
Enzymes
Alanine aminotransferase Creatine kinase
Amylase Gamma-glutamyltransferase
Aspartate aminotransferase
Hormones
Aldosterone Progesterone
Cortisol 17-Hydroxy-progesterone
Estradiol-17ß Testosterone
Estriol Thyroxine
Metabolites and Substrates
Bilirubin Lactate
Cholesterol Total glycerol
Creatinine Urea
Glucose Uric Acid
Total Protein
Table I. Analytes having reference procedures established by the Reference Institute of Bioanalysis.

The system is shared with partners in Portugal, the Czech Republic, and, occasionally, Denmark by means of an exchange of samples for external quality assessment with reference procedure target values.

Traceability and the Improvement of Diagnostic Tests

The introduction of the concept of traceability has improved the performance of diagnostic tests since 1988 in the following ways.

Variability of Results is a Problem. A list of routine-method target values for creatinine, uric acid, total cholesterol, and total glycerol was issued before 1988 for one manufacturer's control material. It shows a large scatter of up to 30% among methods and test kits (see Table II). Because a serum can have only one "true" creatinine concentration, this situation was particularly untenable. Clearly, progress toward comparable analytical results from different laboratories is hindered as long as methods with a known, or even an unknown, bias are accepted.

Procedure | Value
Cholesterol (mmol/L)
CHOD-iodide | 4.02
CHOD-PAP | 4.30
CHOD-catalase | 4.61
Peridochrom | 4.69
Liebermann-Burchard | 5.49
Creatinine (µmol/L)
Enzymatic/PAP | 15
Enzymatic UV system | 161
Jaffe without deproteinization (Merck) | 168
Jaffe after deproteinization (Boehringer) | 177
Jaffe without deproteinization (Boehringer) | 189
Triglycerides (mmol/L)
Fully enzymatic (Boehringer) | 1.15
Fully enzymatic (Merck) | 1.34
Fully enzymatic (Roche) | 1.30
Enzymatic (Boehringer) | 1.36
Uric acid (µmol/L)
Fully enzymatic (Boehringer/Merck) | 457
UV system (Boehringer) | 476
UV system (Merck) | 539
Phosphotungstic acid (Goed.) | 583
Table II. Procedure-dependent target values in a commercial control serum.
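One simple way to quantify the scatter in Table II is the range of the method-dependent target values relative to their mean. The sketch below applies that measure to the cholesterol, triglyceride, and uric acid entries of the table. The choice of statistic is an assumption. Other definitions, such as the range relative to the lowest value, give somewhat larger figures.

```python
import statistics

# Method-dependent target values from Table II (one commercial control serum).
TARGETS = {
    "Cholesterol (mmol/L)": [4.02, 4.30, 4.61, 4.69, 5.49],
    "Triglycerides (mmol/L)": [1.15, 1.34, 1.30, 1.36],
    "Uric acid (µmol/L)": [457, 476, 539, 583],
}


def relative_range(values):
    """Range of the values as a percentage of their mean."""
    return 100.0 * (max(values) - min(values)) / statistics.fmean(values)


for analyte, values in TARGETS.items():
    print(f"{analyte:<24} range = {relative_range(values):.1f} % of mean, "
          f"max/min = {max(values) / min(values):.2f}")
```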


This unsatisfactory situation was also apparent in external quality assessment, as results from a ring trial show. In a 1987 DGKC routine ring trial for cholesterol, two different samples were distributed to about 1300 laboratories, and the results were displayed in a Youden diagram (see Figure 3). Each dot in the diagram represents the two results of one laboratory: the result for sample A is read from the abscissa and that for sample B from the ordinate. A laboratory whose dot falls in the center of the diagram is in full agreement with the target value. Here the target value is the reference method value certified by isotope dilution mass spectrometry.

Figure 3. A Youden diagram of a collaborative survey for cholesterol conducted in 1987. The three broken-line squares show the method-dependent evaluation limits for the Liebermann-Burchard method (upper right), the CHOD-PAP methods (middle), and the CHOD-iodide method (lower left). The solid-line square in the center shows the acceptance limits based on the target of the IDMS reference method value.

The survey results clearly fall into three groups, representing three different methods of cholesterol determination. Participants reporting relatively high cholesterol results had used the Liebermann-Burchard procedure, which was still in use in 1987. The group obtaining low cholesterol values had applied the cholesterol oxidase iodide (CHOD-iodide) method. The data from laboratories using the CHOD-PAP method lie in the middle of the diagram. Until 1988, ring trial participants' results were evaluated by comparison with the mean of each peer group, according to the methodological principle used. Differences of up to 50% among the peer group target values could be observed for cholesterol measurements. Since a serum can have only one true cholesterol concentration, this situation, too, was untenable.

After reference procedure values for cholesterol based on IDMS measurements were introduced, the divergent peer group target values were replaced by reference method values. In the case depicted, these correspond to the exact center of the diagram, and the corresponding limits of acceptance are shown as the solid-line square. As a consequence, methods with inherent systematic error, such as the Liebermann-Burchard and CHOD-iodide methods, disappeared from the market. Today only methods whose results fall within the limits of acceptance around the IDMS reference method values remain.
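The evaluation logic of the Youden diagram can be sketched in a few lines. Each laboratory's pair of results is expressed as relative deviations from the targets for samples A and B. The laboratory passes only if both deviations lie inside the acceptance square. The ±10% limit, the targets, and the laboratory result below are hypothetical. The example shows how a result that passes against a biased peer group target fails against the reference method value.

```python
from dataclasses import dataclass


@dataclass
class YoudenResult:
    dev_a: float  # relative deviation for sample A, %
    dev_b: float  # relative deviation for sample B, %
    inside: bool  # True if the point lies inside the acceptance square


def evaluate(result_a, result_b, target_a, target_b, limit_pct):
    """Place one laboratory's pair of results relative to an acceptance square."""
    dev_a = 100.0 * (result_a - target_a) / target_a
    dev_b = 100.0 * (result_b - target_b) / target_b
    inside = abs(dev_a) <= limit_pct and abs(dev_b) <= limit_pct
    return YoudenResult(dev_a, dev_b, inside)


# Hypothetical cholesterol ring trial (mmol/L) with a ±10 % acceptance limit.
REFERENCE = (5.20, 6.50)    # reference method targets for samples A and B
PEER_GROUP = (5.98, 7.48)   # targets of a method group with about +15 % bias
lab = (5.95, 7.45)

print("vs peer group:", evaluate(*lab, *PEER_GROUP, limit_pct=10.0))
print("vs reference: ", evaluate(*lab, *REFERENCE, limit_pct=10.0))
```

With relative deviations on both axes, a purely proportional method bias moves a laboratory along the diagonal of the plot. This is consistent with the upper-right and lower-left displacement of the method groups in Figure 3.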

Twelve years ago there was unacceptably wide scatter in method-dependent target values for many clinical chemical parameters. To improve accuracy in clinical chemistry it was essential to replace these method-dependent values with reference method values.

The measurement of hormone concentrations in human body fluids has proved to be a valuable diagnostic tool in the field of clinical endocrinology. Thyroxine and the various steroids, the most commonly determined hormones, are usually measured by radioimmunoassay or enzyme immunoassay with a fairly high degree of sensitivity. However, a manufacturer's list, issued in 1987, of aldosterone, cortisol, progesterone, and estradiol-17ß target concentrations in a commercial serum pool indicates that, given the same sample and using immunoassay, assigned values varied considerably from one test kit to another (see Table III). For cortisol and aldosterone the range of results was between 100% and 200%, and for progesterone and estradiol-17ß determinations the results differed by a factor of 7. This was probably due to variations in quality among the antibodies and reagents used in the several commercial kits. What could a consensus value mean in such a context? A target value based on a consensus mean or median was of little use in judging test kits that gave such variable results.

Manufacturer
Aldosterone (pmol/L)
Cortisol (nmol/L)
Progesterone (nmol/L)
Estradiol (pmol/L)
Abbott
121.9
Amersham
113.1
Baxter Dade DIR
104.8
2.16
396.4
Baxter Dade AG ER
244.1
Baxter Dade AD EXT
196.0
Becton Dickinson
88.0
Bioclone
1.91
bioMérieux
2.54
539.6
Biotex Premix
99.4
70.6
3.72
759.9
Cambridge Medical
120.8
0.86
CIBA Corning
110.3
Clinical Assays
99.3
Cyberflour Fiagen
88.2
Diagnostic Products
207.2
113.1
3.12
119.3
DuPont Rianen
135.1
Eurodiagnostics  
115.8
Farmos Diagnostic  
99.3
4.67
394.9
Immunchem Cov. Coat  
110.3
5.41
348.7
Leeco  
113.1
2.99
144.2
Mallinckrodt  
88.3
NML RIA  
96.6
NMS Pharmaceuticals  
3.18
205.5
Pantex Immuno Direct  
143.1
Pantex Immuno  
118.6
4.13
190.8
Pantex Immunocoat
132.4
7.00
154.9
Pharmacia Delfia
99.9
790.0
RSL
169.2
4.77
117.4
Sclavo Liso Phase
277.4
126.9
3.82
Serono
112.0
Sibar Elisa
121.3
1.27
Sorin
165.9
68.9
2.86
139.5
Syva Emit
137.9
 
Techland RIA  
4.77
 
Vitek Systems  
110.0
   
Table III. Target concentrations for steroid hormones in a commercial control serum, using immunoassay test kits by a variety of IVD manufacturers.

Using method-dependent assigned values for external quality control means having many different target values for the same analyte in the same control serum–a highly impractical and, from a theoretical standpoint, unsatisfactory procedure that generates different results for a substance of known molecular weight and with a defined number of molecules.

Reference Methods are the Solution. It seemed imperative to establish a methodology that would provide the basis for the development of reference methods. Target values for DGKC collaborative surveys for steroid hormones have been determined since 1977 by reference methods, and more recently also for thyroxine.

Recently the author's laboratory had to respond to a complaint from a manufacturer. The manufacturer suspected that its customers' poor performance in the laboratory's proficiency testing surveys for progesterone was due to commutability problems with the quality control materials used in the ring trials. The unsatisfactory performance of the test showed up as a bias of the test kit that increased at lower progesterone concentrations (see Figure 4, sample B).

To check the commutability of its control materials, the laboratory performed split-sample measurements on patient samples, using the test kit in parallel with the IDMS reference procedure for progesterone. At first sight, the test kit correlated well with the IDMS reference procedure (see Figure 5[a]). However, the difference plot of the same data revealed a considerable bias relative to the reference procedure at low progesterone concentrations, for both the patient sera and the ring trial results (see Figure 5[b]). The poor performance of the test was therefore evidently due to a lack of specificity rather than a lack of commutability of the control materials. At even lower progesterone concentrations the bias rose to 1000%. Unfortunately, the kit manufacturer had not stated a lower limit of determination for its measurement procedure.
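It is easy to show numerically how a good correlation can coexist with a large bias. Purely for illustration, the sketch below assumes that a kit reads a constant 1.5 nmol/L too high, as an unspecific assay might if it detects other substances. The data are hypothetical and are not those of Figure 5.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def relative_bias(kit, reference):
    """Per-sample bias of the kit relative to the reference procedure, in %."""
    return [100.0 * (k - r) / r for k, r in zip(kit, reference)]


# Hypothetical split-sample progesterone results (nmol/L).
reference = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 40.0, 60.0]
kit = [r + 1.5 for r in reference]  # assumed constant offset of +1.5 nmol/L

print(f"Pearson r = {pearson_r(reference, kit):.4f}")
for r, bias in zip(reference, relative_bias(kit, reference)):
    print(f"reference {r:5.1f} nmol/L -> bias {bias:6.1f} %")
```

The correlation coefficient is essentially 1. Yet the relative bias grows from a few percent at high concentrations to several hundred percent at the low end, which is the pattern a difference plot makes visible.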

Target values based on reference methods proved useful in another case as well. In the early years of external quality control, the accuracy of unconjugated estriol measurements in serum was astonishingly high. This changed dramatically toward the end of 1981 (see Figure 6). Especially when the control samples contained conjugated estriol, the ranges of survey participants' results were markedly higher than the mass spectrometric target values.

Figure 6. Results of collaborative surveys for estriol. Columns indicate the range of distribution between the 16th and 84th percentiles of each survey.

As it happened, in late 1981 a kit manufacturer that dominated the estriol-determination market in Germany began using a new antibody, which evidently cross-reacted with the conjugated steroid. Meanwhile, a small group of survey participants who used their own in-house methods to determine estriol continued to produce results that agreed with the mass spectrometric values. The kit manufacturer was persuaded that the situation needed correcting, and mainly as a result, results have improved greatly since 1985. It must be assumed, however, that from 1981 to 1984 the test kits measured estriol incorrectly in patient samples as well as in control samples. Estriol determinations are used primarily to monitor fetal well-being in the last months of pregnancy. Because both estriol and its conjugates are elevated during this period, the kit's lack of specificity probably led to overestimation of unconjugated estriol in patient samples too.
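The effect of an antibody that cross-reacts with estriol conjugates can be sketched with a simple additive model. In this model the kit reports the unconjugated analyte plus a fraction of each cross-reacting substance. The linear model, the 10% cross-reactivity, and the concentrations are all assumptions for illustration. Real immunoassay cross-reactivity need not be constant across concentrations.

```python
def apparent_concentration(analyte, interferents):
    """Concentration an unspecific immunoassay would report under a linear model.

    analyte: true concentration of the measurand (e.g. unconjugated estriol)
    interferents: iterable of (concentration, cross_reactivity_fraction) pairs
    """
    return analyte + sum(conc * cr for conc, cr in interferents)


# Hypothetical serum (nmol/L): 50 unconjugated estriol, 400 estriol conjugates,
# measured with an antibody assumed to cross-react 10 % with the conjugates.
true_value = 50.0
reported = apparent_concentration(true_value, [(400.0, 0.10)])
bias_pct = 100.0 * (reported - true_value) / true_value
print(f"reported {reported:.1f} nmol/L, bias {bias_pct:+.0f} %")
```

Because the error scales with the conjugate concentration, it is largest in exactly those samples, late-pregnancy sera, where the test is clinically used.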

For non-SI-traceable quantities the strategy for introducing traceability has to be different. This concerns a large number of analytes for which no defined molecular structure can be assigned, such as many enzymes, proteohormones, tumor markers, and cardiac markers. Before it is possible to establish reference systems (reference procedures, materials, and laboratories), the measurand under consideration must first be defined. A global consensus on the definition should be achieved whenever possible. Consequently, definition of the measurand along with the establishment of relevant reference systems constitutes the objective of several working groups and committees of the Scientific Division of the International Federation of Clinical Chemistry (IFCC).

In many instances, a selected and agreed-upon reference measurement procedure forms the basis of the definition of the measurand and thereby represents the top of the hierarchical traceability chain. This is particularly true in the establishment of reference systems for the catalytic concentrations of enzyme activities. In 1999, members of an IFCC working group along with some enzyme reference laboratories decided to establish new 37 °C measurement procedures as IFCC reference methods on the basis of the existing 30 °C IFCC methods and to certify enzyme reference materials for ALAT, GGT, CK, and LD in collaboration with the IRMM. The project was conducted in three steps:

  • Primary procedures for the measurement of catalytic activities were discussed and chosen as IFCC reference methods. Some experimental work was necessary in order to achieve optimized conditions for the measurement protocol. The members of the working group agreed upon SOPs, which include control and reporting of traceability of all individual procedural steps, for example, those taken to obtain mass, volume, temperature, photometric wavelength, and absorbance measurements.
  • The performance of laboratories applying the SOPs was demonstrated in feasibility studies by analyzing several commercial control materials. Depending on the enzyme, 10–12 laboratories from hospitals and diagnostic kit manufacturers were involved in the project.
  • In the certification campaign, the participating laboratories were asked to perform the measurements on at least three different occasions. A material from the feasibility study was included in the analytical series for internal quality assessment.
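Each measured quantity named in the steps above enters the result directly, which is why its traceability matters. For a typical NADH-coupled rate method, the catalytic concentration follows from the absorbance change per minute, the total and sample volumes, the molar absorption coefficient of NADH, and the light path. The sketch below assumes this common formula and an ε of 6.3 L·mmol⁻¹·cm⁻¹ for NADH at 339 nm; the value prescribed by the relevant SOP should be checked. The volumes and uncertainties are hypothetical. Temperature does not appear in the formula, but it must be held at 37 °C because the reaction rate itself depends on it.

```python
import math


def catalytic_concentration(delta_a_per_min, total_volume, sample_volume,
                            epsilon=6.3, path_cm=1.0):
    """Catalytic concentration in U/L (µmol/min per litre of sample).

    delta_a_per_min: absorbance change per minute (sign ignored, since NADH may be consumed)
    total_volume, sample_volume: volumes in the same unit (e.g. mL)
    epsilon: molar absorption coefficient in L/(mmol*cm); 6.3 is an assumed
             value for NADH at 339 nm and must be taken from the actual SOP
    """
    rate_in_mixture = abs(delta_a_per_min) / (epsilon * path_cm)      # mmol/(L*min)
    return rate_in_mixture * (total_volume / sample_volume) * 1000.0  # µmol/(L*min) = U/L


def relative_combined_uncertainty(relative_uncertainties):
    """For a pure product/quotient model with uncorrelated inputs, relative
    standard uncertainties combine as a root sum of squares."""
    return math.sqrt(sum(r ** 2 for r in relative_uncertainties))


# Hypothetical measurement: 0.100 mL sample in 1.200 mL total volume, 1 cm cuvette.
b = catalytic_concentration(-0.050, total_volume=1.200, sample_volume=0.100)
print(f"b = {b:.1f} U/L = {b / 60:.3f} µkat/L")

# Hypothetical relative standard uncertainties of dA/min, V_total, V_sample, epsilon, path.
u_rel = relative_combined_uncertainty([0.005, 0.001, 0.003, 0.005, 0.001])
print(f"relative combined standard uncertainty = {100 * u_rel:.2f} %")
```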

As shown in Figure 7, the certification campaign for the four BCR enzyme reference materials (ALAT, GGT, CK, and LD) yielded 95% confidence intervals of less than 2.5% for all enzymes under investigation. This demonstrates the excellent metrological performance of the participating laboratories, which stretched from the Far East (Japan) to the Far West (California). It also shows that the SOPs developed in the course of the study can serve as reference points for defining the measurands at the top of the traceability chain. The procedures will now be published as IFCC reference methods.

Figure 7. Results of the certification of BCR enzyme reference materials for ALAT (a), GGT (b), CK (c), and LD (d). The bar graphs show the 95% confidence intervals obtained in the certification experiment in 1999 (top) in comparison to those of former certification campaigns using different methods (temperatures). The lower four bar graphs show the 95% confidence intervals obtained from a feasibility study in 1998, before the certification campaign, using different commercial calibrator preparations.
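The confidence intervals in Figure 7 summarize the spread between laboratories. Certification campaigns commonly express this as the half-width of the 95% confidence interval of the mean of laboratory means, t·s/√p. Here p is the number of laboratories and s is the standard deviation of their means. Whether the BCR certification used exactly this calculation is an assumption. The ten ALAT laboratory means below are hypothetical.

```python
import math
import statistics

# Two-sided 95 % quantiles of Student's t distribution (standard table values).
T_975 = {8: 2.306, 9: 2.262, 10: 2.228, 11: 2.201}


def confidence_half_width(lab_means):
    """Mean of laboratory means and the half-width of its 95 % confidence interval.

    Returns (mean, half_width, half_width as % of mean). Supports 9 to 12
    laboratories, the range covered by the small t table above.
    """
    p = len(lab_means)
    mean = statistics.fmean(lab_means)
    s = statistics.stdev(lab_means)
    half_width = T_975[p - 1] * s / math.sqrt(p)
    return mean, half_width, 100.0 * half_width / mean


# Hypothetical ALAT laboratory means (U/L) from ten laboratories.
alat = [101.2, 100.4, 99.8, 100.9, 101.5, 99.6, 100.2, 100.7, 99.9, 101.0]
mean, hw, rel = confidence_half_width(alat)
print(f"{mean:.2f} ± {hw:.2f} U/L ({rel:.2f} %)")
```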

Conclusion

So far, reference systems for the measurement of catalytic activity concentrations for four different enzymes have been established successfully and can now be used for assigning traceable values to calibrators and control materials. The IFCC enzyme project, which has been conducted in conjunction with the IRMM, could be regarded as a model for the development of reference systems in other fields of interest.

It can be stated in summary that for SI-traceable measurands the credentialing process has been successful to some extent, although full implementation of the traceability concept on a global basis will require considerable further effort. For non-SI-traceable quantities, the predominant objective must be agreement on the definition of these quantities. Once that is achieved, reference systems comprising reference procedures, materials, and laboratories can be established.

References

1. International Vocabulary of Basic and General Terms in Metrology, 2nd ed. (Geneva: International Organization for Standardization, 1993).

2. Guide to the Expression of Uncertainty in Measurement (Geneva: International Organization for Standardization, 1995).

3. In Vitro Diagnostic Medical Devices–Measurement of Quantities in Samples of Biological Origin–Metrological Traceability of Values Assigned to Calibrators and Control Materials, prEN ISO 17511 (Geneva: International Organization for Standardization, 1999).

4. Requirements for Reference Measurement Laboratories in Laboratory Medicine, ISO DIS 15195 (Geneva: International Organization for Standardization, 1999).

Lothar Siekmann, PhD, is a professor of clinical chemistry at the University of Bonn (Germany). This article originated in a presentation at the Workshop on Measurement Traceability for Clinical Laboratory Testing and In Vitro Diagnostic Test Systems sponsored by the National Institute of Standards and Technology (Gaithersburg, MD), November 2–3, 2000.

IVDT January/February table of contents | IVDT home page


Copyright ©2001 IVD Technology