
Originally published May, 1998

Design considerations for IVD clinical trials:

Time-staggered multiple end points and related issues

A trial with time-staggered multiple end points can incorporate a product's entire strategic plan into a single study protocol—and can be more cost-effective than conducting multiple trials.

Cheryl L. Hayden and Michael L. Feldstein

An IVD clinical trial represents a major investment for the sponsoring company; consequently, it is important to maximize the utility of the trial. One way to accomplish this is to design the trial protocol to include several time-staggered end points that may be addressed simultaneously. This can be done if the objectives of the trial are stated in advance, the end points are well-defined, and the sample size is calculated with all objectives and end points in mind.

A study with multiple end points offers several advantages that can justify its increased complexity and potentially greater costs. Multiple end points can be associated with multiple objectives—or clinical claims—thereby giving sponsors greater flexibility in preparing regulatory submissions and possibly increasing the number of publishable reports. Such a design is especially well suited to the study of an IVD for a chronic disease process, such as malignancy or autoimmune disease, where data collected over an extended period can demonstrate the ongoing utility of the assay. It is also useful for new biomarkers whose behavior in relation to various disease processes has been the subject of few articles.

A clinical trial that addresses multiple objectives can enable the sponsor to collect data on one objective early in the trial and use them as the basis for a regulatory submission, such as a 510(k) or premarket approval (PMA) application, while additional data on other objectives are still being collected and analyzed. Objectives met with data collected later in the trial may facilitate an additional 510(k) submission or an amendment to the original PMA. Once FDA clearance or approval for marketing is obtained for the first claim, the product may be marketed for that claim while additional claims are being pursued.

The advantage of pursuing additional claims, of course, is expansion of the available market. A biomarker used to diagnose a chronic disease, for instance, would normally be assayed only once or twice for each patient, at the time of diagnosis. However, if the biomarker were shown to indicate a change in the disease process (e.g., from remission to active disease), then it would be assayed repeatedly on each patient, greatly expanding the available market for the product.

Designing a trial with multiple end points requires a greater concentration of energies during the planning stage, development of a more elaborate protocol and case report form, and more time and effort in the analysis and reporting of results. Depending on the initial design of the trial, such a study may also require sponsors to follow subjects for a longer period. Potential benefits of such studies include an early reportable outcome that can translate into an early product launch, confirmation of multiple clinical applications that can translate into a larger potential market for the product, and a more comprehensive understanding of the product that can lead to additional uses. Planning such a trial affords sponsors an opportunity to incorporate a product's entire strategic plan into a single study protocol that can be more cost-effective than conducting multiple trials.

Time-Staggered Multiple End Points

A trial that uses time-staggered multiple end points is one in which each of several objectives has a time-ordered end point, in such a manner that study results collected at each end point can be analyzed as though they were from separate studies. This can be done by designing the trial to include both short- and long-term end points.

For example, a biomarker that is elevated in the presence of a disease process might initially be used to differentiate patients with that disease from patients with another condition exhibiting similar symptoms (a short-term end point that can be analyzed within a couple of months). Following treatment, the same marker might be used to monitor patients for recurrence of the disease (a long-term end point that could require a year or more to complete). A clinical trial to determine whether the same biomarker does indeed provide information for both end points would require serial collection of samples from patients at different times: at presentation with symptoms, after diagnosis, and periodically following treatment. An analysis of the biomarker's utility for differential diagnosis could be performed as soon as all the diagnostic information about the patients is available, even though samples to measure the utility of the marker for monitoring purposes are still being collected.

In a time-staggered multiple end point trial, all of the outcomes must relate to the same disease process, so that the entire patient population under study contributes to all of the end points. For this reason, such a trial would not be useful for studying an assay based on an analyte that is elevated in several different acute processes but does not remain elevated long enough to track recovery. By contrast, such a trial is especially suited to the study of a biomarker related to a specific malignant disease, where the assay may have utility in differential diagnosis, prognosis, monitoring, and measuring response to treatment. To construct such a trial properly, it is necessary to understand the biology of the analyte being measured, as well as the disease process. From a statistical perspective, the data generated from a patient sample may contribute to more than one end point, but no end point is re-analyzed as new data accumulate. Thus the multiple end points will be clinically correlated, but each is analyzed only once, when its data are complete.

Sample Trial

A purely fictitious clinical trial will be used to illustrate these points. Suppose a biomarker has been discovered that is present in the circulation of about 70% of patients known to have active non-Hodgkin's lymphoma but cannot be detected in the circulation of normal, healthy adults. Laboratory work has shown this analyte to be associated with HTLV-I in the following manner: the analyte is secreted into the culture supernatant of transformed B cells infected with HTLV-I, but not into the supernatant of cultured B cells that are not infected with the virus. Thus, there is evidence that the biomarker is associated with the transformation process that takes place when HTLV-I infects B cells.

Suppose also that a preliminary clinical evaluation has shown that some people with known non-Hodgkin's lymphoma have this biomarker in circulation. We can now design a trial to answer the following clinical questions:

  • Are there subsets of non-Hodgkin's lymphoma in which the biomarker is not present (e.g., certain histologic types)?

  • How early in the disease process of non-Hodgkin's lymphoma does the biomarker appear in detectable amounts in the serum? (A corollary to this question is the following: Is the biomarker detectable prior to clinical evidence of the disease?)

  • Does the circulating level of the biomarker reflect response to treatment?

  • Does a rising level of the biomarker in the serum indicate recurrence of the disease?

Provided that the patient population includes adequate numbers of persons with various stages and grades of disease and with different risk factors (in this example, the range of disease characteristics would include all stages, grades, and histologies), each of these questions can be addressed by collecting samples from patients with non-Hodgkin's lymphoma at the time of their diagnosis and periodically during treatment and follow-up. Figure 1 illustrates the times of sample collection during the disease process that would allow analysis of the data to address these questions.



Figure 1. Model timeline correlating disease process and times for the collection of patient samples to be used in testing a biomarker assay.

The first and second questions can be answered using data derived from the first set of samples, which are collected before any treatment, while patients still have active disease that can be characterized by the oncologist and pathologist. The corollary can be answered only if some patients were identified as having lymphoma incidentally, during evaluation for some other disease process, or as part of a screening program. Patients who presented with clinical symptoms of the disease would automatically be excluded from consideration for this corollary.

The third question, whether the analyte accurately indicates the patient's response to therapy, can be answered using samples collected during the period in which therapy is being administered. The fourth question, whether the analyte gives useful information about recurrent disease, is answered by evaluating all of the patient samples after treatment.

A study such as this could also be used to address other clinical questions, but those posed here are enough to illustrate the principle of time-staggered multiple end points and the independence of the statistical analyses.

In this example, data from the pretreatment sample are sufficient to address the first two clinical questions because they have short-term end points (i.e., all the required information is collected at the time of diagnosis and disease characterization). Analysis of these data can be performed while samples for the third clinical question are still being collected. The response-to-treatment end point requires the sponsor to follow patients for another six months to determine their response to chemotherapy or radiotherapy. Once that information has been collected for all patients, it can be analyzed while samples are being collected on a regular basis (e.g., quarterly for 48 months) to determine whether the biomarker has utility for indicating recurrence of disease.
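The timing described above can be laid out explicitly when the protocol is drafted. The Python sketch below is hypothetical: it assumes monthly sampling during the six months of therapy and a 12-month enrollment period, neither of which is specified in the example, and it computes the earliest study month at which each question's data are complete for every patient.

```python
from dataclasses import dataclass

# Timeline in months after diagnosis (month 0 = pretreatment sample).
# The six-month treatment period and quarterly 48-month follow-up come from
# the example; monthly sampling during therapy is a hypothetical choice.
TREATMENT_MONTHS = 6
FOLLOW_UP_MONTHS = 48
FOLLOW_UP_INTERVAL = 3


@dataclass(frozen=True)
class EndPoint:
    question: str
    complete_at_month: int  # month when one patient's data for this end point are complete


END_POINTS = [
    EndPoint("Q1: subsets lacking the biomarker", 0),
    EndPoint("Q2: how early the biomarker appears", 0),
    EndPoint("Q3: response to treatment", TREATMENT_MONTHS),
    EndPoint("Q4: recurrence", TREATMENT_MONTHS + FOLLOW_UP_MONTHS),
]


def collection_schedule():
    """Months at which samples are drawn from a single patient."""
    pretreatment = [0]
    during_therapy = list(range(1, TREATMENT_MONTHS + 1))
    follow_up = list(range(TREATMENT_MONTHS + FOLLOW_UP_INTERVAL,
                           TREATMENT_MONTHS + FOLLOW_UP_MONTHS + 1,
                           FOLLOW_UP_INTERVAL))
    return pretreatment + during_therapy + follow_up


def analysis_month(end_point, enrollment_months):
    """Earliest study month at which an end point can be analyzed: the last
    patient enrolled must have reached that end point."""
    return enrollment_months + end_point.complete_at_month


if __name__ == "__main__":
    print("Samples per patient:", len(collection_schedule()))
    for ep in END_POINTS:
        # The 12-month enrollment period is hypothetical.
        print(f"{ep.question}: analyzable at study month {analysis_month(ep, 12)}")
```

Under these assumptions each patient contributes 23 samples, the first two questions become analyzable at study month 12, response to treatment at month 18, and recurrence at month 66; each analysis is run once, when its end point is complete.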

Several issues common to all clinical trial designs are especially important with time-staggered multiple end point trials. To make it feasible to analyze data from multiple end points, it is essential for sponsors to select carefully worded objectives, measurable end points, and accurate gold standards. If these elements are not properly delineated in the protocol, it is likely that the clinical data collected will not be adequate for the desired analyses. Each of these related issues is discussed below.

Study Objective

As with all clinical trials, it is extremely important to word study objectives (or hypotheses) very carefully, so that it is clear to everyone reading the protocol what is being measured and how it will be evaluated. This is especially important when time-staggered multiple end points are being used, because the objectives are linked in some fashion. Consequently, clearly worded objectives are necessary to provide a rationale for the study design and sampling scheme.

Consider the hypothetical trial described in the previous section. The following is an example of a poorly worded objective for that trial.

Objective: To evaluate biomarker X as an aid in the management of non-Hodgkin's lymphoma.

From this the reader cannot determine what biomarker X measures or to what it is being compared. The following formulation might be a better wording of this objective.

Objective: To determine the association of serum levels of biomarker X with stage of disease in patients diagnosed with non-Hodgkin's lymphoma.

This objective addresses the second clinical question posed in the earlier discussion. By reading this objective the reader knows that serum samples are being collected to be assayed for biomarker X, and the levels of biomarker X in the circulation will be analyzed with stage of disease as the gold standard for comparison.

Even this clearly stated objective still requires further definition. The method of determining association should be described in the statistical section of the protocol. In this example, for instance, one could calculate the mean level of biomarker X for each stage of disease and perform an analysis of variance to determine whether the mean levels are significantly different. Similarly, the protocol must define the method to be used for determining stage of disease. This definition must be clearly stated using well-established and accepted criteria that can be measured objectively. This leads to the second related issue.
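The analysis of variance suggested above can be sketched in a few lines. This minimal Python version computes only the F statistic and its degrees of freedom; the stage labels and biomarker levels in the example are invented for illustration, not trial data.

```python
from statistics import mean


def one_way_anova(groups):
    """Return the F statistic and its degrees of freedom for a one-way ANOVA.

    groups maps a label (here, stage of disease) to a list of biomarker levels.
    """
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the stage means around the overall mean
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    # Variation of individual patients around their own stage mean
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical biomarker X levels by stage; not study data.
    levels = {
        "I": [1.0, 2.0, 3.0],
        "II": [3.0, 4.0, 5.0],
        "III": [5.0, 6.0, 7.0],
        "IV": [7.0, 8.0, 9.0],
    }
    print(one_way_anova(levels))  # F = 20.0 on (3, 8) degrees of freedom
```

The F statistic would then be compared with the F distribution on the returned degrees of freedom; a statistics package such as SciPy (`scipy.stats.f_oneway`) reports both F and the p-value.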

Measurable End Points

A problem found in many clinical trials is the use of "fuzzy" end points. It is most important that the measure used for the end points be as objective and reproducible as possible. This becomes a critical issue in a study with multiple end points, because the appropriate times for data analysis cannot be determined if distinct end points are not specified. Without specific, predetermined end points, the analysis of study results can degenerate into a "data-dredging" operation that may leave the conclusions open to the criticism of analytical bias.

If the end point measure varies from patient to patient, then the data will be uninterpretable. In the hypothetical trial described above, for instance, response to treatment should not be measured using the physician's impression or the patient's assertion that he or she "feels better," but should require some objective measure such as decrease in tumor size as seen on imaging. Likewise, recurrence should not be measured using the physician's impression, but should require pathologic diagnosis of malignancy at a site where malignancy was not previously seen or an increase in tumor size as seen on imaging.
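Writing the end-point definitions as explicit rules is one way to confirm that they leave no room for impressions. The Python sketch below is hypothetical: the 30% shrinkage and 20% growth thresholds are placeholders that a protocol would have to justify, not criteria taken from this article.

```python
def classify_response(baseline_mm, follow_up_mm, min_decrease=0.30):
    """Objective response rule based on tumor size measured on imaging.

    min_decrease (fractional shrinkage) is a hypothetical protocol threshold.
    """
    if baseline_mm <= 0:
        raise ValueError("baseline tumor size must be positive")
    decrease = (baseline_mm - follow_up_mm) / baseline_mm
    return "response" if decrease >= min_decrease else "no response"


def is_recurrence(pathology_confirmed_new_site, prior_mm, current_mm, min_increase=0.20):
    """Objective recurrence rule: pathologic diagnosis of malignancy at a new
    site, or growth of a measured tumor beyond a hypothetical threshold."""
    if pathology_confirmed_new_site:
        return True
    if prior_mm <= 0:
        # No measurable residual tumor, so growth on imaging cannot be computed;
        # only pathologic confirmation can establish recurrence.
        return False
    return (current_mm - prior_mm) / prior_mm >= min_increase
```

Because the rules take only measured quantities as input, two sites applying them to the same images and pathology reports should reach the same classification.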

In situations where an objective end point is truly unavailable, sponsors must make every effort to minimize bias in the outcome measure. For example, if the end point depends on evaluation of the patient, that evaluation should be performed by an independent observer, such as a physician who was not involved in diagnosis or treatment of the patient. Use of standard criteria to evaluate patient status can also help to make evaluations more objective.
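The article does not prescribe a way to check whether such evaluations are reproducible. One common option, offered here as an assumption rather than as the authors' method, is to have two independent evaluators classify the same patients and compute Cohen's kappa, which corrects their raw agreement for the agreement expected by chance. A minimal Python sketch:

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if each rater assigned categories at random
    # with his or her own observed frequencies
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1 - expected)
```

For example, two evaluators who agree on three of four hypothetical patients, with rating frequencies that would produce 50% agreement by chance, obtain a kappa of 0.5; values near 1 indicate that the standard criteria are being applied consistently.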

Accurate Gold Standard

A related problem commonly seen in clinical trials is the use of an inaccurate gold standard, the outcome measure against which all other measures are judged (see the box "The pitfalls of gold standards"). And again, this factor is especially important in a study with time-staggered multiple end points because those end points are generally linked. Consequently, if an inaccurate gold standard is used in analysis of the first end point, erroneous results will most likely affect the outcome for subsequent ones.

It may be difficult to select a gold standard for a trial. In recent years, with advances in technology, many of the traditional assay methodologies have become outmoded. For example, the traditional method of testing for the presence of a specific virus is growth of the virus in tissue culture. If a virus is difficult to grow in vitro, this may be a very insensitive assay method. New techniques involving DNA hybridization, on the other hand, may be exquisitely sensitive.

The problem arises in understanding the clinical relevance of the presence of viral DNA in a sample. The physician knows that growth of the virus in tissue culture indicates the presence of viable organisms in the sample and, by inference, in the patient. He or she also knows that lack of growth of the virus in tissue culture does not necessarily indicate that there are no viable organisms in the patient. Presence of viral DNA in a sample, however, does not necessarily indicate the presence of viable organisms in the patient; consequently the physician may have difficulty interpreting a positive result. If a DNA hybridization assay is compared to a tissue culture method (the gold standard), it will probably be shown to have many false positives. However, those discrepancies may really be due to false negatives among the tissue culture results. Determining "truth" in this case may not be easy.
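The discrepancy described above can be quantified under simplifying assumptions. The Python sketch below assumes that the errors of the two methods are independent once true infection status is known, and the values in the example (20% prevalence; culture 50% sensitive and 100% specific; DNA hybridization 95% sensitive and 98% specific) are invented to show how an accurate new assay can appear to produce many false positives.

```python
def apparent_performance(prevalence, ref_sens, ref_spec, new_sens, new_spec):
    """Apparent sensitivity and specificity of a new assay scored against an
    imperfect reference, assuming the two tests err independently given
    true disease status (conditional independence)."""
    p, q = prevalence, 1 - prevalence
    ref_pos = p * ref_sens + q * (1 - ref_spec)          # P(reference positive)
    ref_neg = p * (1 - ref_sens) + q * ref_spec          # P(reference negative)
    both_pos = p * ref_sens * new_sens + q * (1 - ref_spec) * (1 - new_spec)
    both_neg = p * (1 - ref_sens) * (1 - new_sens) + q * ref_spec * new_spec
    return both_pos / ref_pos, both_neg / ref_neg


if __name__ == "__main__":
    sens, spec = apparent_performance(0.2, 0.5, 1.0, 0.95, 0.98)
    print(f"apparent sensitivity {sens:.2f}, apparent specificity {spec:.2f}")
```

With those values the DNA assay would appear 95% sensitive but only about 88% specific against culture, although its true specificity is 98%; most of its apparent false positives would be infected patients whom culture missed.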

When new technology makes traditional gold standard methods outdated, the question arises of who should be responsible for validating a new gold standard. Such validation may involve extensive clinical trials to determine the clinical utility of the new, more sensitive method. An IVD maker may be reluctant to take on such a task, preferring to use a flawed gold standard and conduct a less-extensive clinical trial.

A definitive discussion of the issue of adequate gold standards is beyond the scope of this article. However, it is a critical issue that must be considered when designing a trial, and the advantages and disadvantages of the gold standards being considered must be carefully weighed.

Sample Banks

Another issue that often arises in IVD trials is the use of banked samples. It may be possible to evaluate some or all of the end points in a trial using banked samples if the samples were collected and handled in an appropriate manner, and if adequate clinical data are available to allow evaluation of clinical utility. For these samples to yield valid results, it is crucial to determine the stability of the analyte in the sample under the conditions of storage.
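Stability is commonly judged by assaying aliquots of the same specimens when fresh and again after storage under the bank's conditions. A minimal Python sketch, assuming paired results and a hypothetical ±10% acceptance limit (a real limit would need to be justified from the assay's own performance data):

```python
def recoveries(baseline, stored):
    """Percent recovery of each stored aliquot relative to its baseline result."""
    if len(baseline) != len(stored) or not baseline:
        raise ValueError("need equal-length, non-empty paired results")
    if any(b <= 0 for b in baseline):
        raise ValueError("baseline results must be positive")
    return [100.0 * s / b for b, s in zip(baseline, stored)]


def analyte_stable(baseline, stored, limit_pct=10.0):
    """True if every aliquot recovers within +/- limit_pct of baseline.
    limit_pct is a hypothetical acceptance criterion."""
    return all(abs(r - 100.0) <= limit_pct for r in recoveries(baseline, stored))
```

Requiring every aliquot to pass, rather than only the average, keeps a few badly degraded samples from being masked by the rest.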

If no sample bank exists for the clinical trial being designed, it may be advantageous to establish one. Again, this requires knowledge of the analyte's stability under appropriate sample handling and storage conditions. A well-characterized sample bank (i.e., a sample bank with associated clinical data) will permit reassay of samples at a later date, allowing rapid validation of a new assay method.
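A well-characterized bank is, in effect, a set of samples linked to clinical records. The Python sketch below is hypothetical (the record fields, storage temperature, and field names are illustrative): it selects samples that were stored under the conditions and for no longer than the period covered by the stability data, and that carry every clinical field an analysis needs.

```python
from dataclasses import dataclass, field


@dataclass
class BankedSample:
    sample_id: str
    storage_temp_c: int
    months_stored: int
    clinical_data: dict = field(default_factory=dict)


def eligible_for_reassay(bank, required_fields, validated_temp_c, validated_months):
    """Samples stored under validated conditions, within the period for which
    analyte stability was demonstrated, with every required clinical field."""
    return [
        s for s in bank
        if s.storage_temp_c == validated_temp_c
        and s.months_stored <= validated_months
        and all(s.clinical_data.get(f) is not None for f in required_fields)
    ]
```

Only samples that pass both the storage check and the clinical-data check can support a claim when a new assay method is validated against the bank.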


The pitfalls of gold standards

A biomarker for malignancy is the subject of a clinical trial to determine its utility in monitoring patients for recurrence. The gold standard against which the marker should be compared—the most accurate measure available—is pathologic confirmation that the malignancy has recurred.

Nevertheless, clinicians seeking to detect the recurrence of malignancy often rely on imaging technologies that seem to show the presence of suspicious lesions (i.e., those that appear malignant). As a measure of recurrence, this method is significantly less accurate than pathologic confirmation. In the context of a clinical trial, use of such a standard can lead to erroneous conclusions about the utility of the marker. If the standard in use is actually quite inaccurate, analysis of test results for the biomarker could also be very misleading. Consider the following hypothetical true accuracies (not results of a study).

Gold standard: 60% accurate.

New biomarker: 80% accurate.

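A trial to evaluate the new biomarker against this gold standard would badly underestimate its accuracy. Even if the two methods erred independently, they would agree on only 56% of patients (0.8 × 0.6 + 0.2 × 0.4), and the apparent accuracy would fall below 50% if their errors coincided less often than that; in reality the biomarker is 80% accurate. This could lead to abandoning the biomarker when it might, in fact, be very useful.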
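How far the apparent accuracy falls depends on how the two methods' errors overlap. A minimal Python sketch, assuming a binary result (recurrence present or absent) and reading "accuracy" as the proportion of patients classified correctly; it returns the lowest possible agreement, the agreement under independent errors, and the highest possible agreement.

```python
def agreement_range(acc_new, acc_ref):
    """Proportion of subjects on which a new binary test and a reference agree,
    given each one's true accuracy. Returns (lowest possible, value if the
    two err independently, highest possible)."""
    err_new, err_ref = 1 - acc_new, 1 - acc_ref
    independent = acc_new * acc_ref + err_new * err_ref
    # Agreement = 1 - err_new - err_ref + 2 * P(both wrong): with a binary
    # result, two wrong answers still match each other.
    lowest = abs(1 - err_new - err_ref)     # errors overlap as little as possible
    highest = 1 - abs(err_new - err_ref)    # errors overlap as much as possible
    return lowest, independent, highest
```

For the box's figures (biomarker 80%, gold standard 60%), the function returns approximately (0.40, 0.56, 0.80); none of these values shows the biomarker to be better than the standard it is judged against.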

This situation can occur for a variety of reasons. For example, the person designing the trial may depend on participating physicians for information about the accuracy of diagnostic procedures. In one such instance, a trial of a biomarker for bladder cancer was designed with an end point for tumor recurrence that depended on cystoscopic examination, a method commonly used by urologists to detect recurrent tumors.1 When the data were analyzed, however, it was found that 15–20% of the tumors detected visually were benign on pathologic examination. Clearly, in this trial, the gold standard should have been pathologic evaluation of the tissue, not cystoscopic examination.

Careful selection of gold standards is especially important for trials that employ multiple end points. In the bladder cancer trial mentioned above, consider what might have occurred if another end point depended on following patients who had been identified as having recurrent disease. Since patients who did not have pathologic confirmation of recurrence could easily be misclassified, analysis of any results gathered at the next end point would be statistically meaningless. In short, use of the wrong standard can jeopardize the accuracy of any study's conclusions, and multiple end point trials are especially susceptible.

Reference

1. Soloway MS, Briggman JV, Carpinito GA, et al., "Use of a New Tumor Marker, Urinary NMP22, in the Detection of Occult or Rapidly Recurring Transitional Cell Carcinoma of the Urinary Tract Following Surgical Treatment," J Urol, 156:363–367, 1996.


Discussion

Although trials with time-staggered multiple end points are more complicated to design, several benefits can be realized by taking this approach. If a clinical trial is designed with very little preliminary data, as is often the case, it is unlikely the trialist will know whether its objective might yield a positive outcome. But a trial that is designed with several objectives increases the likelihood that at least one of those objectives will yield a positive outcome. If the trial demonstrates a positive outcome for more than one objective, multiple claims can then be submitted to FDA, thereby increasing the market for the diagnostic. Testing a diagnostic for multiple uses can also support other activities, such as writing papers for submission to peer-reviewed journals, preparing abstracts for presentations at scientific meetings, and developing cost-analysis data for the marketing department.
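The argument that several objectives raise the chance of at least one positive outcome can be made concrete. The Python sketch below treats the objectives as independent, which is a simplifying assumption; because the end points in a time-staggered trial are clinically correlated, the real gain is probably smaller. The probabilities used are hypothetical.

```python
from math import prod


def prob_any_positive(probabilities):
    """Probability that at least one objective succeeds, assuming the
    objectives succeed or fail independently of one another."""
    return 1 - prod(1 - p for p in probabilities)
```

For three objectives with hypothetical 50%, 40%, and 30% chances of success, the chance that at least one succeeds is 1 - 0.5 × 0.6 × 0.7 = 0.79, compared with 0.50 for the best single objective alone.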

It may seem obvious that it is necessary to coordinate the end points being measured with the clinical claims to be made. However, lack of such coordination is a common error in trial design. It is crucial during the design stage to ensure that the samples and clinical data being collected will support claims that are clinically useful.

To illustrate, consider a hypothetical prognostic marker for prostate cancer. A marker that differentiates prostate cancer with metastatic potential from prostate cancer that will not metastasize would have real clinical utility. But a study designed to collect samples from patients after metastasis has occurred and to compare their biomarker levels with those from patients with no metastatic disease would not support a prognostic claim. The only claim that could be supported by such a study is that the marker identifies patients with metastatic disease. To claim prognostic utility, samples must be collected prior to the detection of metastatic disease, with the patients being followed for some period to record which ones develop metastatic disease.
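The sampling rule for a prognostic claim can be written as an eligibility check. In this hypothetical Python sketch, the record layout and the minimum follow-up period are assumptions; a patient counts toward the prognostic analysis only if the sample predates any detected metastasis and, for patients without metastasis, follow-up is long enough to call them metastasis-free.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Patient:
    patient_id: str
    sample_date: date
    metastasis_detected: Optional[date]  # None if never detected during follow-up
    last_follow_up: date


def prognostic_outcome(patient, min_follow_up_days):
    """Outcome label for a prognostic analysis, or None if the patient cannot
    contribute to it. min_follow_up_days is a hypothetical protocol value."""
    if patient.metastasis_detected is not None:
        if patient.sample_date >= patient.metastasis_detected:
            # Sample drawn after metastasis: supports only a detection claim
            return None
        return "developed metastasis"
    if (patient.last_follow_up - patient.sample_date).days < min_follow_up_days:
        # Too little follow-up to call the patient metastasis-free
        return None
    return "no metastasis"
```

A study that collected only post-metastasis samples would return None for every metastatic patient under this rule, which is exactly why it cannot support a prognostic claim.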

Conclusion

Time-staggered multiple end points can be designed into clinical trials for IVDs. Such a design allows sponsors flexibility in choosing objectives so that the results can be used in a variety of settings, thus maximizing the utility of the trial. To do so, the following factors must be kept in mind:

  • Objectives must be clearly stated in the protocol.

  • All objectives must be developed in advance.

  • Sample size must be determined using all objectives.

  • End points must be measurable and consistent.

  • The gold standard that the sponsor uses must be accurate.
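The requirement that sample size reflect all objectives can be illustrated with a simple precision-based calculation. In this hypothetical Python sketch, each objective is treated as estimating a proportion to a chosen precision (a simplified, swapped-in approach; a real protocol might instead use power calculations for hypothesis tests), the count is inflated for the fraction of enrolled patients expected to reach that end point, and the trial enrolls enough patients for the most demanding objective.

```python
import math


def subjects_for_proportion(expected, half_width, z=1.96):
    """Subjects needed to estimate a proportion to within +/- half_width
    with approximately 95% confidence (normal approximation)."""
    return math.ceil(z ** 2 * expected * (1 - expected) / half_width ** 2)


def required_enrollment(objectives):
    """objectives maps a name to (expected proportion, half-width, fraction of
    enrolled patients expected to contribute to that end point).
    Returns enrollment needed per objective and the overall requirement."""
    per_objective = {
        name: math.ceil(subjects_for_proportion(p, d) / fraction)
        for name, (p, d, fraction) in objectives.items()
    }
    return per_objective, max(per_objective.values())


if __name__ == "__main__":
    # All values hypothetical except the ~70% positivity from the example.
    plan = {
        "differential diagnosis": (0.70, 0.10, 1.00),
        "response to treatment": (0.60, 0.10, 0.90),
        "recurrence": (0.70, 0.10, 0.25),  # only patients who recur contribute
    }
    print(required_enrollment(plan))
```

With these illustrative values the diagnostic objective alone would need 81 patients, but the recurrence objective, to which only a quarter of enrolled patients are assumed to contribute, needs 324, so it drives enrollment.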

Following these rules should result in a well-constructed clinical trial that will yield data that can be analyzed and interpreted to meet all of the sponsor's objectives.

Cheryl L. Hayden is senior staff consultant in the clinical services department and Michael L. Feldstein is director of clinical services at Medical Device Consultants, Inc. (North Attleboro, MA). Photo by Nicholas Rigg/FPG


Copyright ©1998 IVD Technology Magazine