Viewpoint | Free access | 10.1172/JCI146392
Nigel Paneth1 and Michael J. Joyner2
1Department of Epidemiology and Biostatistics and Department of Pediatrics and Human Development, Michigan State University, College of Human Medicine, East Lansing, Michigan, USA.
2Department of Anesthesiology and Perioperative Medicine, Mayo Clinic, Rochester, Minnesota, USA.
Address correspondence to: Nigel Paneth, Departments of Epidemiology and Biostatistics and Pediatrics and Human Development, Michigan State University, College of Human Medicine, 909 Fee Road, Room 218, East Lansing, Michigan 48824, USA. Phone: 517.884.3961; Email: paneth@epi.msu.edu.
Published December 3, 2020.
Randomized controlled trials (RCTs) are universally accepted in medicine as the preferred design for evaluating health-related interventions, be they preventive or therapeutic. It may seem odd to realize that this now-axiomatic approach has been standard for only some 70 years of medicine's more than two millennia of history. While sporadic proto-randomized trials had been published earlier, the RCT became firmly established as the standard design for treatment evaluation in 1948, when the British Medical Research Council showed, in a carefully randomized multi-hospital trial, that streptomycin had substantial benefits for patients with pulmonary tuberculosis compared with the then-standard treatment of bed rest (1).
The randomized trial has become so well established that many feel that, absent one RCT or, ideally, several showing statistically significant efficacy of an intervention, no recommendation can be made about its use. There is certainly good reason for caution. Observational studies of treatment efficacy (comparisons before and after treatment, or of treatment recipients with nonrecipients) have often led to serious errors in medicine. To cite just one egregious example, the use of diethylstilbestrol in pregnancy was justified by deeply flawed observational research (2), whereas the sole RCT (3), which showed no evidence of effectiveness at all, was roundly ignored. The long-term damage to the treated fetus was uncovered only after years of use by millions of pregnant women (4).
There are times when RCTs are difficult to undertake. Some interventions do not lend themselves easily to randomization. Large multicomponent systems of care, such as coronary care and complex surgical procedures, are not easily subject to random assignment. It is difficult to imagine an RCT of the Heimlich maneuver or of door-to-balloon time in angioplasty. The equipoise required to undertake a trial may be undercut by accumulated experience or by powerful belief systems that make an untreated control arm seem unethical. Trials can at times be so costly to mount that they may not seem worth the investment of time and resources. For all these reasons, many widely used interventions in medicine have never undergone assessment by RCT.
The field of epidemiology, though not bereft of trials, draws its conclusions predominantly from observational data. The first, and nearly identical, sets of epidemiologic rules of judgment for ascertaining causality were published on both sides of the Atlantic nearly simultaneously (5, 6), apparently not quite independently (7). The US version provides a handy quintet of criteria (consistency, strength, specificity, temporal relationship, and coherence) with which to judge the likelihood that any exposure-disease association is causal. For the purpose of evaluating treatment, the temporal relationship, i.e., that the exposure or treatment preceded the outcome of interest, is generally a given, and the specific treatment and the specific outcome are usually the clear focus of the research. That leaves three criteria to consider when thinking of observational research in relation to treatments in medicine.
Strength. Strength refers to the size of the observed difference, not to the P value associated with it, which assesses only the role of chance in producing the association, whatever its size. If a judgment about treatment is to be made on the basis of observational data, the effect size had better be substantial. Confounders and biases inevitably arise when study arms are not made comparable through random assignment. Since a confounder can fully account for an apparent treatment effect only if its own association with the outcome is at least as large as the effect being claimed for the treatment, a large effect size puts a cap on the likelihood that a confounder or a bias is operating; a 50% reduction in mortality from treatment is much harder to confound than a 20% reduction.
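One rough way to quantify this intuition, offered here only as an illustrative sketch, is the E-value of VanderWeele and Ding: the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need with both treatment and outcome to fully explain away an observed risk ratio RR, taking RR* = 1/RR for protective effects:

$$E = RR^{*} + \sqrt{RR^{*}\,(RR^{*} - 1)}$$

$$RR = 0.5:\quad RR^{*} = 2.0,\quad E = 2.0 + \sqrt{2.0 \times 1.0} \approx 3.4$$

$$RR = 0.8:\quad RR^{*} = 1.25,\quad E = 1.25 + \sqrt{1.25 \times 0.25} \approx 1.8$$

By this rough yardstick, a confounder would need risk-ratio associations of about 3.4 with both treatment and mortality to fully account for a halving of mortality, but only about 1.8 to account for a 20% reduction.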
Coherence. Does the intervention make sense in light of what else we know? A prominent component of this criterion is mechanism of action. Few interventions are undertaken without a hypothesized mechanism, but the evidence supporting that mechanism can come from a variety of sources, and in vitro findings do not always translate into in vivo effects, especially in humans.
Consistency. A treatment repeatedly shown to be effective is more likely to be truly effective than one that seems effective in some studies but not in others.
We suggest one additional criterion that has stood the test of time: the use of total population data to draw conclusions. RCTs are conducted in individuals willing to be enrolled in a study and to accept randomization. Such individuals are usually younger, healthier, more educated, and less likely to be from minority populations. Generalization from trials can therefore be uncertain, whereas total population data have sample sizes thousands of times larger than any trial and exclude virtually no one.
Some of the best evidence for the effectiveness of cancer screening comes from the consistent declines in mortality rates for the four cancers universally screened for in the US (breast, colon, cervix, and prostate), the correspondence of these declines with the onset of screening, and the paucity of alternative explanations for the declines (8, 9). Several studies of whether newborn intensive care reduces neonatal mortality have been based on total population data sources; without exception, they show lower mortality among high-risk newborns born where intensive care was available (10). These cross-sectional assessments have been amply supported by time-trend findings from vital data in the total US population (11).
In epidemic situations, the problems of conducting phase III RCTs are compounded by the absence of information on the most appropriate patients to treat, the dosage to use, and the side effects to expect, information generally uncovered in phase I or II trials. Moreover, the urgency to treat when patients are dying in large numbers can make providers reluctant to wait for the findings of a large trial.
Convalescent plasma in the current coronavirus disease 2019 (COVID-19) pandemic provides a useful example. Trials were slow to be mounted in the early days of the pandemic, and all trials published so far have been small and statistically inconclusive for mortality. Six of the seven trials now in the public domain (none from the US) showed nonsignificantly lower mortality in the treated arms, with a reduction averaging approximately 50%, but significant clinical improvements in other parameters were noted in some of the trials (12). Results from much larger trials are now on the verge of being reported, and by the time this Viewpoint is published we may no longer have to rely on observational data to draw conclusions, but as of now the most informative data we have come from observational studies.
At least 13 studies have been published comparing patients with COVID-19 treated or not treated with convalescent plasma, and all show reduced mortality in the treated group, often closely mirroring the findings of the RCTs (13). The most convincing evidence, however, is the strong and significant dose-response relationship of the active agent in convalescent plasma — antibody titer — with mortality in two studies of several thousand patients for whom antibody levels could be assessed in the plasma they received (14, 15). The higher the titer, the lower the mortality. Antibody levels were unknown at the time of transfusion, making the possibility of bias or confounding nearly as remote as in an RCT. These data were a major component of the evidence used by the FDA to issue Emergency Use Authorization for in-patient treatment with convalescent plasma on August 23, 2020 (16).
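As a minimal sketch of how such a dose-response trend can be tested (the group sizes, death counts, and variable names below are hypothetical and are not the data of refs. 14 and 15), one could fit a grouped logistic regression of mortality on an ordinal antibody-titer score:

```python
# Hypothetical sketch of a dose-response (trend) analysis of mortality across
# antibody-titer groups. The counts below are invented for illustration only.
import numpy as np
import statsmodels.api as sm

deaths   = np.array([30, 25, 18])      # hypothetical deaths in low/medium/high titer groups
patients = np.array([100, 100, 100])   # hypothetical patients per group
titer    = np.array([0, 1, 2])         # ordinal titer score (low = 0, medium = 1, high = 2)

# Grouped logistic regression of mortality on the ordinal titer score;
# the Wald test on the titer coefficient is a one-degree-of-freedom test for trend.
endog = np.column_stack([deaths, patients - deaths])   # [events, non-events] per group
exog  = sm.add_constant(titer)
fit   = sm.GLM(endog, exog, family=sm.families.Binomial()).fit()

print(fit.params)    # intercept and titer coefficient (log-odds scale)
print(fit.pvalues)   # P value for the trend term
```

A negative, statistically significant coefficient on the titer score would correspond to the pattern described above: the higher the titer, the lower the mortality.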
Turning to the causal criteria, the association of convalescent plasma with mortality found both in trials and observational data — approximately a halving of mortality — is a strong effect. The findings have been both consistent across studies and coherent with what we know of how antibodies work and historical evidence of the effectiveness of convalescent plasma in other infectious diseases (17).
Ultimately, everything we do in medicine is a matter of judgment. Although RCTs should be supported whenever possible, we should not be paralyzed into inaction when they are not available, nor should we willfully ignore important evidence coming from other sources. Federal agencies, professional bodies — indeed, any person or group making therapeutic recommendations — should always consider the totality of the available evidence, including that generated by observational studies.
Conflict of interest: The authors have declared that no conflict of interest exists.
Copyright: © 2021, American Society for Clinical Investigation.
Reference information: J Clin Invest. 2021;131(2):e146392. https://doi.org/10.1172/JCI146392.