Every day, we face a number of decisions about how to maintain a healthy life. What foods should we eat? How often should we exercise? Should we take vitamins? Should we be screened for cancer, heart disease, or depression?
In answering these questions, it often feels as if we are bombarded by conflicting information. For example, for years menopausal women were advised to use hormone therapy to reduce their risk of cancer, osteoporosis, and heart disease. Yet in 2002, research began to emerge showing that hormone therapy was not only ineffective but also dangerous, actually increasing women’s risk of stroke and heart attack.
Why does the public receive so much conflicting medical advice, and how can the advice change so significantly over time? Though it may seem like the science is shifting every day, most confusion results from disagreements in how researchers assess medical evidence, rather than dramatic shifts in the evidence itself. Not all scientific studies are created equal, and careful interpretation of findings can be as important as the research itself. This article explains how the misinterpretation of what certain outcomes and research methods demonstrate can lead to unwarranted conclusions and unsupported recommendations.
Measuring hard outcomes
Medical recommendations are typically based on studies of surrogate endpoints, that is, health conditions that are stand-ins for an outcome we care about.1 Surrogates can be detected earlier than the outcomes they predict and are often easier to measure. For instance, a healthy person cares about their future risk of heart attack or stroke, but uses blood pressure, cholesterol, and weight as predictors of these conditions.
Treatment is often aimed at improving these surrogate measures – lowering cholesterol, for example – rather than directly targeting the “hard outcomes” they predict, like mortality and morbidity.(a) Unfortunately, the last twenty years have witnessed many instances of surrogate failure. Surrogates that correlated with negative hard outcomes were improved with treatment, but the treatment did not always reduce risk for the hard outcome itself.
The Action to Control Cardiovascular Risk in Diabetes (ACCORD) study, for example, focused on regulating patients’ hemoglobin A1c (HbA1c), which serves as an indicator of blood sugar levels.(b) In the ACCORD study, over 10,000 people with type 2 diabetes were randomly assigned to one of two treatment strategies: treatment that aimed to reduce their HbA1c levels to less than 6% (the normal range for non-diabetics) or treatment that targeted an HbA1c level between 7% and 8% (slightly above normal).
At the time of the study, many experts predicted that the lower HbA1c target would be better (because it corresponds with tighter glucose control), but ACCORD reached the exact opposite conclusion. The strategy targeting lower HbA1c levels was actually associated with increased mortality.2 The ACCORD study is just one example of a surrogate endpoint proving unreliable. Other trials have identified blood pressure, cholesterol, and tumor shrinkage as surrogate endpoints that did not track with more meaningful hard endpoints, such as mortality.3
While scientists frequently rely on surrogate measures as stand-ins for the hard outcomes we care about, the aforementioned research brings this approach into question. This research suggests that it is important for clinical trials to directly examine hard outcomes, such as risk of death (mortality) and quality of life (morbidity), rather than only looking at surrogate measures. Some scientists are beginning to follow this approach; for example, an important recent study on weight loss looked at cardiovascular events (like heart attacks) and death as key outcomes.4
Designing reliable methodologies
In addition to picking appropriate outcomes, it is equally important to ensure that the research methodology behind a study is sound. Reliable methodologies are free from biases and have a high statistical probability of reflecting the underlying truth about the issue being examined.(c) Randomized, double-blind trials are the gold standard in medical research.5 In randomized studies, researchers treat half the study participants with one therapy and the other half with another, using a virtual coin flip to randomly assign participants to each group. In double-blind studies, neither the researchers nor the patients know which group each patient is in, eliminating the possibility of subtle biases based on this knowledge.(d)
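The random assignment described above can be illustrated with a minimal sketch. Everything in it is hypothetical: the function name `assign_groups`, the arm labels, and the participant IDs are invented for illustration and are not taken from any particular trial. Instead of flipping a coin for each person, the sketch shuffles the list and splits it down the middle, which guarantees that the two arms are the same size.

```python
import random

def assign_groups(participant_ids, seed=None):
    """Randomly split participants into two equal-sized treatment arms.

    Hypothetical helper for illustration only. Shuffling and splitting
    is one common way to keep the arms balanced; a literal coin flip per
    participant would also be random but could give unequal arms.
    """
    rng = random.Random(seed)      # a seed makes the allocation reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)          # chance alone decides the order
    half = len(shuffled) // 2
    return {
        "therapy_a": sorted(shuffled[:half]),
        "therapy_b": sorted(shuffled[half:]),
    }

# Twenty hypothetical participants, identified only by number.
groups = assign_groups(range(1, 21), seed=42)
print(groups)
```

Because chance alone decides who lands in each arm, traits the researchers never measured should, on average, be spread evenly between the two groups. In a double-blind study, the resulting allocation would also be kept hidden from both the patients and the researchers who treat them.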
When large and well done, randomized, double-blind trials have the strongest truth claim in all of biomedicine because they remove several potential sources of bias. Unfortunately, these kinds of studies are not always feasible. For instance, a randomized trial to determine whether eating one cup of yogurt a week makes a person live longer would require study participants to alter their behavior for years – either eating or not eating a certain amount of yogurt, depending on which study group they were in – and submit to decades of follow-up.(e)
When randomized studies are not an option, researchers typically turn to the second-best choice: observational studies. Rather than having the researchers determine what conditions and treatments each subject receives, this type of study measures the outcomes of decisions that subjects make themselves. For example, the researchers might ask participants to report how much yogurt they eat each week, measure key indicators of their health over time, and then use statistical analysis to determine how yogurt eating affects health outcomes.
They may be easier, but how reliable are observational studies? Research is beginning to give us a picture of how they stack up against randomized trials, and the results are not encouraging.6 The largest empirical comparison of observational studies and randomized controlled trials analyzed over 400 studies on 45 health issues. For 16% of the health topics reviewed, there were significant differences between the findings of the observational and randomized studies.7
When it comes to nutritional studies, the discordance is greater. In an empirical analysis of studies on 34 distinct nutritional issues, for nearly a third of the nutritional questions reviewed, the randomized trials and observational studies found effects in opposite directions.8 For instance, observational studies found that beta-carotene reduced the incidence of lung cancer, while randomized trials found no effect, or even a slight increase in cancer rates.
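One way such disagreements can arise is sketched in the toy simulation below. It is an illustration under invented assumptions, not a model of any real study. It assumes a hidden trait (being "health-conscious") that makes people both more likely to eat yogurt and healthier, while yogurt itself has no effect at all. All names and numbers are hypothetical.

```python
import random
from statistics import mean

def simulate(n=20000, seed=0):
    """Compare an observational and a randomized estimate of a null effect."""
    rng = random.Random(seed)
    observational = {True: [], False: []}   # grouped by what people chose
    randomized = {True: [], False: []}      # grouped by coin-flip assignment
    for _ in range(n):
        health_conscious = rng.random() < 0.5
        # The health score depends only on the hidden trait, never on yogurt.
        outcome = (2.0 if health_conscious else 0.0) + rng.gauss(0, 1)
        # Observational world: health-conscious people choose yogurt more often.
        chooses_yogurt = rng.random() < (0.8 if health_conscious else 0.2)
        observational[chooses_yogurt].append(outcome)
        # Randomized world: a coin flip decides, regardless of the trait.
        assigned_yogurt = rng.random() < 0.5
        randomized[assigned_yogurt].append(outcome)
    obs_diff = mean(observational[True]) - mean(observational[False])
    rct_diff = mean(randomized[True]) - mean(randomized[False])
    return obs_diff, rct_diff

obs_diff, rct_diff = simulate()
print(f"observational difference: {obs_diff:.2f}")
print(f"randomized difference:    {rct_diff:.2f}")
```

Under these assumptions, the observational comparison shows yogurt eaters scoring well above non-eaters even though yogurt does nothing, while the randomized comparison stays close to zero. Real observational studies use statistical adjustments to reduce this problem, but those adjustments can only account for the traits that researchers thought to measure.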
Just how often observational studies lead to erroneous conclusions remains uncertain, but the fact that some do (and there is no way to know which when looking at a single study) should give us pause. Being wrong a third or even a sixth of the time is not an acceptable standard for making health recommendations. Despite these concerns, research my colleagues and I conducted shows that the authors of observational studies have no problem drawing conclusions and making specific recommendations. In an examination of nearly 300 observational studies in the top medical journals, over half (56%) made specific recommendations even though the research designs were ill suited to rigorously support those conclusions.9
Do no harm
The medical principle of “First, do no harm” suggests that scientists and physicians have a responsibility to ensure a practice has a net benefit before recommending it, particularly as a preventative measure to people in good health.(f) Healthy people already feel good, so any practice advised to keep them well and prevent future illness must meet a high standard, with its benefits clearly outweighing any negative side effects.10
Such a standard can be met when we look at outcomes and methods rigorously and make recommendations only when strong evidence exists to support a conclusion. While surrogate endpoints and observational studies may be easier research approaches, they cannot be our only source of scientific information.
With this in mind, I will be writing a series of articles for Footnote exploring the evidence for several common medical practices recommended for healthy people, including screenings for cancer and other illnesses and the use of vitamin supplements. My goal is to help readers wade through the sea of medical advice by evaluating the most reliable and rigorous medical evidence.
- Staffan Svensson, David B. Menkes, and Joel Lexchin (2013) “Surrogate outcomes in clinical trials: A cautionary tale,” JAMA Internal Medicine, 173: 611-612.
- The Action to Control Cardiovascular Risk in Diabetes Study Group (2008) “Effects of Intensive Glucose Lowering in Type 2 Diabetes,” New England Journal of Medicine, 358: 2545-2559.
- B. Carlberg, O. Samuelsson, and L.H. Lindholm (2004) “Atenolol in hypertension: is it a wise choice?” Lancet, 364: 1684-1689. The AIM-HIGH Investigators (2011) “Niacin in Patients with Low HDL Cholesterol Levels Receiving Intensive Statin Therapy,” New England Journal of Medicine, 365: 2255-2267. John Neil Primrose, Stephen Falk, Meg Finch-Jones, Juan W. Valle, David Sherlock, Joanne Hornbuckle, James Gardner-Thorpe, David Smith, Charles Imber, Tamas Hickish, Brian Davidson, David Cunningham, Graeme John Poston, Tim Maughan, Myrrdyn Rees, Louise Stanton, Louisa Little, Megan Bowers, Wendy Wood, and John A. Bridgewater (2013) “A randomized clinical trial of chemotherapy compared to chemotherapy in combination with cetuximab in k-RAS wild-type patients with operable metastases from colorectal cancer: The new EPOC study,” proceedings of the 2013 ASCO Annual Meeting, Abstract 3504.
- The Look AHEAD Research Group (2013) “Cardiovascular Effects of Intensive Lifestyle Intervention in Type 2 Diabetes,” New England Journal of Medicine, 369: 145-154.
- John P.A. Ioannidis (2005) “Why most published research findings are false,” PLoS Medicine, 2: e124.
- S. Stanley Young and Alan Karr (2011) “Deming, data and observational studies,” Significance, 8: 116-120.
- J.P. Ioannidis, A.B. Haidich, M. Pappa, N. Pantazis, S.I. Kokori, M.G. Tektonidou, D.G. Contopoulos-Ioannidis, and J. Lau (2001) “Comparison of evidence of treatment effects in randomized and nonrandomized studies,” JAMA, 286: 821-830.
- Denish Moorthy, Mei Chung, Jounghee Lee, Winifred W. Yu, Joseph Lau, and Thomas A Trikalinos (2013) “Concordance Between the Findings of Epidemiological Studies and Randomized Trials in Nutrition: An Empirical Evaluation and Citation Analysis,” Nutritional Research Series, Vol. 6 / Technical Reviews, No. 17.6, Rockville, MD: Agency for Healthcare Research and Quality.
- Vinay Prasad, Joel Jorgenson, John P.A. Ioannidis, and Adam Cifu (2013) “Observational studies often make clinical practice recommendations: an empirical evaluation of authors’ attitudes,” Journal of Clinical Epidemiology, 66(4): 361-366.
- Daniel K. Sokol (2013) “‘First do no harm’ revisited,” BMJ, 347.
- (a) In epidemiology, mortality refers to the number of deaths from a disease while morbidity refers to the number of people suffering from poor health due to a disease.
- (b) When blood sugar levels are elevated over time, as they are in diabetics, glucose in the blood attaches to the hemoglobin molecules in red blood cells, forming HbA1c.
- (c) It is also important that the findings can be reproduced across multiple studies. This replicability is a hallmark of science; if other researchers cannot duplicate someone’s findings, these findings may be inaccurate.
- (d) The placebo effect occurs when someone’s health improves simply because they believe they are receiving treatment, not necessarily because the treatment itself is effective. If the researchers or participants in a study know who is receiving the treatment and who is receiving a placebo, this knowledge may influence patient outcomes, making it hard to accurately evaluate the treatment’s effectiveness.
- (e) In addition to being onerous for participants and researchers, randomized controlled trials can raise ethical concerns. For example, how should one ethically test a potentially lifesaving treatment when randomized trials require denying the treatment to some of the study participants (i.e. those in the control group)?
- (f) “First, do no harm” is a widely recognized principle of medical ethics that advises doctors to avoid unnecessary or overly risky treatment. Though commonly believed to be part of the Hippocratic Oath sworn by doctors since ancient times, the exact phrase is not found in the Oath.