How do we know if a scientific discovery is valid? How do we differentiate between research findings that point to some broader truth and those that are just a fluke?
An essential element of the scientific process is reproducibility – the idea that any scientist in any lab should be able to conduct the same experiment under the same conditions and obtain the same results. Unfortunately, basic biomedical research has a reproducibility problem. Recent studies show that a large proportion (75–90%) of research findings, even those published in top academic journals, cannot be reliably replicated.(a) This systemic lack of reproducibility undermines the scientific method and the ability to differentiate anecdote from evidence-based theories and models. Specious data taint the web of scientific knowledge, costing time and money in pursuit of false leads and delaying the development of potentially life-saving therapies.
While the scientific process is ultimately self-correcting, so that with enough testing incorrect data will eventually be discovered and disregarded, the scientific community – researchers, funding agencies, and publishers alike – has a responsibility to one another and to society to increase this process’s efficiency. In my previous article, I outlined the incentives and constraints scholars face during the research and publication processes and discussed how these contribute to the reproducibility problem. Understanding these challenges offers insight into how we can create a scientific culture that supports and rewards reproducibility. Progress in this direction can be facilitated by changing academic publishing policies, establishing projects directed at measuring reproducibility, and improving training for graduate students.
Academic journals have a significant role to play in encouraging reproducibility. They can require more descriptive materials and methods sections and provide unlimited space for them, so that other scientists know exactly how an experiment was conducted and how to replicate it. The Journal of Visualized Experiments, which publishes research in video format in order to make methods more accessible, is already addressing this issue. The prominent journal Nature recently instituted policies requiring that, during the review process, authors provide complete descriptions of the number of independent replications for each experiment and the statistics used for analysis.1 Similarly, the open-access journal PLOS ONE announced a policy requiring its authors to submit relevant data during the review process and recommending that they do so by posting their datasets to online repositories like Dryad.
Journals should also publish more negative results – those in which an experiment had no effect or clear outcome – because the lack of a finding can sometimes be as important as a finding itself. For example, suppose a researcher hypothesizes that protein X is involved in a cellular process, but their experiments reveal that the process occurs normally even after protein X is inhibited. Perhaps this researcher also tests proteins Y and Z but still finds no effect on the process and so drops the project altogether to study something else. Though the experiments produced negative results, these results might nonetheless be extremely useful to other scientists studying the same cellular process.
Unfortunately, although negative results often constitute a large portion of scientists’ data, journals consider them less interesting, so they often go unpublished. If journals accept more negative results, scientists can publish more of their work, which may decrease the pressure to overstate the significance of findings or to extract findings where there are none. Moreover, the availability of negative data will prevent others from spending time and funds on dead-end experiments. Several newly launched journals, such as the Journal of Negative Results in Biomedicine, the All Results Journals (with subject-specific versions for chemistry, biology, nanotechnology, and physics), and the Journal of Negative Results (specializing in ecology and evolutionary biology), seek to provide a publication platform for negative results.
Another tactic for increasing reproducibility is to create and support systems for measuring it. Some academics have suggested linking studies that support or refute each other on databases such as PubMed to create a “metric of reproducibility,”2 which funders and academic institutions could use to evaluate researchers’ reproducibility track records.(b) Organizations evaluating researchers for funding or employment should focus on the steady production of quality, reproducible work rather than simply the number of publications in highly ranked journals.
Companies providing reproducibility testing have also begun to crop up. The scientific services marketplace Science Exchange accepts publications for reproducibility testing by its network of service providers, with the option to publish the results in PLOS ONE. Researchers generally fund testing themselves, though Science Exchange received a $1.3 million grant to validate 50 high-impact cancer studies and is working to procure more funds to provide similar services at no cost to researchers.
Retesting results independently is a good way to double-check experiments that can be easily reproduced, but some of the most novel and high-impact research involves significant technical feats not commonly practiced in the average lab. For example, studies of gene expression might require measuring expression levels for tens of thousands of genes in mammalian cells grown under a variety of conditions. Such experiments are costly, and their data require sophisticated algorithms to analyze. Peer reviewers, who are busy running their own labs, may not be able to devote sufficient time to thoroughly evaluating complex experiments and analyses. Journals may need to employ specialized reviewers, such as in-house or contracted experts, to ensure that manuscripts describing technically or statistically advanced experiments are vetted thoroughly prior to publication.
In addition to changes in the academic publishing process, academic institutions and faculty need to improve the training researchers receive. Half to three-fourths of researchers in university labs are graduate students and, in my experience, many need better formal training in good scientific practice.3 Graduate programs often assume that students will receive this training on the job and neglect to provide it as part of the graduate curriculum. Such training might include increased oversight of student work, through a required number of one-on-one reviews with more advanced researchers, or the implementation of checklists covering experimental reagents, study design, validation, and statistical analysis. Busy lab heads may need to create a management and training hierarchy involving postdoctoral researchers and longer-term lab associates.(c)
Lab and department leaders should also work to cultivate an environment conducive to discussions about research methodology. Events involving academic peers, such as thesis committee reviews and intradepartmental talks, could serve not only as checks on graduate student progress but also as open forums for discussing methodological challenges. In addition, more doctoral programs should incorporate formal biostatistics instruction into their curricula. Though many programs require some statistics coursework, it is often basic and not tailored to the specific needs of the field. Many of these measures could be encouraged if they were included as requirements for the training grants given by the National Institutes of Health (NIH) specifically to fund graduate student work.(d)
Though the complexity of biological systems and current research methods poses challenges to reproducibility, there are clear steps we can take to make the scientific process more transparent and more accurate. Journals should require the full disclosure of materials and methods, ask for justification of statistical analyses, and accept studies based on quality rather than novelty (thereby allowing for the publication of negative results). Grant makers and hiring bodies must stop relying on the number of publications in highly ranked journals as the sole means of evaluating a researcher’s qualifications for funding or a position. They should reward quality, reproducible work, which the scientific community should make an effort to measure. Academic institutions should also provide training and structure to enable researchers to design and report experiments as accurately as possible. While we can never completely eliminate variation across studies, promoting more solid, reproducible research is within our reach.
This article is the second in a two-part series. The first article addressed the causes of the reproducibility problem in biomedical research; this one examines potential solutions.
Endnotes
- “Announcement: Reducing our irreproducibility” (2013) Nature, 496(7446): 398.
- Jonathan F. Russell (2013) If a job is worth doing, it is worth doing twice, Nature, 496(7443): 7.
- National Institutes of Health (2012) Biomedical Research Workforce Working Group Report.
- C. Glenn Begley and Lee M. Ellis (2012) Drug development: Raise standards for preclinical cancer research, Nature, 483: 531-533.
- Florian Prinz, Thomas Schlange, and Khusru Asadullah (2011) Believe it or not: how much can we rely on published data on potential drug targets? Nature Reviews Drug Discovery, 10(9): 712.
Sidenotes
- (a) In studies conducted by two biotechnology companies, researchers were able to reproduce major findings for only 10%4 to 25%5 of papers tested. Reproducibility did not correlate with the rank or prestige of the journal in which the findings were published.
- (b) Academics are currently evaluated based on the number of peer-reviewed papers they publish, the prestige of the journals these papers are published in, and the number of other academic papers that cite their work. The reproducibility of their research is typically not evaluated.
- (c) Postdoctoral researchers are academics who have completed their Ph.D.s (typically at a different institution) but do not hold faculty positions. They comprise the majority of non-graduate-student researchers in academic labs, and typically spend three to six years in a lab before either starting their own labs (an extremely competitive venture) or leaving academia.
- (d) Last year, I represented my department during an NIH site visit to assess compliance with NIH requirements for training grants. The reviewers’ major concerns related to increasing inter- and intradepartmental collaboration and monitoring graduate student progress, but they paid little attention to the training process itself.