One of the biggest stories in academia recently was the retraction of more than 120 papers by the well-known publishers Springer and the Institute of Electrical and Electronics Engineers (IEEE). The retraction followed the discovery by Dr. Cyril Labbé of Joseph Fourier University1 that all the papers in question had been generated by SCIgen, a computer program that automatically produces random, nonsensical computer science papers.(a)
It is still unclear who submitted the papers or why – some of the individuals listed as authors were unaware of the papers’ existence. While it might seem surprising that such a thing could happen, these kinds of embarrassments are not uncommon and, in fact, are not even the most worrisome problem in academic publishing. The system that allowed these articles to slip through the review process is one that rewards a high volume of superficially impressive papers at the expense of high-quality work. By rewarding articles that offer the appearance of legitimacy but do not necessarily contain significant discoveries, academic journals perpetuate a “publish-or-perish” culture that incentivizes researchers to compromise academic rigor and churn out articles at an unrealistic pace.(b) And by measuring academic success through publication in these very journals, universities ultimately foster this same culture on their campuses.
A History of Hoaxes
Fake papers have regularly appeared in the scholarly record, often in order to demonstrate problems with the peer review process. For instance, in the mid-1990s, Alan Sokal famously submitted a paper to Social Text in which he boldly fused string theory, Lacan, and Derrida, and argued that quantum gravity had profound political implications.2 When the article was published, he revealed the hoax in a simultaneous publication, where he explained his rationale as follows:
For some years I’ve been troubled by an apparent decline in the standards of intellectual rigor in certain precincts of the American academic humanities…
Would a leading North American journal of cultural studies… publish an article liberally salted with nonsense if (a) it sounded good and (b) it flattered the editors’ ideological preconceptions?3
Since then, there have been reports of numerous hoax papers aimed at raising awareness of pseudo-academic practices such as “spamferences” and predatory publishers.(c) Most recently, Science reported on a massive ‘sting’ operation that used computer-generated variants of a fake paper to expose ‘predatory’ publishers.4 John Bohannon, the scientist behind the sting, created a “credible but mundane scientific paper” filled with such “grave errors that a competent peer reviewer should easily identify it as flawed and unpublishable.” He then submitted 255 versions of the paper to various journals, and no fewer than 157 of them accepted it.
Returning to the latest case, in a statement issued immediately after the story went public, Springer expressed confidence in its standards of peer review while also pledging to strengthen the process. In their words, “Unfortunately, scientific publishing is not immune to fraud and mistakes… The peer review system is the best system we have so far and this incident will lead to additional measures… to strengthen it.”
Springer’s response may be shaped in part by the fact that all the identified papers seem to have been published in conference proceedings, which do not always adhere to the same standards of peer review that apply to research articles.(d) It is possible for publishers like Springer to feel confident about the quality of their peer-reviewed journal articles, while at the same time acknowledging the need for more stringent processes in other areas.
Deeper Problems at the Heart of Science
Increased diligence by publishers may not be enough to ensure the quality of academic articles. The SCIgen incident, the Science ‘sting,’ and previous hoaxes are just symptoms of a more serious crisis in scholarly communication. In an article in the Guardian, linguist Curt Rice describes three aspects of this crisis that I believe are important to consider: increases in the retraction of papers, problems reproducing results, and inadequate measures of research quality.5
1. Retractions are on the rise.
The number of articles retracted from the Thomson Reuters Web of Science research database has increased tenfold over the past decade, despite a mere 44% increase in the total number of articles published.6 At least 1,333 papers have been withdrawn from the National Institutes of Health’s PubMed database since 2002.7 Retractions appear in even the most prestigious journals, including Science, Nature, and Cell.7 The website Retraction Watch regularly reports on papers that are retracted for plagiarism, falsification of data, failure to reproduce results in subsequent experiments, or other reasons.
It is unclear whether more retractions are a sign of increased malpractice on the part of researchers or simply closer scrutiny, but either way, part of the academic record is tainted, and peer reviewers, editors, and publishers all share the blame. There is increasing pressure for scholars to face consequences in cases of serious fraud and misconduct, particularly when research was funded with taxpayer dollars. There have also been proposals that some level of accountability should apply to everyone involved in the publishing process. For instance, the names of the referees who review an article could be made available (as is done by the Journal of Bone and Joint Surgery) so that they feel some degree of accountability if illegitimate research slips through. Some have even suggested holding publishers liable by requiring them to refund readers who purchase publications that fail to meet rigorous academic standards.
2. Reproducibility is a bigger problem than retractions.
Although retractions are on the rise, they are still relatively rare: retraction is the fate of only 0.01% of scientific papers.8 A much bigger problem is reproducibility. The ability of other researchers to recreate a study’s findings is a key test of their accuracy. Unfortunately, two recent reviews of a total of 120 biomedical research studies found that only 10% to 25% of their findings could be reproduced.9 Results that are not reproducible often make it into prestigious academic journals, which are understandably keen to publish studies that report novel, high-impact findings. However, sometimes these findings are just statistical flukes that do not hold up under repeated testing.(e)
A number of factors make it difficult to distinguish reproducible from non-reproducible results. Some studies are hard to replicate because of the inherent complexity of the phenomena they examine. Replications of previous studies are also notoriously difficult to fund and to publish. Few people get excited to read, let alone bankroll, a study that produces no new findings but simply confirms or refutes earlier results. Although a few journals, like the Journal of Negative Results in Biomedicine, are dedicated to publishing such studies, the majority of published science goes unchecked. These problems suggest a need to rethink how research is published and to create incentives that establish reproducibility as a fundamental requirement of good science.(f)
3. Current measures of the quality of research are inadequate.
The final problem Rice identifies is the prominence attached to a journal’s impact factor (IF), a measure intended to indicate the journal’s prestige and quality.5 A journal’s IF for a given year is based on how often the articles it published over the previous two years were cited during that year.(g) There are many ways to artificially inflate a journal’s IF, and papers published in high-IF journals tend to be retracted more often than those in less prestigious publications. As the string of hoaxes demonstrates, it is embarrassingly easy for low-quality papers to get published in journals with high IFs, and it is even more common for high-quality papers to be overlooked in journals with less-than-stellar IFs.10
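Roughly speaking (the exact rules for what counts as a “citable item” are set by the database provider and are not spelled out here), the two-year impact factor for a year Y can be sketched as:

```latex
% Illustrative sketch of the two-year impact factor for year Y.
% C_Y(y): citations received in year Y by items the journal published in year y
% N_y:    number of citable items the journal published in year y
\mathrm{IF}_Y = \frac{C_Y(Y-1) + C_Y(Y-2)}{N_{Y-1} + N_{Y-2}}
```

For example, a journal that published 200 citable items across two years and drew 500 citations to them in the following year would have an IF of 2.5, a single average that says nothing about how often any individual paper in that journal was cited.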
Another problem with the impact factor is that it is often used as a metric for indirectly assessing the quality of the individual papers or researchers published in a journal. This shortcut enables administrators and politicians to make judgments about research without actually attempting to understand it. A much better way to evaluate the quality of individual papers would be to count the number of times they specifically have been cited by other researchers. While shifting emphasis from the journal to the paper seems intuitive, it would mean withholding judgment on papers for several years, until their influence on the field shows up in subsequent publications.
Difficult Questions, Few Answers
The common underlying cause of all three problems discussed above is a “publish-or-perish” culture in which academics are under intense pressure to demonstrate that they are engaged in useful work. Unfortunately, such pressure is not always compatible with the careful, reflective process that quality research requires. In the words of Belgian academic Jozef Colpaert:
How many points would Louis Pasteur, Henri Poincaré, Claude Shannon, Tim Berners-Lee and others nowadays earn within the new academic evaluation system?
Our ultimate goal should be not only to avoid publication hoaxes and other egregious violations of academic norms, but also to improve research across the board. Understanding why and how pressures to maximize productivity are actually harming the quality of research is the first step to promoting better science.
An earlier version of this article was published on the author’s personal blog.
Endnotes
- Cyril Labbé and Dominique Labbé (2013) “Duplicate and fake publications in the scientific literature: how many SCIgen papers in computer science?” Scientometrics, 94(1): 379-396.
- Alan D. Sokal (1996) “Transgressing the Boundaries: Towards a Transformative Hermeneutics of Quantum Gravity,” Social Text, 46/47: 217-252.
- Alan D. Sokal (1996) “A Physicist Experiments With Cultural Studies,” Lingua Franca, May/June.
- John Bohannon (2013) “Who’s Afraid of Peer Review?” Science, 342(6154): 60-65.
- Curt Rice (2013) “Science Research: three problems that point to a communication crisis,” The Guardian, February 11.
- Richard Van Noorden (2011) “Science publishing: The trouble with retractions,” Nature, 478: 26-28.
- R. Grant Steen, Arturo Casadevall, and Ferric C. Fang (2013) “Why Has the Number of Scientific Retractions Increased?” PLOS ONE, 8(7).
- Ferric C. Fang, R. Grant Steen, and Arturo Casadevall (2012) “Misconduct accounts for the majority of retracted scientific publications,” Proceedings of the National Academy of Sciences, 109(42): 17028-17033.
- C. Glenn Begley and Lee M. Ellis (2012) “Drug development: Raise standards for preclinical cancer research,” Nature, 483: 531-533; Florian Prinz, Thomas Schlange, and Khusru Asadullah (2011) “Believe it or not: how much can we rely on published data on potential drug targets?” Nature Reviews Drug Discovery, 10(9): 712.
- Vincent Lariviere, George A. Lozano, and Yves Gingras (2013) “Are elite journals declining?” Cornell University Library.
Sidenotes
- (a) Computer scientists at MIT created SCIgen, which is free for anyone to download, to demonstrate the low standards for getting papers accepted at conferences. The program generates jargon-heavy phrases that often border on the absurd. For example, a prestigious computer science conference accepted a SCIgen-generated paper about “the famous ubiquitous algorithm for the exploration of robots.”
- (b) “Publish or perish” describes the idea that academics must regularly publish new research in peer-reviewed academic journals in order to retain their positions. Nobel laureate Peter Higgs, who has a subatomic particle named after him, recently claimed that he would most likely perish in academia if he were starting out today because he would not be considered productive enough.
- (c) Spamferences, or junk conferences, generally are not academically accredited but are established to earn money for the organizers. They advertise with junk mail and have very loose standards for acceptance. Similarly, predatory publishers have few to no standards for accepting academic papers, which they publish open-access for a fee.
- (d) Conference contributions are often judged on the merit of a short abstract, so that scholarly output can be rapidly disseminated. This allows academics to benefit from feedback from other conference participants and develop ideas that are still rough around the edges into a ‘proper’ academic article.
- (e) In the social sciences, for example, the usual threshold for statistical significance is a p-value below 0.05. Even when there is no real effect, roughly one test in every 20 will clear that threshold purely by chance, so some published “findings” are expected to be statistical flukes rather than reflections of some underlying truth (see the illustrative calculation after these notes).
- (f) Various initiatives have cropped up to facilitate attempts to replicate scientific results. The journal PLOS ONE is requiring that contributors make all of their methods and data available online so that others have the information necessary to replicate their work. Nature requires descriptions of replications and statistics to accompany article submissions. The online marketplace Science Exchange now offers a service for testing researchers’ results independently and recently received a grant to test the 50 most impactful recent cancer studies.
- (g) The impact factor was originally designed to help acquisitions librarians manage subscriptions: journals with high IFs were assumed to contain more useful research and were therefore given higher priority in purchasing decisions.
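As a back-of-the-envelope illustration of the point in sidenote (e), assume 20 independent tests of effects that do not actually exist, each judged at the 0.05 threshold; the chance of obtaining at least one spurious “significant” result is then:

```latex
% Probability of at least one false positive in 20 independent tests at the 0.05 level
P(\text{at least one false positive}) = 1 - (1 - 0.05)^{20} \approx 0.64
```

In other words, a research program that runs enough analyses will quite likely turn up something that crosses the significance threshold by chance alone, which is one reason non-reproducible results can reach publication so easily.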