Reproducible science in the computer age

It has been conventional wisdom that computing is the “third leg” of the stool of modern science, complementing theory and experiment. But that metaphor is no longer accurate. Instead, computing now pervades all of science, including theory and experiment. Nowadays massive computation is required just to reduce and analyze experimental data, and simulations and computational explorations are employed in fields as diverse as climate modeling and research mathematics.

Unfortunately, the culture of scientific computing has not kept pace with its rapidly ascending pre-eminence in the broad domain of scientific research. In experimental research work, researchers are taught early the importance of keeping notebooks or computer-based logs of every detail of their work—experimental design, procedures, equipment used, raw results, processing techniques, statistical methods used to analyze the results, and other relevant details of an experiment.

In contrast, very few computational experiments are performed with such documented care. In most cases, there is no record of the workflow used or of the specific computer hardware and software configuration. Often even the source code is not retained. In addition to raising concerns about the reproducibility of results, these regrettable practices ultimately impede the researchers’ own productivity.
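As a rough illustration of the kind of record that is usually missing, the Python sketch below captures the software version, the hardware platform, a hash of the source code and the run parameters for a single computation. The function name `provenance_record` and the example parameters are hypothetical, chosen only for this sketch; a real project would likely record more (library versions, compiler flags, input data checksums).

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone


def provenance_record(source_text: str, parameters: dict) -> dict:
    """Collect basic facts needed to rerun a computation later.

    `source_text` is the program text that was run and `parameters`
    holds its inputs; both names are illustrative assumptions.
    """
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),   # OS name, release and build
        "machine": platform.machine(),     # hardware architecture
        "source_sha256": hashlib.sha256(source_text.encode("utf-8")).hexdigest(),
        "parameters": parameters,
    }


if __name__ == "__main__":
    # Hypothetical run: a one-line "program" and two made-up parameters.
    record = provenance_record("print('hello')", {"seed": 12345, "grid_points": 256})
    print(json.dumps(record, indent=2, sort_keys=True))
```

Saving such a record next to every output file is one inexpensive way to make a computational "notebook entry" comparable to what experimentalists already keep.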

Scientific fraud

A related concern is the disturbing rise in outright fraud in scientific research. Perhaps the most significant case is the scandal surrounding the work of Diederik Stapel, a social psychologist at Tilburg University in the Netherlands. A report by the university found that fraud was certain in 55 of Stapel’s publications, and that an additional 11 older publications are questionable. Although the university noted that Stapel is “fully and solely responsible” for his instances of fraud, the review committee was critical of the larger research culture: “From the bottom to the top there was a general neglect of fundamental scientific standards and methodological requirements.” In addition, panelists found countless flaws in the statistical methods that were used.

An equally upsetting case is that of the bogus pluripotent stem cell breakthrough that came and went in one week last November.

In a private letter now circulating, Nobel economist Daniel Kahneman has implored social psychologists to clean up their act to avoid a “train wreck.” In particular, he discusses the importance of replication of experiments and studies on priming effects.

Along this line, there is growing concern about bias in the medical field, when trials (often paid for by large pharmaceutical corporations) are conducted, but the results are never published, perhaps because they are not particularly favorable. Such “cherry picking” of results introduces a severe bias into the field, and constitutes a risk to each individual’s health. Many observers are now calling on the medical community to register all trials, and report all results.

An interesting commentary on this topic has just been published by Robert Trivers, author of The Folly of Fools. He mentions that in an analysis of papers published in the 50 psychological journals, authors whose results were closer to the statistical cut-off point (p=0.05) were less likely to share their raw data.

Reproducibility workshop

These and related concerns were the topic of a recent workshop on Reproducibility in Computational and Experimental Mathematics, which was held December 10-14, 2012, at the Institute for Computational and Experimental Research in Mathematics (ICERM) in Providence, Rhode Island. Meeting participants included a diverse group of computer scientists, mathematicians, computational physicists, legal scholars, journal editors and funding agency officials, representing academia, Google and all points in between.

While different types and degrees of reproducible research were discussed, many argued that the community needs to move to “open research,” which means research where widely used software tools are routinely used to (a) fully ‘audit’ the computational procedure, (b) replicate and independently reproduce the results of the research, and (c) extend the results or apply the method to new problems.

Workshop participants strongly agreed that cultural changes are required. To begin with, most researchers need to be persuaded that their efforts to ensure reproducibility will be worthwhile, in the form of increased productivity, less time wasted recovering lost data or computer code, and more reliable conversion of results from data files to published papers.

Secondly, the research system must offer institutional rewards at every level from departmental decision making to grant funding and journal publication. The current academic and industrial research system, which places primary emphasis on publications and project results and very little on reproducibility matters, effectively penalizes those who devote extra time to develop or even merely follow community-established standards.

The enormously large scale of state-of-the-art scientific computations, which in some cases harness tens or even hundreds of thousands of processor cores, presents unprecedented challenges in this arena. Numerical reproducibility is a major issue at this scale, as is hardware reliability. Even rare interactions of computer circuitry with stray subatomic particles must be taken into account.
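One source of the numerical reproducibility problem is that floating-point addition is not associative, so a sum split across a different number of processors can round differently. The Python sketch below imitates this on a single machine by summing the same data in different numbers of chunks and comparing each result with a correctly rounded reference from `math.fsum`. The function `chunked_sum` and the test data are assumptions made for illustration, not a model of any particular parallel code.

```python
import math
import random


def chunked_sum(values, n_chunks):
    """Sum `values` as n_chunks partial sums, then add the partial sums.

    This mimics a parallel reduction in which each worker adds up its
    own slice; the grouping of additions depends on n_chunks.
    """
    size = math.ceil(len(values) / n_chunks)
    partials = [sum(values[i:i + size]) for i in range(0, len(values), size)]
    return sum(partials)


if __name__ == "__main__":
    # Made-up data spanning many orders of magnitude, with a fixed seed
    # so that the demonstration itself can be repeated.
    rng = random.Random(2012)
    values = [rng.uniform(-1.0, 1.0) * 10.0 ** rng.randint(-8, 8)
              for _ in range(100_000)]

    reference = math.fsum(values)  # correctly rounded sum
    for n_chunks in (1, 4, 64, 1024):
        difference = chunked_sum(values, n_chunks) - reference
        print(f"{n_chunks:5d} chunks: difference from reference = {difference!r}")
```

The differences are typically tiny, but they need not be identical from one chunk count to another, which is why bit-for-bit agreement between runs on different processor counts cannot be taken for granted.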

Along this line, it is a regrettable fact that software development is often discounted in the scientific community. It is typically compared unfavorably, say, to constructing a telescope, as opposed to doing real science with the telescope. Thus, serious scientists are discouraged from spending much time writing code or testing it, and they are certainly not rewarded for the hard work of properly documenting their projects.

For example, it is a sad fact that NSF-funded web-projects remain accessible, on average, for less than one year after the funding stops. Typically these researchers are busy running a new project and have no free time or money to look after the old. Given the substantial and ever-increasing importance of computation and software, such attitudes and practices must change.

Finally, standards for peer review must be strengthened, perhaps emulating requirements already routine in the computational biology and genomics community. Journal editors and reviewers need to insist on rigorous verification and validity testing, along with a full disclosure of computational details. Some of this material might be relegated to a website, rather than appearing in the paper itself, but in that case there need to be assurances that this information will persist and remain accessible.

Of course, the level of such validation should be proportionate to the importance of the research and the strength of the claims being made.  No one benefits if relatively banal work is buried in a sea of validatory documentation. Likewise, experimental mathematics results and global warming models cannot be covered by exactly the same requirements.

Exceptions will need to be granted in some cases, such as where proprietary, medical or other types of confidentiality must be preserved, but if so authors need to carefully state such exceptions upon submission, and reviewers and editors need to agree that such exceptions are reasonable.

Additional details are available in the report produced by the workshop: PDF and Wiki.

[This article is reprinted from the Math Drudge blog. A slightly edited version appeared in the Huffington Post.]

[Added 25 Feb 2013:] A nice news article discussing several of these themes, in particular reliability and reproducibility in mathematical research, has been posted on the Simons Foundation website.
