Why does most research tend to be wrong?

In the paper, Ioannidis laid out a detailed mathematical proof that, assuming modest levels of researcher bias, typically imperfect research techniques, and the well-known tendency to focus on exciting rather than highly plausible theories, researchers will come up with wrong findings most of the time.

Simply put, if you’re attracted to ideas that have a good chance of being wrong, and if you’re motivated to prove them right, and if you have a little wiggle room in how you assemble the evidence, you’ll probably succeed in proving wrong theories right.

His model predicted, in different fields of medical research, rates of wrongness roughly corresponding to the observed rates at which findings were later convincingly refuted: 80 percent of non-randomized studies (by far the most common type) turn out to be wrong, as do 25 percent of supposedly gold-standard randomized trials, and as much as 10 percent of the platinum-standard large randomized trials.

That is an excerpt from David Freedman’s essay on John Ioannidis in The Atlantic. Worth reading in full.

The emphasis is mine. These refutation rates do not surprise me in the least, and I would be even less surprised if they are higher in the social sciences (even in RCTs).
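For readers who want to see the arithmetic behind that claim, here is a rough sketch of the positive-predictive-value calculation at the heart of Ioannidis's model. The parameter values below are illustrative assumptions of mine, not numbers taken from the paper:

```python
# Rough sketch of the positive-predictive-value (PPV) calculation in
# Ioannidis (2005, PLoS Medicine). R is the assumed ratio of true to false
# hypotheses being tested in a field, (1 - beta) is statistical power,
# alpha is the significance threshold, and u is a bias term: the share of
# analyses that would not have been "findings" but get reported as such anyway.

def ppv(R, alpha=0.05, beta=0.20, u=0.0):
    """Probability that a claimed (statistically significant) finding is true."""
    true_positives = R * ((1 - beta) + u * beta)   # true relationships declared significant
    false_positives = alpha + u * (1 - alpha)      # null relationships declared significant
    return true_positives / (true_positives + false_positives)

# Illustrative scenarios (the parameter values are my assumptions, not the paper's):
print(ppv(R=1.0, beta=0.20, u=0.10))  # plausible hypothesis, well-powered study: ~0.85
print(ppv(R=0.1, beta=0.40, u=0.30))  # long-shot hypothesis, underpowered, some bias: ~0.18
```

The point of the exercise is just what the excerpt says in words: once the prior odds are long and there is a little bias and wiggle room, the arithmetic turns most "significant" findings into false ones.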

I do not know Ioannidis’ work, but plan to look closer. Reader opinions?

The research has not made him unpopular, it seems:

To say that Ioannidis’s work has been embraced would be an understatement. His PLoS Medicine paper is the most downloaded in the journal’s history, and it’s not even Ioannidis’s most-cited work—that would be a paper he published in Nature Genetics on the problems with gene-link studies.

…Other researchers are eager to work with him: he has published papers with 1,328 different co-authors at 538 institutions in 43 countries, he says. Last year he received, by his estimate, invitations to speak at 1,000 conferences and institutions around the world, and he was accepting an average of about five invitations a month until a case last year of excessive-travel-induced vertigo led him to cut back.

Hat tip to .Plan.

9 Responses

  1. Snappy idea, but kind of falls prey to its own criticism. What exactly does it mean for a study to be ‘wrong’? At least in the social sciences, a lot of research is less important because of the answer it gives to a specific question than because of a new angle or depth of analysis, a new understanding of concepts or definitions…essentially, an intellectual curiosity that makes the asking of a good question as useful for understanding something as the answer to it. Perhaps there’s more truth in it for the enthusiastically quantitative social scientists, but good analysis is usually more about understanding complexities than about having ‘rights’ and ‘wrongs’.

    1. Caitlin — work that applies the scientific method to social phenomena (social science) should theoretically be (1) internally consistent [ensured best in formal mathematical terms], and (2) empirically valid [meaning the chosen measures of the social world lead to the analytical conclusions].

      To your point on a “good question”, I think from a scientific paradigm it’d make sense that it take the form of (more or less) a sensitivity analysis: take prior analysis, rejigger definitions/measures/analytical approach, and produce different output.

      That said, work that is (a) internally contradictory, or (b) empirically invalid doesn’t seem to advance knowledge in an extensible fashion.

  2. I think the pressing issue in the social sciences is less external validity (i.e., whether results replicate in different contexts) than the apparent lack of internal validation in the peer review process (i.e., whether, given data + code, the reported results can be reproduced).

    McCullough et al tried to replicate results from the code/data archive for JMCB:
    “…McCullough, McGeary and Harrison (2006; MMH) attempted to replicate every empirical article published in the JMCB since 1996. Of 186 empirical articles, only 69 had archive entries. Of these, replication could not be attempted for 7, owing to lack of software or the use of proprietary data. Of the remaining 62, the results of 14 articles could be replicated.”
    http://www.pages.drexel.edu/~bdm25/cje.pdf

    This is truly disheartening. One would think the first check in an empirical paper review would be replicating the results from the data/code combo; heck, this shouldn’t even be the difficult part of peer review if data/code/software specs are submitted (a toy version of such a check is sketched at the end of this thread).

    I fear that until prominent journals enforce data/code submission and require referees to QC the output, this problem will continue. Even the well-intentioned will not allocate their scarce research production time toward ensuring replicability.

    What’s most frustrating is that much of this could be resolved and automated using existing tools and paradigms in the programming/CS world to ensure replicability. (Example: stringent journal requirements would greatly incent the development of software that reliably compiles code into different languages [Stata/R/SAS/SciPy].)

    This is a problem I, sadly, spend far too much time thinking about. Maybe I just want to be able to know I can rely on econ output as much as on open source software…

  3. Should NGOs love groups?
    Exciting
    Likely wrong
    Lots of wiggle room to prove right
    Too expensive to do large scale trials (N=50?)
    And… as schenk notes, nobody will remember you were wrong
    I see a big future here!

  4. One of the worrying things from the article was that even WHEN some bit of research, a recommended course of action, had been shown to be simply wrong, it continued to be cited for many years as if it were true. This goes against the idea of self-correction in science.

    In respectful response to Mr. Warren, with regard to getting results from these sorts of trials, I’d worry that we don’t even know when we actually DO get results; the same people and techniques, and the same error-checking and correcting mechanisms, that make a trial seem promising are being used to ‘confirm’ that it is working.

    1. The clearest example I can think of is AIDS and HIV. Within a few decades, the disease has gone from certain death to mostly managed. The therapies in use have some clear utility.

      Now, do we have a clear understanding of what is happening – biologically – when these therapies are in use? I sure don’t know, but if all the research were bunk we wouldn’t have seen such improvement, right? And I’m sure there were – and still are – a whole lot of dud therapies, but can the placebo effect explain the improved mortality stats?

      I’m curious to know how basic scientific observations, not therapies, are affected by Ioannidis’ read. His point still stands – this isn’t some small, cordoned-off problem; it’s a very big problem.

      Since reading this post and the Atlantic article, I’ve felt out of sorts. I know science is imperfect, but even in this broken state – it still has utility. I’m a layman, so forgive the inevitable, gaping holes in my knowledge.

  5. This is really interesting to me. I may have to get one of my resident grad students to download it for me from the digital library. What is interesting, at least on the surface level, is that we still have a number of useful therapies that come out of, say, medical trials. Being somewhat-to-moderately wrong doesn’t completely stymie us. That’s interesting, too.

    Completely neato; thanks for putting this in my field of vision.
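On comment 2’s suggestion that rerunning the submitted data and code should be the first, largely mechanical step of review, here is a toy sketch of what such a check could look like. Everything in it is hypothetical: the file names, the flat CSV layout of reported estimates, and the tolerance are my assumptions, and no journal runs this particular script.

```python
# Toy sketch of an automated replication check: rerun the authors' submitted
# analysis script against their submitted data, then compare the regenerated
# estimates to the estimates reported in the paper. All file names and the
# CSV layout are hypothetical assumptions for illustration.

import csv
import subprocess
import sys

TOLERANCE = 1e-6  # assumed rounding tolerance for reported estimates

def load_estimates(path):
    """Read a flat CSV of (table, parameter, estimate) rows into a dict."""
    with open(path, newline="") as f:
        return {(r["table"], r["parameter"]): float(r["estimate"])
                for r in csv.DictReader(f)}

def replicate(submitted_script, reported_csv, regenerated_csv):
    # Rerun the authors' analysis; it is assumed to write regenerated_csv.
    subprocess.run([sys.executable, submitted_script], check=True)

    reported = load_estimates(reported_csv)
    regenerated = load_estimates(regenerated_csv)

    # Flag any reported estimate that is missing or does not match.
    return [key for key, value in reported.items()
            if key not in regenerated
            or abs(regenerated[key] - value) > TOLERANCE]

if __name__ == "__main__":
    bad = replicate("analysis.py", "reported_estimates.csv", "regenerated_estimates.csv")
    print("Replication OK" if not bad else f"Could not reproduce: {bad}")
```

The point is less the specific tooling than that the check is mechanical: if the submitted code cannot regenerate the submitted tables, a referee finds out in minutes rather than years.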