A Problem With Research?


Part 1 of the essay series Research and Knowledge Accumulation.


Mountains of research

Our society is spending an enormous amount of money and effort on research. Here is a graph of PhDs granted, money spent, and papers published over the last hundred years:

The sheer number of papers is staggering. According to one estimate, 2.4 million articles were published in scholarly peer-reviewed journals in 2017, in science and engineering alone. The number grows every year.

Is research equally productive?

To some people, this indicates a problem. If science is a machine, we’re increasing the inputs. Are we getting proportionally greater outputs?

In a very broad sense, it feels like the answer must be no. From 1900 to 1930, we found special relativity, general relativity, and quantum mechanics. We generally recognize these to be shocking advances that dramatically changed our understanding of both the universe and our place in it, and that unlocked the possibility of many more advances. From 1985 to 2015, we found some things, like the top quark, the Higgs boson, and gravitational waves. These are important. But it seems we ought at least to be able to say that, collectively, they are not ten times as important as relativity and quantum mechanics. We increased the inputs by more than ten times (just look at the space under the curves above), but the outputs do not seem to have kept pace.

This seems correct. But it is hard to be definitive. It is plausible that the most important inventions, breakthroughs, or discoveries aren’t the ones that receive the most fanfare. Among inventions, for instance, rockets, nuclear weapons, and birth control are obviously important. But there may be less obvious inventions, like the modern shipping container, that are nevertheless extremely impactful. Further, it is hard to compare single large advances to many small advances. What if we’re making so many small advances that they make up for there being fewer large ones?

Part of the problem is that we don’t have good enough ways to measure the importance of advances. First, there is the question of what we should measure. Do we care about how much a breakthrough or discovery changes our understanding of the world? Or our understanding of ourselves? Or is it how much an advance enables us to make further advances, or how much it contributes to economic growth? Second, there is a measurement problem. Whichever of these things we choose, we won’t be sure how to measure it.

There are a few clever ways to try to cut through the confusion and clinch the case that research progress is slowing down. One is to defer to scientists on the questions of importance by asking them directly or looking at patterns in the awarding of important prizes. Another is to try to use many different measures, such as total GDP, total factor productivity, patents, crop yields, life expectancy, and Moore’s Law, to see if many different indicators all point in the same direction. These approaches are insightful and help to show us the limits of our current ability to assess progress. If we rely on our own most highly awarded experts, or on our most commonly used metrics, maybe research progress overall is slowing down.

There is still room for doubt, however. As the story of science tells us, reigning experts and standard measures frequently fail to capture the most important realities, especially the reality of new discoveries. Do Nobel Prize-winning physicists and total GDP capture the innovation unlocked by the internet? Has enough time passed for us to assess the importance of the last ten years of developments in deep learning and CRISPR?

Trial, error, and non-replication

One of the main sources of evidence that there is a problem with research is the fact that many fields are undergoing replication crises. In psychology, a recent large-scale study found that only about 55% of important findings replicated. Previous studies found different numbers. There are similar problems in medicine (about 25% replicating), experimental economics (about 60%), and elsewhere.

This indicates at least that there is some problem with how we’re approaching research. It is generally believed that studies should replicate. If studies don’t replicate, we either have bad studies or wrong views about replication.

Even if we grant that studies should replicate, there is still the question of how bad it is if they don’t. On one hand, it can seem quite bad. If many of a field’s results are untrustworthy and you can’t tell which are which, perhaps the entire field becomes untrustworthy. If non-replicability is likely to be underreported, maybe other fields’ results should be called into question too.

On the other hand, we expect a lot of trial and error in science. We expect the base rate of true, interesting claims to be low. We have a very competitive system, and researchers are pushing quite hard. It’s not clear exactly how bad it is that researchers sometimes cut corners.

On the most conciliatory view, replication crises are the unfortunate but understandable byproduct of ramping up the total quantity of research. The historical gains from investment in science have been literally astronomical, and so it may be correct to throw everything we have at it. An increase in quantity is sure, one might argue, to bring about a decrease in quality in at least some areas. One might then argue that the replication crises are signs that science is self-mending, and hence, working properly.

Uncited and unread

Returning to the millions of papers published every year, there is one remaining concern to raise. There is evidence that 50% of papers are never read by anyone except their authors and journal referees, and that 90% are never cited. Doesn’t this show that there’s something wrong with research?

Again, it’s hard to say. Academics report reading an average of 250 papers per year. So we can’t say researchers are neglecting each other’s work. More deeply, advances have historically come from surprising places, like the movement of a needle or the theory of prime numbers. If researchers want to study obscure topics, should we tell them otherwise?

A missing concept

Why is it so hard to tell whether there is a problem with research? The answer, I would suggest, is that there are important gaps in our understanding of how research works. The process is considered mysterious on at least some level by almost everyone. It’s hard to judge that progress is slowing down because we don’t know how to compare research advances, we don’t know how research converts into technology, economic growth, or more research, and we don’t know how much to trust our own official measures. It’s hard to judge how serious the replication crises are because we don’t know how many corners we should expect researchers to cut or how bad it is if they cut them. And unless we know where breakthroughs will come from, it is hard to argue that researchers should focus on different topics.

More broadly, the system’s progress despite its dysfunction is part of the story itself. The progress of science is invariably described as messy, contingent, and flawed. How can you critique something that contains as a core part of its self-understanding the fact of its own imperfection?

Despite all of this, there is a way to tell that there is a problem with research. One factor crucial to research progress is being ignored by almost everyone, including researchers, funders, and people who think abstractly about the health of academia and the scientific enterprise. This factor is the logic of when knowledge accumulates and when it does not.

In this essay series, I will introduce the idea of the accumulation of knowledge, show that knowledge accumulation is not automatic, and discuss its internal logic. This will yield an explanation that confirms the feeling that something has gone wrong with research, shows what is wrong with the towering stacks of papers being produced, and explains why science is not simply a machine whose outputs can be increased by ramping up the inputs. It will also show how it is possible to land a critique despite the fact that science’s self-understanding includes its own imperfection.

Along the way, we’ll get a chance to discuss Damascus steel, Kuhnian paradigms, and what Freud’s actual problem was. (It wasn’t sex.)

I’ll end with some proposals for how we could reshape research to make it more efficient and productive, and then a serious, optimistic argument on the topic of how many discoveries there are left to make.

Next essay: How to Understand Research Productivity