FYI: The moral hazard of quantitative social science: Causal identification, statistical inference, and policy - Statistical Modeling, Causal Inference, and Social Science
Published by 劉正山
Regarding what the professor talked about last Tuesday, being careful about overly assertive language in causal inference: it happens that the same kind of reflection is going on here as well. The moral hazard of quantitative social science: Causal identification, statistical inference, and policy
A couple people pointed me to this article, "The Moral Hazard of Lifesaving Innovations: Naloxone Access, Opioid Abuse, and Crime," by Jennifer Doleac and Anita Mukherjee, which begins:
The United States is experiencing an epidemic of opioid abuse. In response, many states have increased access to Naloxone, a drug that can save lives when administered during an overdose. However, Naloxone access may unintentionally increase opioid abuse through two channels: (1) saving the lives of active drug users, who survive to continue abusing opioids, and (2) reducing the risk of death per use, thereby making riskier opioid use more appealing. . . . We exploit the staggered timing of Naloxone access laws to estimate the total effects of these laws. We find that broadening Naloxone access led to more opioid-related emergency room visits and more opioid-related theft, with no reduction in opioid-related mortality. . . . We also find suggestive evidence that broadening Naloxone access increased the use of fentanyl, a particularly potent opioid. . . .
I see three warning signs in the above abstract:
1. The bank-shot reasoning by which it’s argued that a lifesaving drug can actually make things worse. It could be, but I’m generally suspicious of arguments in which the second-order effect is more important than the first-order effect. This general issue has come up before.
2. The unintended-consequences thing, which often raises my hackles. In this case, "saving the lives of active drug users" is a plus, not a minus, right? And I assume it’s an anticipated and desired effect of the law. So it just seems wrong to call this "unintentional."
3. Picking and choosing of results. For example, "more opioid-related emergency room visits and more opioid-related theft, with no reduction in opioid-related mortality," but then, "We find the most detrimental effects in the Midwest, including a 14% increase in opioid-related mortality in that region." If there’s no reduction in opioid-related mortality nationwide, but an increase in the Midwest, then there should be a decrease somewhere else, no?
I find it helpful when evaluating this sort of research to go back to the data. In this case the data are at the state-year level (although some of the state-level data seem to come from cities, for reasons that I don’t fully understand). The treatment is at the state-month level, when a state implements a law that broadens Naloxone access. This appears to have happened in 39 states between 2013 and 2015, so we have N=39 cases. So I guess what I want to see, for each outcome, is a bunch of time series plots showing the data in all 50 states.
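To be concrete about what I have in mind, here is a minimal sketch using simulated data (not the paper’s data; the states, dates, and outcome are all placeholders): one small panel per state, the outcome plotted by month, with a vertical line marking the hypothetical month the access law took effect.

```python
# Small-multiples sketch: one panel per state, outcome over time,
# with the (hypothetical) law-change month marked. All values are simulated.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
states = [f"State {i+1:02d}" for i in range(12)]                # subset for illustration
dates = pd.date_range("2010-01-01", "2015-12-01", freq="MS")    # monthly

rows = []
for s in states:
    law_date = rng.choice(dates[36:])            # staggered adoption, 2013-2015
    level, trend = rng.normal(10, 2), rng.normal(0.03, 0.01)
    for t, d in enumerate(dates):
        rows.append({"state": s, "date": d, "law_date": law_date,
                     "y": level + trend * t + rng.normal(0, 0.5)})   # noisy upward trend
df = pd.DataFrame(rows)

fig, axes = plt.subplots(3, 4, figsize=(12, 7), sharex=True, sharey=True)
for ax, (s, g) in zip(axes.flat, df.groupby("state")):
    ax.plot(g["date"], g["y"], lw=0.8)
    ax.axvline(g["law_date"].iloc[0], color="red", ls="--", lw=0.8)
    ax.set_title(s, fontsize=8)
fig.suptitle("Outcome by state over time, law-change month marked (simulated data)")
fig.tight_layout()
plt.show()
```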
We don’t quite get that, but we do get some summaries, for example:
The weird curvy lines are clearly the result of overfitting some sort of non-regularized curves; see here for more discussion of this issue. More to the point, if you take away the lines and the gray bands, I don’t see any patterns at all! Figure 4 just looks like a general positive trend, and figure 8 doesn’t look like anything at all. The discontinuity in the midwest is the big thing—this is the 14% increase mentioned in the abstract to the paper—but, just looking at the dots, I don’t see it.
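To illustrate the overfitting point, here is a minimal sketch with simulated data (not the paper’s): fit separate unregularized high-degree polynomials on each side of a cutoff and you can get exactly these kinds of weird curvy lines, and even an apparent jump at the boundary, from data that are nothing but a smooth trend plus noise.

```python
# Why unregularized high-degree polynomial fits around a cutoff produce
# "weird curvy lines": the true process here is a plain linear trend plus
# noise, with no jump at all. Everything is simulated.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 80)                 # event time, cutoff at 0
y = 0.3 * x + rng.normal(0, 1, x.size)     # linear trend + noise; no true discontinuity

fig, ax = plt.subplots(figsize=(7, 4))
ax.scatter(x, y, s=12, color="gray")
for side in (x < 0, x >= 0):
    coefs = np.polyfit(x[side], y[side], deg=5)        # separate 5th-degree fits
    xs = np.linspace(x[side].min(), x[side].max(), 200)
    ax.plot(xs, np.polyval(coefs, xs), lw=2)
ax.axvline(0, ls="--", color="black", lw=0.8)
ax.set_xlabel("time relative to cutoff")
ax.set_ylabel("outcome")
ax.set_title("5th-degree polynomials fit separately on each side (no true jump)")
plt.show()
```

Lower-order or regularized fits wiggle much less here, which is the point of the discussion linked above.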
I’m not saying the conclusions in the linked paper are wrong, but I don’t find the empirical results very compelling, especially given that they’re looking at changes over time, in a dataset where there may well be serious time trends.
On the particular issue of Naloxone, one of my correspondents passes along a reaction from an addiction specialist whose "priors are exceedingly skeptical of this finding (it implies addicts think carefully about Naloxone ‘insurance’ before overdosing, or something)." My correspondent also writes:
Another colleague, who is pre-tenure, requested that I anonymize the message below, which increases my dismay over the whole situation. Somehow both sides have distracted from the paper’s quality by shifting the discussion to the tenor of the discourse, which gives the paper’s analytics a pass.
There’s an Atlantic article on the episode.
Of course there was an overreaction by the harm reduction folks, but if you spend 5 minutes talking to non-researchers in that community, you’d realize how much they are up against and why these econ papers are so troubling.
My main problem remains that their diff-in-diff has all the hallmarks of problematic pre-trends and yet this very basic point has escaped the discussion somehow.
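To make the pre-trends concern concrete, here is a minimal sketch with simulated data (hypothetical numbers, not the paper’s): when every state shares a common upward trend, lining up adopting states on event time shows an outcome that is already rising before the law changes, so a simple before/after or diff-in-diff comparison can pick up the trend rather than any effect of the law.

```python
# Basic pre-trend check on simulated data: the outcome has a common upward
# calendar-time trend and no treatment effect, yet the event-time average
# is already sloping upward before adoption (event time 0).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
rows = []
for s in range(39):                                   # 39 adopting states
    adopt = rng.integers(36, 60)                      # staggered adoption month
    for t in range(72):
        y = 0.05 * t + rng.normal(0, 1)               # common upward trend, no effect
        rows.append({"state": s, "event_time": t - adopt, "y": y})
df = pd.DataFrame(rows)

avg = df[df["event_time"].between(-24, 12)].groupby("event_time")["y"].mean()
fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(avg.index, avg.values, marker="o", ms=3)
ax.axvline(0, ls="--", color="black", lw=0.8)
ax.set_xlabel("months relative to law change")
ax.set_ylabel("mean outcome across adopting states")
ax.set_title("Event-time average: an upward pre-trend with zero treatment effect")
plt.show()
```

With real data the check would use the actual state-month outcomes, typically in an event-study regression with state and time fixed effects rather than a raw average; the raw plot is just the quickest way to see whether anything is trending before the law changes.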
There is a problem that researchers often think that an "identification strategy" (whether it be randomization, or instrumental variables, or regression discontinuity, or difference-in-differences) gives them watertight inference. An extreme example is discussed here. An amusing example of econ-centrism comes from this quote in the Atlantic article:
"Public-health people believe things that are not randomized are correlative," says Craig Garthwaite, a health economist at Northwestern University. "But [economists] have developed tools to make causal claims from nonrandomized data."
It’s not really about economics: causal inference from observational data comes up all the time in other social sciences and also in public health research.
Olga Khazan, the author of the Atlantic article, points out that much of the discussion of the paper has occurred on twitter. I hate twitter; it’s a medium that seems so well suited for thoughtless sloganeering. From one side, you have people emptily saying, "Submit it for peer review and I’ll read what comes from it"—as if peer review is so great. On the other side, you get replies like "This paper uses causal inference, my dude"—not seeming to recognize that ultimately this is an observational analysis and the causal inference doesn’t come for free. I’m not saying blogs are perfect, and you don’t have to tell me about problems with the peer review process. But twitter can bring out the worst in people.
P.S. One more thing: I wish the data were available. It would be easy, right? Just some ASCII files with all the data, along with code for whatever models they fit and computations they performed. This comes up all the time, for almost every example we look at. It’s certainly not a problem specific to this particular paper; indeed, in my own work, too, our data are often not so easily accessible. It’s just a bad habit we all fall into, of not sharing our data. We—that is, social scientists in general, including me—should do a better job of this. If a topic is important enough that it merits media attention, if the work could perhaps affect policy, then the data should be available for all to see.
P.P.S. See also this news article by Alex Gertner that expresses skepticism regarding the above paper.
P.P.P.S. Richard Border writes:
After reading your post, I was curious how sensitive those discontinuous regression plots were, so I extracted the data to check it out. Results are here in case you or your readers are interested.