Sunday, March 17, 2013

Reporting Artifact

Every so often, I use what I consider a basic term in conversation or online, only to discover that my audience has never heard of it. I will here attempt to define ‘reporting artifact’ for future reference.

One often sees in the news a report that ‘twice as many cases of cancer have been diagnosed as last year’. The natural conclusion is that twice as many people got cancer. This is, in fact, a conclusion COMPLETELY unsupported by the data presented. One way it may be a false conclusion (there are several) is the possibility of a reporting artifact. In this specific case, suppose ten times as many people were screened for cancer as last year. Would the doubling of diagnoses in such a case indicate an increase in cancer? Quite the opposite – it would indicate that the cancer rate is one fifth the prior year’s level. That is a reporting artifact. Broadly stated, if you find something more only because you LOOKED for it a LOT more, it is a reporting artifact. The flip side holds as well: if you stop looking for something and therefore stop finding it, that, again, is a reporting artifact.
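To make the arithmetic concrete, here is a minimal sketch of the screening example; the specific numbers (1,000 screenings last year, 10,000 this year, and the diagnosis counts) are invented for illustration:

```python
# Invented numbers: screenings went up tenfold, diagnoses merely doubled.
last_year = {"screened": 1_000, "diagnosed": 50}
this_year = {"screened": 10_000, "diagnosed": 100}

# Raw diagnosis counts suggest cancer doubled...
raw_ratio = this_year["diagnosed"] / last_year["diagnosed"]

# ...but per screening, the diagnosis rate actually fell to one fifth.
rate_last = last_year["diagnosed"] / last_year["screened"]
rate_this = this_year["diagnosed"] / this_year["screened"]

print(f"diagnoses: {raw_ratio:.1f}x")                       # 2.0x
print(f"rate per screening: {rate_this / rate_last:.2f}x")  # 0.20x
```

The headline number ("diagnoses doubled") and the rate per screening move in opposite directions; only the latter says anything about how common the disease actually is.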

Criminal statistics are particularly subject to reporting artifacts. The reason is quite simple: people do not always report crimes. Certain crimes are more likely to be reported than others; homicide, for example, produces a body which will usually be found at some point. Robbery produces no such gross physical evidence. You cannot PROVE that a homicide has NOT occurred (unless you can produce everyone who might have been murdered), but because bodies tend to turn up eventually and people wonder what happened to the victim, we can reasonably conclude that most murders eventually become part of the statistical record. Not so with robbery – even if you could demonstrate that everything is where it should be, that doesn’t prove there weren’t TWO robberies, the second reversing the perpetrator and victim of the first.

The situation is aggravated by the fact that most crimes go unsolved (especially minor ones) and that reporting a crime has negative consequences even for the victim – if nothing else, paperwork and lost time. This produces a disincentive to report a minor crime; if someone steals a small amount of your money, you might be better off working to earn more money than spending the same amount of time working with the police to solve the crime. The crime might never be solved, and even if it is, you might not get your money back.

Rape is widely recognized as an under-reported crime despite being much more serious. Some of the same factors are in play: what has been lost cannot be recovered; even if the assailant is found, the “I say/you say” dynamic frequently prevents a conviction; and even a conviction provides no guarantee that the assailant will not rape someone else in the future. Toss in various cultural stigmas attached to the victim, and the fact that reporting the event means discussing taboo issues with strangers. Yes, modern US culture has taboos.

Attempts to compare rape statistics are thus subject to a very high likelihood of reporting artifacts.

As noted above, disease and injury are another common area where reporting artifacts occur. Just about everyone who has a heart attack shows up in statistics somewhere (or so we can reasonably assume; again, we can’t prove it all that well). Just about everyone, however, knows someone who got a papercut, and most papercuts do not show up in statistics (ER visits, EMS calls, etc. – though some people do in fact seek advanced medical treatment for papercuts; boy, do I wish I were making that up). So if a statistic shows more reported heart attacks than papercuts, does that mean heart attacks are more likely? Probably not.

If we posted a reward for reporting papercuts, we would introduce yet another reporting artifact; people being people, someone will deliberately give themselves a papercut in order to collect the reward. If we are trying to collect data on accidental papercuts, we have just distorted our data set.

Even polling can’t eliminate reporting artifacts; people will lie, or forget, or misunderstand what is being asked. So if we poll people about their papercuts, some will forget how many they’ve gotten, some will deny they went to the hospital, and some will assume we were asking about all accidental cuts, not just ones from paper. Note that this is for something with minimal emotional impact for most people. An issue with political or emotional overtones (gunshots, rape, etc.) will usually be worse.

The root of a reporting artifact is that what happens is different from what is observed, and what is observed is different from what is recorded.

Even when reporting artifacts can be identified, that alone may not allow meaningful analysis, especially if multiple artifacts may be at work. Cracks in airplane wings happen. People often, but not always, find them. When they find them, they usually, but not always, report them. If a regulatory agency tells operators to look for specific cracks in specific places, they are more likely to find those cracks. Big cracks are more likely to be reported than small cracks. 1st world airlines are more likely to find cracks than 3rd world airlines. 3rd world airlines are more likely to have cracks (older airplanes) than 1st world airlines. If a database lists ten times as many cracks in 1st world airplanes, is that because 3rd world airlines aren’t finding them, or are finding them but not reporting them? Can we know the actual ratio of cracks? Can we prove that the 3rd world airlines have any significant number of unreported cracks? The last is possible, though only with extreme care could such information be used to determine a ‘real’ ratio.

That is the real problem of reporting artifacts: even if you can prove they exist, it is far harder to adjust for them.

Let all those who compare statistics beware.
