Photo by Sasun Bughdaryan on Unsplash

How not to use probabilities

The prosecutor's fallacy and vaccine data

Filippo Valle
5 min read · Dec 7, 2021


A brief crime story first

We are in a city of 4 million inhabitants. In a courtroom, in front of the jury, stands John, who is accused of murder.

The prosecutor stands before the jury and states, “We have a match with the DNA found at the crime scene,” and then continues, “The chance of finding a match by coincidence is 1 in a million, so John must be guilty.”

All of this seems reasonable and the jury is almost convinced to send John to jail, but…

The defense lawyer intervenes: “You told us that this test gives a chance match 1 time in a million, but the population of this city is around 4 million, so if we could test everyone we would expect about 4 positive matches. There are about 4 people whose DNA matches the sample from the crime scene, and any of them could be the murderer.”

With no other evidence, the probability that John is guilty is actually about 1/4, or 25%, which is not zero but much, much lower than the 99.9999% the prosecutor was arguing for.
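The lawyer's arithmetic can be sketched in a few lines of Python. This is only an illustration of the argument in the story: the function names are made up for this example, and it assumes, as the lawyer does, that every person who matches is equally likely to be the culprit and that there is no other evidence.

```python
def expected_matches(population: int, one_in: int) -> float:
    """Expected number of people whose DNA matches by chance.

    `one_in` is the "1 in N" figure quoted by the prosecutor.
    """
    return population / one_in


def probability_guilty(population: int, one_in: int) -> float:
    """Chance that one given matching person is the culprit.

    Assumption (the lawyer's): all matching people are equally likely
    to be guilty, and nothing else is known about them.
    """
    return 1 / expected_matches(population, one_in)


matches = expected_matches(4_000_000, 1_000_000)
p_john = probability_guilty(4_000_000, 1_000_000)
print(f"Expected matches in the city: {matches:.0f}")
print(f"Probability that John is guilty: {p_john:.0%}")
```

The point is that the "1 in a million" figure only becomes meaningful once it is multiplied by the size of the population that could have been tested.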

Introduction

Every time someone has a hypothesis (e.g. John is guilty, a drug works, smoking causes cancer…), they should collect evidence to decide whether the hypothesis should be accepted or rejected.

The image below shows a simple example of this fact.

Source: Wikipedia.

As the picture shows, estimating the probability of observing the evidence when the hypothesis is true, P(evidence | hypothesis) (the prosecutor's approach), is completely different from estimating the probability that the hypothesis is true once the evidence has been observed, P(hypothesis | evidence).
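For reference, the two conditional probabilities are linked by Bayes' theorem. It is a standard result rather than something taken from the picture, and it is written here only as a reminder:

```latex
P(\text{hypothesis} \mid \text{evidence})
  = \frac{P(\text{evidence} \mid \text{hypothesis}) \, P(\text{hypothesis})}
         {P(\text{evidence})}
```

Here P(hypothesis) is how likely the hypothesis was before looking at the evidence, and P(evidence) is how often the evidence shows up overall. Swapping one conditional probability for the other silently drops both of these terms.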

Example

If we have a drug that could help humanity during a pandemic, we can define our hypothesis as: the drug does not work.

The evidence could be measured, for instance, by the number of people admitted to an Intensive Care Unit.

It is crucial to report correctly which evidence was observed and when it was observed.

A case with vaccines

Let’s try to apply the same reasoning to vaccines. We are in a small town of 50 people and there is an ongoing pandemic.

A population of 50 people. Image by author.

A journalist goes to a hospital and finds that, for instance, 10 people have been admitted. Investigating further, the journalist finds that half of them are vaccinated and half are not. The first impression (that of the fallacious prosecutor) would be that the chance of dying (or being hospitalized, or ending up in an ICU) is the same for vaccinated and unvaccinated people.

A population of 50 people. 5 of them died unvaccinated, 5 of them died vaccinated. The probability of finding a vaccinated person among those who died is 50%. Image by author.

Something is still missing, though: so far nobody knows how many people are vaccinated. The information given so far is that if you walk into the hospital, you are equally likely to come across a vaccinated or an unvaccinated person, but this is not the probability of going to hospital or dying if you are (or are not) vaccinated.

In fact, in our small town 80% of people are vaccinated and 20% are not. The situation is illustrated below.

A population of 50 people. 40 of them are vaccinated, 10 are not. 5 of them died unvaccinated, 5 of them died vaccinated. 12.5% of vaccinated people died. 50% of unvaccinated people died. Image by author.

Knowing the number of vaccinated and unvaccinated people, together with the hospital’s occupancy, we can estimate what really matters: what is the probability of dying if I am vaccinated, and what is the probability of dying if I am not?

In our example, the probability of dying if vaccinated is 5 out of 40 = 12.5%, while the probability of dying if unvaccinated is 5 out of 10 = 50%: four times higher!
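The numbers of the small town can be checked with a short Python sketch. The variable names are chosen just for this illustration. It computes both the misleading quantity (the share of vaccinated people among the deaths) and the two that actually matter (the risk within each group).

```python
# Toy town from the example: 50 people, 40 vaccinated, 10 not.
vaccinated = 40
unvaccinated = 10
deaths_vaccinated = 5
deaths_unvaccinated = 5

# P(vaccinated | died): what you see when looking only at the deaths.
share_vaccinated_among_deaths = deaths_vaccinated / (
    deaths_vaccinated + deaths_unvaccinated
)

# P(died | vaccinated) and P(died | unvaccinated): the actual risks.
risk_vaccinated = deaths_vaccinated / vaccinated
risk_unvaccinated = deaths_unvaccinated / unvaccinated

# How many times riskier it is to be unvaccinated in this toy town.
relative_risk = risk_unvaccinated / risk_vaccinated

print(f"Vaccinated among the deaths: {share_vaccinated_among_deaths:.1%}")
print(f"Risk if vaccinated:          {risk_vaccinated:.1%}")
print(f"Risk if unvaccinated:        {risk_unvaccinated:.1%}")
print(f"Relative risk:               {relative_risk:.1f}x")
```

The only difference between the first number and the other two is the denominator: deaths in the first case, group size in the others.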

The real world case

Here is an example from an infographic published by an Italian regional government, which communicated that 2/3 of the people admitted to the ICU were not vaccinated.

(They also violated the proportional ink rule discussed here and here, but that is another story.)

Infographic from an Italian regional government. Source: Twitter (the previous week's is also on Twitter, and more recent ones have been tweeted as well).

In Piedmont, Italy, there are more than 4 million people. As of December 2021, more than 3.3 million were fully vaccinated. The regional government communicated that, among the 34 people admitted to the ICU, 9 were vaccinated and 25 were not.

Falling into the trap described above, the message was that about 2/3 of the people in the ICU were not vaccinated. The important fact that 75% of the whole population was fully vaccinated is completely ignored here. In fact, 2/3 is the probability of bumping into an unvaccinated person when visiting the ICU, not the probability of getting sick and needing hospital care.

In this case, 9 out of ~3.3 million vaccinated people needed an Intensive Care Unit, which is about 2.7 in a million.

On the other hand, 25 out of ~1.1 million unvaccinated people needed an Intensive Care Unit, which corresponds to about 22.7 in a million.

This means that, in this part of Italy, you are about 8 times more likely to end up in intensive care if you are not fully vaccinated. This is a completely different (and, incidentally, more encouraging and fairer) message to give people.
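The same calculation for the Piedmont figures can be sketched in Python. The group sizes are the rounded values quoted above (about 3.3 million vaccinated and about 1.1 million not), so the results are approximate, and the helper name `rate_per_million` is made up for this example.

```python
def rate_per_million(cases: int, group_size: float) -> float:
    """ICU admissions per million people in a group."""
    return cases / group_size * 1_000_000


# Rounded figures quoted in the text (December 2021).
vaccinated_population = 3_300_000
unvaccinated_population = 1_100_000
icu_vaccinated = 9
icu_unvaccinated = 25

icu_rate_vaccinated = rate_per_million(icu_vaccinated, vaccinated_population)
icu_rate_unvaccinated = rate_per_million(icu_unvaccinated, unvaccinated_population)

# How many times more likely an unvaccinated person is to be in the ICU.
risk_ratio = icu_rate_unvaccinated / icu_rate_vaccinated

print(f"Vaccinated:   {icu_rate_vaccinated:.1f} per million")
print(f"Unvaccinated: {icu_rate_unvaccinated:.1f} per million")
print(f"Ratio:        {risk_ratio:.1f}x")
```

Dividing by the size of each group, rather than by the number of ICU beds occupied, is what turns the infographic's numbers into an actual risk comparison.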

Conclusions

Please be cautious when you read things online, even when the information comes from official sources. And always try to get the full picture: this could, literally, save your life.

Sources

The crime story at the beginning of this article comes from this video

Lecture from the Calling Bullshit course by Carl T. Bergstrom and Jevin West. Source: YouTube.

Data come from this link (updated weekly)

Some discussion of the prosecutor's fallacy is on Wikipedia


Filippo Valle

Interested in physics, ML applications, community detection and coding. I have a Ph.D. in Complex Systems for Life Sciences.