CIs (Confidence Intervals) can do more harm than good

In academia, there is a certain pressure to present and publish your results with some indication of the confidence you have in them. Often you do that by reporting a Confidence Interval, that is, the range you claim the true value of what you are estimating actually falls within, with a given probability (…almost; frequentists, please forgive me).

Now, the problem is that it is really, really hard to account for _all_ the sources of variability in the process you are studying. So in the end, what you claim confidence about covers only a small subset of the variability, and by reporting a CI you are deceiving your audience into being more confident than they should actually be.

Here is an example. I have a sister who loves to consult weather sites: she can collect tomorrow's temperature forecast for a given town from several sites and then come up with an estimate and a CI. The problem is that most sites actually implement slightly different versions of the same algorithm on the same raw measurements or raw estimations, if not taking the data directly from the same source (generally, MeteoFrance here). So they all provide roughly the same prediction, and the CI you compute from them is deceiving.

Mathematically, the problem is that in computing the CI you assume that each observation is independent, when actually it is not. Correlated observations carry much less information than independent ones: a shared error does not average out, so the usual 1/√n shrinkage of the interval is an illusion and the CI comes out far too narrow. This is why it is important to check and correct for correlation before reporting the confidence you have in your estimates.
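
To make this concrete, here is a minimal Monte Carlo sketch in Python. All the numbers (true temperature, number of sites, error magnitudes) are made up for illustration: each "site" reports the same upstream forecast plus a small tweak of its own, and we check how often the naive 95% CI, computed as if the sites were independent, actually contains the true value.

<code python>
# Minimal sketch (hypothetical numbers): correlated "observations"
# make the naive CI far too narrow.
import numpy as np

rng = np.random.default_rng(42)

true_temp = 20.0    # quantity we try to estimate (deg C)
n_sites   = 8       # supposedly independent weather sites
shared_sd = 1.5     # error of the common upstream source
own_sd    = 0.3     # small site-specific tweak on top of it
n_trials  = 10_000
z         = 1.96    # 95% normal quantile

covered = 0
for _ in range(n_trials):
    shared_error = rng.normal(0.0, shared_sd)  # identical for all sites
    obs = true_temp + shared_error + rng.normal(0.0, own_sd, n_sites)
    mean = obs.mean()
    # Naive CI: treats the n_sites values as independent draws
    half_width = z * obs.std(ddof=1) / np.sqrt(n_sites)
    if abs(mean - true_temp) <= half_width:
        covered += 1

print(f"Nominal coverage: 95%, actual: {covered / n_trials:.1%}")
</code>

With these numbers the interval covers the truth only around 10% of the time instead of the nominal 95%: the spread between sites reflects only their small individual tweaks, while the large shared error moves all of them together and never shows up in the sample standard deviation.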
