Bayes Breaks Down.
In my class called "Developments in Statistical Methods", I sometimes feel like Harry Potter secretly practicing military magic. We are studying Bayesian methods, which are spoken of only in a whisper by established LSE econometricians. "No Bayesians here," they hiss.
In brief, Bayesian statistics allows -- indeed, requires -- you to make explicit your prior beliefs about the parameter you're trying to estimate. While Frequentists believe in one true data-generating parameter that you get closer to with more data, Bayesians hold a range of beliefs about the parameter that is continually updated as more data arrives. As the amount of data becomes infinite, the two methods usually give the same answer. With small samples, the results can differ markedly. In fact, questions with very small data sets, like a single observation, are only addressable by Bayesian methods. Frequentists throw up their hands, saying that the notion of "probability" has no meaning for a single observation.
Consider this question. I show you two apparently identical envelopes. I tell you that both have some money in them, and that one has twice as much money in it as the other, but you cannot tell which just by looking at them. You then get to choose one envelope and open it, and in it you see £12. I offer to let you keep the twelve pounds, or take whatever amount is in the other envelope. You want to maximize your expected take. Which envelope should you pick?
If you want to think it over, pause here before reading further.
First, notice that there is an obvious correct answer. If you have absolutely no information about how much I'm likely to have put in the envelopes, it should not make any difference which envelope you pick. Any probabilistic framework that does not give you this answer is fooling you somewhere and failing to do what probability theory is designed to do -- that is, describe and quantify your uncertainty.
But how do you characterize what you don't know in this case? It is natural to say that there is a 50% chance that the other envelope contains twice as much and a 50% chance that it contains half as much. But then the expected value of switching is 0.5*24+0.5*6=12+3=15. 15 is larger than 12, so you should take the other envelope, since the expected value is higher.
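The naive calculation above can be written out as a minimal Python sketch. The function name `naive_switch_value` is just a label for illustration, and the 50/50 split it hard-codes is exactly the assumption under question, not an established fact about the envelopes.

```python
def naive_switch_value(observed):
    """Expected value of switching under the naive 50/50 assumption.

    Assumes the other envelope holds double or half the observed amount
    with equal probability, whatever amount was observed.
    """
    return 0.5 * (2 * observed) + 0.5 * (observed / 2)


observed = 12  # pounds seen in the opened envelope
print(naive_switch_value(observed))
```

Under this assumption the result is always 1.25 times the observed amount, so the reasoning says to switch no matter what you see. That is the first hint that something is off.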
Naturally, a Frequentist approach is no use here. So let's look at it from a Bayesian point of view. Suppose you have a prior belief about the distribution of the amount in the smaller envelope which I will call p(x), and an independent prior assigning 50% probability to either envelope having the higher amount of money. Having observed the amount in one envelope to be y, the posterior probability, q, of the money in the other envelope is given by
q(y/2)=p(y/2)/(p(y)+p(y/2))
q(2y)=p(y)/(p(y)+p(y/2))
q is zero otherwise
(To see this, note that the other envelope must contain either y/2 or 2y. Then the numerator is proportional to the prior probability that the smaller envelope contained the value implied by the second observation, and the denominator just normalizes so the probabilities sum to one.)
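As a rough illustration of the posterior formulas above, here is a short Python sketch for a discrete prior. The function `posterior_other` and the particular prior values are assumptions made up for this example. They are not part of the original argument, and any proper prior over the smaller amount could be substituted.

```python
def posterior_other(prior, y):
    """Posterior over the other envelope's amount after observing y.

    prior maps each possible smaller amount x to its probability p(x).
    """
    p_y = prior.get(y, 0.0)         # y is the smaller amount, other holds 2y
    p_half = prior.get(y / 2, 0.0)  # y/2 is the smaller amount, other holds y/2
    total = p_y + p_half
    if total == 0:
        raise ValueError("observed amount is impossible under this prior")
    return {y / 2: p_half / total, 2 * y: p_y / total}


# Hypothetical prior over the smaller amount, chosen only for illustration.
prior = {3: 0.5, 6: 0.3, 12: 0.2}

q = posterior_other(prior, 12)
expected_other = sum(amount * prob for amount, prob in q.items())
print(q, expected_other)
```

With this made-up prior, seeing £12 gives q(6) of about 0.6 and q(24) of about 0.4, so the other envelope is worth about £13.20 in expectation rather than the naive £15.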
Notice that whether or not you choose to switch depends on your prior beliefs about the distribution of the money in the envelopes. So if you want the first observation to be uninformative, that should mean that no matter what y you observe,
q(y/2)=q(2y) =>
p(y/2)=p(y) =>
p is constant
This is the assumption we made implicitly when we did the above expectation calculation. But notice that a constant over the entire domain does not integrate to one -- that is, this prior is not a valid distribution function. In fact, you cannot have a valid prior that manages to capture the true uncertainty here, which is that you really have no information about the second envelope having seen the first.
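To see what a valid prior does instead, the sketch below enumerates every way the game can unfold under a small proper prior and computes the expected gain from switching, both conditional on each observed amount and overall. The prior values and function names are hypothetical, picked only so the arithmetic is exact. The point is the pattern, not the particular numbers.

```python
from fractions import Fraction

# Hypothetical proper prior over the smaller amount x (illustration only).
prior = {3: Fraction(1, 2), 6: Fraction(3, 10), 12: Fraction(1, 5)}


def joint_outcomes(prior):
    """Yield (probability, observed, other) for each way the game can unfold."""
    for x, p in prior.items():
        yield p / 2, x, 2 * x  # you opened the smaller envelope
        yield p / 2, 2 * x, x  # you opened the larger envelope


def gain_by_observation(prior):
    """Expected gain from switching, conditional on each observable amount."""
    mass, gain = {}, {}
    for p, seen, other in joint_outcomes(prior):
        mass[seen] = mass.get(seen, 0) + p
        gain[seen] = gain.get(seen, 0) + p * (other - seen)
    return {y: gain[y] / mass[y] for y in sorted(mass)}


conditional = gain_by_observation(prior)
overall = sum(p * (other - seen) for p, seen, other in joint_outcomes(prior))

for y, g in conditional.items():
    print(f"observe {y}: expected gain from switching = {g}")
print("overall expected gain from always switching:", overall)
```

Under this prior, switching is worth +3 when you see £3 and -12 when you see £24, while always switching gains nothing on average. A proper prior makes the first observation informative, which is exactly what the "no information" intuition was trying to rule out.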
I really like this paradox. It brings to light two deep (in my humble opinion) ideas: probability theory's main role is to characterize what you do not know, and the way you characterize it may affect your results in unexpected ways.

2 Comments:
Hi! Just discovered your blog.
Nice thought experiment. What I find intriguing is something that both frequentists and Bayesians, in fact any empiricist, must find disturbing. You see data (i.e. you know what's in 1 of the 2 envelopes), and it does not contain any information.
Well, it contains a lot of information in one sense: it reduces your density of the money in the envelope from a continuous distribution on R to a discrete distribution nonzero at only two points. But it's true that it does not shed any light on your decision. It is informative in a very uninformative way.