Fact and Fiction: Everyone uses prior probabilities

There is a lot of rubbish about the Bayesian approach to inference being written in various blogs; to spare people's blushes I will not cite those blogs here. As I see it, the main error that people make is to assume that the Bayesian approach is guilty of being subjective because it uses prior probabilities, which seemingly have to be plucked out of thin air (i.e. subjectively).

Fortunately, or unfortunately, depending on your point of view, the Bayesian approach is not the only one that is subjective according to these criteria.

Bayes' theorem allows joint probabilities to be split up into products of conditional probabilities and marginal probabilities. The simplest statement of Bayes' theorem is

Pr(x, y) = Pr(x) Pr(y│x) = Pr(y) Pr(x│y)

where the marginal probabilities are defined as

Pr(x) ≡ Σ_y Pr(x, y) and Pr(y) ≡ Σ_x Pr(x, y)

This allows us to write the conditional probability Pr(x│y) as

Pr(x│y) = Pr(x) Pr(y│x) / (Σ_x′ Pr(x′) Pr(y│x′)) = Pr(x) Pr(y│x) / Pr(y)

where the dummy variable x′ that is used inside the summation is distinct from the free x that occurs elsewhere.

So, in order to determine x given that you know y (i.e. Pr(x│y)), all you need to know is how to determine y given that you know x (i.e. Pr(y│x)) together with the prior probability of x (i.e. Pr(x)).
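To make this concrete, here is a minimal sketch in Python of the discrete case. The function name `posterior`, the labels "a" and "b", and all of the numbers are made up for illustration; the only thing taken from the text above is the formula Pr(x│y) = Pr(x) Pr(y│x) / Pr(y).

```python
def posterior(prior, likelihood, y):
    """Return Pr(x|y) for every x, given Pr(x) and Pr(y|x)."""
    # Joint Pr(x, y) = Pr(x) Pr(y|x), evaluated at the observed y
    joint = {x: prior[x] * likelihood[x][y] for x in prior}
    # Marginal Pr(y) = sum over x of Pr(x, y)
    pr_y = sum(joint.values())
    return {x: joint[x] / pr_y for x in joint}


# Hypothetical likelihood Pr(y|x) for two candidate values of x
likelihood = {
    "a": {"hit": 0.9, "miss": 0.1},
    "b": {"hit": 0.3, "miss": 0.7},
}

# Same data, two different (illustrative) priors
post_even = posterior({"a": 0.5, "b": 0.5}, likelihood, "hit")
post_skewed = posterior({"a": 0.1, "b": 0.9}, likelihood, "hit")

print(post_even["a"], post_skewed["a"])
```

With these made-up numbers the same observation gives Pr(a│hit) of about 0.75 under the even prior and about 0.25 under the skewed one, which is exactly the dependence on Pr(x) that the critics object to.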

The criticism that the Bayesian approach is subjective arises from the Pr(x) term. Why should drawing an inference about x given y depend on this apparently subjective factor? If x is the value of a physical constant, and y is an experimental measurement of it, then why should the interpretation (i.e. Pr(x│y)) of this measurement apparently be subjective?

Firstly, there is an extreme case that we must dispose of. If the experimental data actually measures the quantity of interest with zero error (e.g. y = x) then the choice of Pr(x) has no effect. I am not talking about this extreme case. I am talking about the more realistic case where the data contains only partial information about the quantity of interest, because it is subject to noise, or maybe because it is a lower-dimensional projection of a higher-dimensional quantity.
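The difference between the two cases can be seen numerically. The following is only a sketch under assumptions of my own choosing: x lives on a grid over [0, 10], the noise is Gaussian with width `sigma`, and the two priors (`flat` and `sloped`) are arbitrary examples. None of these choices is special.

```python
import math


def posterior_mean(xs, prior, y, sigma):
    """Posterior mean of x on a grid, assuming Gaussian noise of width sigma."""
    # Unnormalised posterior weights Pr(x) Pr(y|x)
    weights = [p * math.exp(-0.5 * ((y - x) / sigma) ** 2)
               for x, p in zip(xs, prior)]
    total = sum(weights)
    return sum(x * w for x, w in zip(xs, weights)) / total


xs = [i * 0.01 for i in range(1001)]   # grid on [0, 10]
flat = [1.0] * len(xs)                 # illustrative prior 1
sloped = [math.exp(-x) for x in xs]    # illustrative prior 2
y = 5.0

# Noisy measurement: the two priors lead to clearly different answers
gap_noisy = abs(posterior_mean(xs, flat, y, 1.0)
                - posterior_mean(xs, sloped, y, 1.0))
# Nearly exact measurement: the choice of prior hardly matters
gap_precise = abs(posterior_mean(xs, flat, y, 0.01)
                  - posterior_mean(xs, sloped, y, 0.01))

print(gap_noisy, gap_precise)
```

As the noise width shrinks towards the zero-error limit, the gap between the two posterior means shrinks with it; with appreciable noise the gap is of the same order as the noise itself.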

Do some dimensional analysis (this is valid whether x and y are continuous or discrete):

Pr(x) and Pr(x│y) both have the dimensionality of 1/x.
Pr(y) and Pr(y│x) both have the dimensionality of 1/y.

Thus in Pr(x│y) = Pr(x) Pr(y│x) / Pr(y) the 1/x dimensionality of Pr(x│y) derives entirely from Pr(x), because Pr(y│x) / Pr(y) is dimensionless.

You have to use something like the dimensional Pr(x) factor in order to construct Pr(x│y). If this factor is not actually Pr(x) itself, perhaps because you don’t like to use apparently subjective quantities, then what else could it be? Because it has inverse linear dimensions it has to be physically like a density. What densities do we have lying around ready for use? If we imagine that x-space is composed of infinitesimally small x-cells, then the density of such cells has the required properties. How do we decide what a good choice for these cells might be? Do we make them all the same size when viewed in x-space, or exp(x)-space, or what? There is no uniquely obvious choice!
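Here is a small sketch of what the choice of cells does in practice. Everything in it is an assumption made for illustration: the range [0, 4], the Gaussian noise model, the numbers, and the helper name `posterior_mean_x`. Each cell is given the same weight, and the only thing that changes is the space in which the cells are equally sized.

```python
import math


def posterior_mean_x(cell_centres_x, y, sigma):
    """Posterior mean of x when every cell carries equal prior weight."""
    # Gaussian likelihood in x, evaluated at each cell centre
    weights = [math.exp(-0.5 * ((y - x) / sigma) ** 2) for x in cell_centres_x]
    return sum(x * w for x, w in zip(cell_centres_x, weights)) / sum(weights)


n = 20000
lo, hi = 0.0, 4.0

# Cells of equal size in x-space
cells_x = [lo + (hi - lo) * (i + 0.5) / n for i in range(n)]

# Cells of equal size in exp(x)-space, mapped back to x
u_lo, u_hi = math.exp(lo), math.exp(hi)
cells_u = [math.log(u_lo + (u_hi - u_lo) * (i + 0.5) / n) for i in range(n)]

y, sigma = 2.0, 0.5
mean_flat_x = posterior_mean_x(cells_x, y, sigma)
mean_flat_u = posterior_mean_x(cells_u, y, sigma)

print(mean_flat_x, mean_flat_u)
```

With these illustrative numbers the first estimate comes out at about 2.0 and the second at about 2.25. Nobody wrote down a "prior" anywhere in this code, yet the choice of cells has done exactly the job of one.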

The problem of choosing a space in which to define the cell size, so that a density can be defined in order to give Pr(x│y) its dimensions, is the same as the problem of defining the Bayesian prior Pr(x). This is the reason why the Bayesian approach is not the only one in which you define a prior probability. Actually, in all approaches you have to define a prior probability, but only Bayesians use the term "prior probability", so they are totally honest about what they are actually doing.

You could define a density-like quantity in terms of the frequency of visits to each point in the space, which gives a number-per-unit-cell (i.e. a density). If you have an underlying model for generating points in x-space, then in principle it is easy to generate this type of density, and this would indeed be an objective way of defining a density. However, this ducks the issue of where the underlying model came from in the first place. There is a bootstrapping process, where you need to impose density-like quantities on spaces that have never been visited before, and for which there is no agreed-upon underlying model.
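As a rough sketch of the frequency-of-visits idea, the code below counts how often a generating model lands in each cell and divides by the cell width. The model used here (the sum of two uniform random numbers) is purely hypothetical, as are the helper name `visit_density` and the cell layout; that is precisely the point, because the density is only as objective as the model that I chose to plug in.

```python
import random


def visit_density(sample, n_draws, lo, hi, n_cells):
    """Estimate a density as visits per unit cell width, normalised by n_draws."""
    width = (hi - lo) / n_cells
    counts = [0] * n_cells
    for _ in range(n_draws):
        x = sample()
        # Index of the cell that x falls in (clamped at the upper edge)
        i = min(int((x - lo) / width), n_cells - 1)
        counts[i] += 1
    return [c / (n_draws * width) for c in counts]


rng = random.Random(0)  # fixed seed so the sketch is repeatable


def model():
    # Hypothetical generating model: sum of two uniform draws on [0, 1)
    return rng.random() + rng.random()


density = visit_density(model, 100000, 0.0, 2.0, 20)
print(density)
```

The resulting list has the dimensions of 1/x and integrates to one over the cells, so it could stand in for Pr(x), but only because a model was supplied to generate the visits.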

In short, everyone faces the problem of defining prior probabilities, but only Bayesians call them prior probabilities. It is disingenuous to use prior probabilities as a stick to beat Bayesians with. Unless, of course, you are a masochist, because everyone uses prior probabilities.

Thursday, July 20, 2006
