### Everyone uses prior probabilities

There is a lot of rubbish being written about the Bayesian approach to inference in various blogs; to spare people's blushes I will not cite those blogs here. As I see it, the main error that people make is to assume that the Bayesian approach is guilty of being *subjective* because it uses *prior* probabilities, which seemingly have to be plucked out of thin air (i.e. subjectively).

Fortunately, or *un*fortunately, depending on your point of view, the Bayesian approach is *not* the only one that is subjective according to these criteria.

Bayes' theorem allows joint probabilities to be split up into products of conditional probabilities and marginal probabilities. The simplest statement of Bayes' theorem is

Pr(*x*, *y*) = Pr(*x*) Pr(*y*│*x*) = Pr(*y*) Pr(*x*│*y*)

where the marginal probabilities are defined as

Pr(*x*) ≡ Σ_{*y*} Pr(*x*, *y*) and Pr(*y*) ≡ Σ_{*x*} Pr(*x*, *y*)

This allows us to write the conditional probability Pr(*x*│*y*) as

Pr(*x*│*y*) = Pr(*x*) Pr(*y*│*x*) / (Σ_{*x*} Pr(*x*) Pr(*y*│*x*)) = Pr(*x*) Pr(*y*│*x*) / Pr(*y*)

where the dummy *x* that is used inside the summation is *different* from the free *x* that occurs elsewhere.

So, in order to determine *x* given that you know *y* (i.e. Pr(*x*│*y*)), all you need to know is how to determine *y* given that you know *x* (i.e. Pr(*y*│*x*)) *together with* the prior probability of *x* (i.e. Pr(*x*)).
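As a sanity check, these identities can be verified numerically on a small discrete joint distribution. The joint table below is an arbitrary illustrative example, not anything from the discussion above:

```python
# Verify Bayes' theorem on a small discrete joint distribution.
# The joint table Pr(x, y) is arbitrary, chosen only for illustration.
joint = [
    [0.10, 0.20],   # Pr(x=0, y=0), Pr(x=0, y=1)
    [0.30, 0.40],   # Pr(x=1, y=0), Pr(x=1, y=1)
]

# Marginals: Pr(x) = sum_y Pr(x, y) and Pr(y) = sum_x Pr(x, y)
pr_x = [sum(row) for row in joint]
pr_y = [sum(joint[x][y] for x in range(2)) for y in range(2)]

# Conditionals, straight from their definitions
pr_y_given_x = [[joint[x][y] / pr_x[x] for y in range(2)] for x in range(2)]
pr_x_given_y = [[joint[x][y] / pr_y[y] for y in range(2)] for x in range(2)]

# Check Pr(x|y) = Pr(x) Pr(y|x) / Pr(y) at every (x, y)
for x in range(2):
    for y in range(2):
        bayes = pr_x[x] * pr_y_given_x[x][y] / pr_y[y]
        assert abs(bayes - pr_x_given_y[x][y]) < 1e-12
```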

The criticism that the Bayesian approach is *subjective* arises from the Pr(*x*) term. Why should drawing an inference about *x* given *y* depend on this apparently subjective factor? If *x* is the value of a physical constant, and *y* is an experimental measurement of it, then why should the *interpretation* (i.e. Pr(*x*│*y*)) of this measurement apparently be subjective?

Firstly, there is an extreme case that we must dispose of. If the experimental data actually measures the quantity of interest with zero error (e.g. *y* = *x*) then the choice of Pr(*x*) has no effect. I am *not* talking about this extreme case. I am talking about the more realistic case where the data contains only *partial* information about the quantity of interest, because it is subject to noise, or maybe because it is a lower-dimensional projection of a higher-dimensional quantity.
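This distinction is easy to see in a toy calculation. In the sketch below (all numbers hypothetical) a noisy likelihood makes the posterior depend on the prior, while a zero-error likelihood makes the prior drop out entirely:

```python
# Hypothetical discrete example: x is in {0, 1}, y is a measurement of x.
# With a noisy likelihood the posterior depends on the prior;
# with a zero-error likelihood (y = x) it does not.

def posterior(prior, likelihood, y):
    """Pr(x|y) proportional to Pr(x) Pr(y|x), normalised over x."""
    unnorm = [prior[x] * likelihood[x][y] for x in range(len(prior))]
    z = sum(unnorm)
    return [u / z for u in unnorm]

noisy = [[0.8, 0.2], [0.2, 0.8]]   # Pr(y|x) with a 20% error rate
exact = [[1.0, 0.0], [0.0, 1.0]]   # Pr(y|x) with zero error (y = x)

flat = [0.5, 0.5]
skewed = [0.9, 0.1]

# Partial information: different priors give different posteriors.
print(posterior(flat, noisy, y=1))    # [0.2, 0.8]
print(posterior(skewed, noisy, y=1))  # the prior pulls the answer toward x=0

# Zero-error case: the prior drops out entirely.
print(posterior(flat, exact, y=1))    # [0.0, 1.0]
print(posterior(skewed, exact, y=1))  # [0.0, 1.0]
```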

Do some dimensional analysis (this is valid whether *x* and *y* are continuous or discrete):

- Pr(*x*) and Pr(*x*│*y*) both have the dimensionality of 1/*x*.
- Pr(*y*) and Pr(*y*│*x*) both have the dimensionality of 1/*y*.

Thus in Pr(*x*│*y*) = Pr(*x*) Pr(*y*│*x*) / Pr(*y*) the 1/*x* dimensionality of Pr(*x*│*y*) derives *entirely* from Pr(*x*), because Pr(*y*│*x*) / Pr(*y*) is dimensionless.

You *have* to use something like the dimensional Pr(*x*) factor in order to construct Pr(*x*│*y*). If this factor is *not* actually Pr(*x*) itself, perhaps because you don’t like to use apparently subjective quantities, then what else could it be? Because it has *inverse linear* dimensions it has to be physically like a *density*. What densities do we have lying around ready for use? If we imagine that *x*-space is composed of infinitesimally small *x*-cells, then the density of such cells has the required properties. How do we decide what a good choice for these cells might be? Do we make them all the same size when viewed in *x*-space, or exp(*x*)-space, or what? There is no uniquely obvious choice!

The problem of choosing a space in which to define the cell size, so that a density can be defined in order to give Pr(*x*│*y*) its dimensions, is the *same* as the problem of defining the Bayesian prior Pr(*x*). This is the reason why the Bayesian approach is *not* the only one in which you define a prior probability. Actually, in *all* approaches you have to define a prior probability, but only Bayesians use the term "prior probability", so they are totally honest about what they are actually doing.
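The dependence on the choice of cells can be made concrete numerically. In this sketch (all numbers hypothetical, with a Gaussian likelihood assumed for illustration) a prior that is flat per unit of *x* is compared with one that is flat per unit of log(*x*), i.e. a 1/*x* density when expressed in *x*-space; the same data then yields different posteriors:

```python
import math

# A "uniform" prior is only uniform relative to a chosen parameterisation.
# Discretise x on a grid of small cells and compare two choices of cell weighting.
xs = [0.5 + 0.01 * i for i in range(300)]   # grid over x in [0.5, 3.5)

def gaussian_likelihood(y, x, sigma=0.5):
    # Illustrative measurement model: y = x plus Gaussian noise
    return math.exp(-0.5 * ((y - x) / sigma) ** 2)

def normalise(w):
    z = sum(w)
    return [v / z for v in w]

y_obs = 2.0
like = [gaussian_likelihood(y_obs, x) for x in xs]

# Prior flat in x-space: equal weight per x-cell
prior_x = normalise([1.0 for _ in xs])
# Prior flat in log(x)-space: a 1/x density when expressed per unit of x
prior_logx = normalise([1.0 / x for x in xs])

post_x = normalise([p * l for p, l in zip(prior_x, like)])
post_logx = normalise([p * l for p, l in zip(prior_logx, like)])

mean_x = sum(x * p for x, p in zip(xs, post_x))
mean_logx = sum(x * p for x, p in zip(xs, post_logx))
print(mean_x, mean_logx)   # the 1/x prior pulls the posterior mean down
```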

You *could* define a density-like quantity in terms of the *frequency* of visits to each point in the space, which gives a number-per-unit-cell (i.e. a density). If you have an underlying model for generating points in *x*-space, then in principle it is easy to generate this type of density, and this would indeed be an objective way of defining a density. However, this *ducks the issue* of where the underlying model came from *in the first place*. There is a bootstrapping process, where you need to impose density-like quantities on spaces that have never been visited before, and for which there is no agreed-upon underlying model.
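A minimal sketch of this frequency-based construction, assuming a hypothetical generating model for *x* (which is itself exactly the assumption that the bootstrapping problem points at):

```python
import random
from collections import Counter

random.seed(0)  # reproducible illustration

# Hypothetical underlying model for x: three cells visited with fixed chances.
# This generating model is an assumption -- the bootstrapping problem in the text.
true_weights = [0.5, 0.3, 0.2]
n = 100_000
samples = random.choices([0, 1, 2], weights=true_weights, k=n)

# The frequency of visits per cell gives an "objective" empirical density
visits = Counter(samples)
empirical_prior = [visits[c] / n for c in range(3)]
print(empirical_prior)  # approaches true_weights as n grows
```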

In short, *everyone* faces the problem of defining prior probabilities, but only Bayesians call them prior probabilities. It is disingenuous to use prior probabilities as a stick to beat Bayesians with. Unless, of course, you are a masochist, because *everyone* uses prior probabilities.
