Fact and Fiction: Bayesian probability (update)

Friday, January 06, 2006

Bayesian probability (update)

Blimey! Look ye here! The lady doth protest too much, methinks. I will try to respond succinctly.

I will continue to write in a fairly informal style in this blog, and to point to the relevant literature for the more discerning readers. I made this decision to embrace a wider readership at the cost of annoying a few readers. I can see why one might uncharitably compare this style of writing to that of postmodern literary criticism, but I will just have to live with that. I have a more "scholarly" (in places) blog here, but it is currently dormant because of problems with uploading images.

My main interest here is how we do inference about complicated systems consisting of many interacting parts. Note that there are two levels here: the system's behaviour itself (e.g. physics) and the reasoning about the system (e.g. inference). I am mainly talking about the second of these two levels. Note that this second level is where we all operate when we reason about "the world", because all we are doing is manipulating knowledge about things, rather than manipulating the things themselves.

The above paragraph must sound rather postmodern, but it's not! See the next paragraph.

Let's start by citing the literature yet again: Cox R T, "Probability, frequency and reasonable expectation", Am. J. Phys., 1946, 14(1), 1-13. This paper gives a neat derivation of the Bayesian approach to inference, by deriving everything from some elementary axioms which demand that the inference process must be internally consistent. Loosely speaking, if x is the state of a system, and Pr(x) is the joint probability of the components of x, then all inferences can be done by Bayesian manipulations of Pr(x), rather than of x itself.
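For concreteness, here is roughly what those manipulations look like in the simplest case. I am assuming that x splits into two discrete components, which I will call x_1 and x_2; this notation is mine, for illustration only, and is not taken from the Cox paper.

```latex
% Product rule: the joint probability factorises in two equivalent ways
\Pr(x_1, x_2) = \Pr(x_1 \mid x_2)\,\Pr(x_2) = \Pr(x_2 \mid x_1)\,\Pr(x_1)

% Sum rule: marginalise the joint probability over the component you do not observe
\Pr(x_2) = \sum_{x_1} \Pr(x_1, x_2)

% Bayes' theorem follows by equating the two factorisations above
\Pr(x_1 \mid x_2) = \frac{\Pr(x_2 \mid x_1)\,\Pr(x_1)}{\sum_{x_1'} \Pr(x_2 \mid x_1')\,\Pr(x_1')}
```

Everything on the right-hand sides is obtained from the joint probability Pr(x_1, x_2), which is the sense in which inference manipulates Pr(x) rather than x.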

Bayesian inference is about manipulating joint probabilities (i.e. inference) rather than about defining joint probabilities in the first place (i.e. prior probabilities). These prior probabilities can be constructed in any way that you please, as long as they satisfy the usual properties (i.e. non-negativity, summing to unity), and then the Bayesian inference machinery can make use of them.

The freedom to choose a prior probability is an advantage, not a disadvantage, because it allows you freedom in your choice of model (or ensemble of alternative models). Bayesian inference then uses any relevant data to convert this prior probability into a posterior probability, which effectively updates the model (or ensemble of alternative models) in the light of the data.
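As a minimal sketch of this prior-to-posterior update, consider a toy ensemble of three alternative models of a coin. The model names, the bias values, and the helper functions `bayes_update` and `coin_likelihood` are all hypothetical, chosen only to illustrate the idea; exact fractions are used so that the arithmetic can be checked by hand.

```python
from fractions import Fraction


def bayes_update(prior, likelihood, datum):
    """Return the posterior over models after observing one datum.

    prior: dict mapping model name -> prior probability (summing to 1)
    likelihood: function (datum, model) -> Pr(datum | model)
    """
    # Joint probability Pr(datum, model) = Pr(datum | model) * Pr(model)
    joint = {m: likelihood(datum, m) * p for m, p in prior.items()}
    # Marginal probability Pr(datum), summed over the ensemble of models
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("datum has zero probability under every model")
    # Bayes' theorem: Pr(model | datum) = Pr(datum, model) / Pr(datum)
    return {m: j / evidence for m, j in joint.items()}


# Hypothetical ensemble: three candidate values for the probability of heads.
BIAS = {
    "fair": Fraction(1, 2),
    "heads_biased": Fraction(3, 4),
    "tails_biased": Fraction(1, 4),
}


def coin_likelihood(datum, model):
    """Pr(datum | model) for a single toss, where datum is 'H' or 'T'."""
    p_heads = BIAS[model]
    return p_heads if datum == "H" else 1 - p_heads


# Start from a uniform prior over the ensemble, then feed in the data one toss at a time.
prior = {m: Fraction(1, 3) for m in BIAS}
posterior = prior
for toss in "HHTH":
    posterior = bayes_update(posterior, coin_likelihood, toss)

for model, prob in posterior.items():
    print(model, prob)
# fair 8/23
# heads_biased 27/46
# tails_biased 3/46
```

The prior here is uniform only for simplicity; any non-negative weights summing to unity could be fed to the same machinery, which is the freedom referred to above.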

Here is my original posting on the anthropic principle and Bayesian inference, so you can see for yourself how it has been selectively quoted here. In particular, check out the penultimate paragraph (the one starting "If the properties of the universe are correctly described by string theory (this may indeed be the case)...") for what I say about science, philosophy, and string theorists.

One can aspire to relate one's conjectured scientific theory to the real world, but the longer you are unable to demonstrate a strong connection between the two, the more your activity can be credibly labelled as "philosophy" rather than "science". I too would like to see the laws of physics derived from first principles, but I would not go so far as to assert that the laws of physics had to be derivable in this way. Whether we like it or not, the landscape is still a logical possibility, and we should at least be aware of what science would look like in that type of universe.

I also live dangerously in my own research activities, where I follow up some fairly wild ideas for long periods of time, but I always have "bread and butter" threads of research running alongside, where I dive for cover when the going gets tough. I never put all of my eggs in one basket no matter how elegant (or even how promising) it looks.

4 Comments:

At 6 January 2006 at 18:54, Luboš Motl said...:

Dear Steve,

I have read the Cox paper and other sources, rest assured. Deriving something from some axioms does not yet make the axioms relevant for science.

The prior probability itself is a source of problems. There is no general way to calculate it, and it affects all the results.

These results are simply subjective. Why don't you just admit that we don't know something every time you need to use Bayesian inference and incalculable (non-frequentist) prior probabilities?

The probabilities you derive from these shaky starting points only have value in telling you what bets you want to make, but they have no scientific value. They also have no financial value: on average, you will lose 50% of the bets, especially if others are using the same Bayesian reasoning.

The same comment applies to your eggs in different baskets. It's just your personal nature. There is nothing correct about putting eggs in different baskets. To try to learn means to attempt to figure out which basket exactly one should choose to put all her eggs into. And we have answered millions of such questions already - we have chosen the correct baskets - and we are looking for many more correct baskets.

Putting them in different baskets simply means that you do not know, and I find it irritating if someone promotes ignorance to a scientific principle that should be treated on par with standard science.

Ignorance is not yet science. If someone does not know whether there is a consistent N=1 SUGRA in ten dimensions with a gauge group whose dimension differs from 496, he may say "smart" things, such as that it exists with probability 31.415926 percent, but that does not change the fact that he has no idea what he is talking about.

Once again, ignorance is not science, not even if you put numbers around it.

Best
Luboš

At 7 January 2006 at 08:43, logopetria said...:

Luboš

I'm sure you know most of what I'm about to say, but I'll say it anyway because I'm trying to clarify to myself exactly what you believe on this subject.

(Part of) the point of the Bayesian approach to probabilities is to get probabilities to do their proper job -- namely, the quantification of our ignorance and knowledge about determinate facts of the world. When someone says "X exists with probability 34%", he doesn't mean (or perhaps I should say shouldn't mean) that its actual existence is an indeterminate matter (like the x-spin of an electron in the z-up state). The existence of X is a determinate fact -- as you say, it is either 100% true, or 0% true. But in almost every case of interest we don't have perfect knowledge of the relevant parts of the world, and we therefore need to quantify how much our evidence warrants confidence in a fact. That's what probabilities are for (on the Bayesian view), and the Bayesian method describes how to express these 'quantified statements of ignorance' in a consistent manner.

You seem to think (although I may be misinterpreting you) that this isn't good enough, because probabilities ought to be real objective things. On the strongest version of the Bayesian view (see e.g. Jaynes), this idea that probabilities are real things that live 'out there in the world' is just nonsense. There are, as you say, only (finite) frequencies 'out there in the world' to be observed and measured, but these things just aren't probabilities. You know this already, of course -- even on the "frequentist" view, a probability is the infinite limit of a sequence of frequencies, which isn't something that genuinely exists in the world, or which we can ever observe.

You're concerned about the "subjectivity" of Bayesian probabilities, and that's a fair concern. But only very weak kinds of Bayesianism are fully "subjective" (in the sense that their 'probabilities' encode merely 'what you believe' without any constraint from evidence). If that were the only kind of Bayesianism you were familiar with, I could see how you would dismiss it as utterly unscientific -- and I would agree.

But the kind of Bayesianism I'm talking about, the kind advocated by Ed Jaynes, for example, is not "subjective" in this sense. Yes, it is most naturally described in terms of "belief" and "plausibility", but at the real core of the theory (when extraneous elements have been stripped away) it's about the extent to which some set of propositions "partially entails" some other set of propositions. That is, it's (intended as) an objective relationship between propositions, an extension of deductive logic to intermediate degrees of entailment.

I've gone on too long already, so I'll leave it here for now. I'm genuinely interested to know why you think this objective Bayesian view of Jaynes is still insufficient to play the role of probability in science. Perhaps you don't believe that the goal of 'probability as extended logic' can really be achieved? Or perhaps we have different views on what the role of probability in science is?

I look forward to seeing your response.

At 7 January 2006 at 09:25, Stephen Luttrell said...:

Luboš

(I wrote this reply before I saw what logopetria said above)

I agree that the assignment of a prior probability is a problem, and there has been a lot of work done on objective ways of assigning priors. For instance, see the slides on Objective Bayesian Analysis by James Berger.

However, throughout my postings I have avoided the issue of prior probabilities, because I have said:

Bayesian inference is about manipulating joint probabilities (i.e. inference) rather than about defining joint probabilities in the first place (i.e. prior probabilities).

Why do I do this? The problem of assigning a prior is the same as the problem of conjecturing a model (or ensemble of alternative models), so the arbitrariness of the prior is a very generic problem in science. This arbitrariness begins to be lifted only when you introduce data to convert the prior probability into a posterior probability, or analogously to select a subset of models out of the prior ensemble of alternative models.

Even if you start with the wrong prior probability, it will be overturned by the introduction of enough data, eventually producing a good posterior probability (this argument is informal, but the gist is right). This is exactly the same as conjecturing a lousy ensemble of alternative models, collecting lots and lots of data, and then belatedly realising that only a small part of the original ensemble was a sensible conjecture in the first place.
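Here is a small numerical sketch of that informal argument. It assumes a coin-tossing model with a Beta prior on the probability of heads, a standard conjugate choice that I am using purely for illustration; the function names, the "true" bias of 0.7, and the two deliberately extreme priors are all hypothetical.

```python
import random


def beta_posterior_mean(alpha, beta, heads, tails):
    """Posterior mean of the heads probability under a Beta(alpha, beta) prior."""
    return (alpha + heads) / (alpha + beta + heads + tails)


def simulate_tosses(true_p=0.7, n=10_000, seed=0):
    """Return (heads, tails) counts for n simulated tosses of a biased coin."""
    rng = random.Random(seed)
    heads = sum(rng.random() < true_p for _ in range(n))
    return heads, n - heads


heads, tails = simulate_tosses()

# Two strongly opposed priors: Beta(50, 1) expects mostly heads, Beta(1, 50) mostly tails.
prior_gap = abs(beta_posterior_mean(50, 1, 0, 0) - beta_posterior_mean(1, 50, 0, 0))

optimist = beta_posterior_mean(50, 1, heads, tails)
pessimist = beta_posterior_mean(1, 50, heads, tails)
gap = abs(optimist - pessimist)

# The disagreement shrinks from 49/51 before any data to 49/10051 after 10,000 tosses.
print(prior_gap, gap)
```

One caveat that the sketch does not show: a prior that assigns exactly zero probability to the truth can never be rescued by data, because Bayes' theorem only reweights what the prior already allows.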

What distinguishes the Bayesian approach from other approaches is not the assignment of a prior (everybody assigns priors, but only Bayesians call them "priors"), but rather the use of Bayes' theorem to do inference by manipulating joint probabilities (as described in the Cox paper). This is not the way that the Bayesian approach is taught in textbooks, which go on and on about priors, and generally give the impression that priors are the defining property of the Bayesian approach.

I suspect that what you call "science" is the limiting case described above, where the amount (and quality) of data overwhelms the prior, so the choice of prior becomes irrelevant. OK, you can define "science" that way, and I think that would be a generally acceptable definition, but it would be restricted to limiting cases where there is lots of data.

More generally, the amount (and quality) of data will be limited, so there will inevitably be uncertainties which have to be dealt with in a systematic way. That is where the Bayesian approach gives you a uniquely rigorous and consistent framework (see the Cox paper) for representing and manipulating uncertain information, which is especially useful when the uncertainties are mutually dependent. The arbitrariness of the prior then makes itself felt, but remember that the "prior" is not a uniquely Bayesian issue, as explained above.

This broader definition of "science" (i.e. including the Bayesian handling of uncertainties) is the one that I use.

At 8 October 2008 at 19:11, Anonymous said...:

Refreshing views on the Bayesian story:

http://www.u.arizona.edu/~shahar/exchanges/Posting%20on%20Bayesianism.doc

http://www.cc.gatech.edu/~isbell/reading/papers/wang.bayesianism.pdf
