### Bayesian probability (update)

Blimey! Look ye here! The lady doth protest too much, methinks. I will try to respond succinctly.

I will continue to write in a fairly informal style in this blog, and to point to the relevant literature for the more discerning readers. I made this decision to embrace a wider readership at the cost of annoying a few readers. I can see why one might uncharitably compare this style of writing to that of postmodern literary criticism, but I will just have to live with that. I have a more "scholarly" (in places) blog here, but that blog is currently dormant because of problems with uploading images.

My main interest here is how we do inference *about* complicated systems consisting of many interacting parts. Note that there are two levels here: the system's behaviour *itself* (e.g. physics) and the reasoning *about* the system (e.g. inference). I am mainly talking about the second of these two levels. Note that this second level is where we all operate when we reason about "the world", because all we are doing is manipulating knowledge *about* things, rather than manipulating the things *themselves*.

The above paragraph must sound rather postmodern, but it's not! See the next paragraph.

Let's start by citing the literature yet again: Cox R T, "Probability, frequency and reasonable expectation", *Am. J. Phys.*, 1946, **14**(1), 1-13. This paper gives a neat derivation of the Bayesian approach to inference, by deriving everything from some elementary axioms which demand that the inference process must be internally *consistent*. Loosely speaking, if **x** is the state of a system, and Pr(**x**) is the joint probability of the components of **x**, then all inferences can be done by Bayesian manipulations of Pr(**x**), *not* of **x** itself.

Bayesian inference is about *manipulating* joint probabilities (i.e. inference) rather than about *defining* joint probabilities in the first place (i.e. prior probabilities). These *prior* probabilities can be constructed in any way that you please, as long as they satisfy the usual properties (i.e. non-negativity, summing to unity), and then the Bayesian inference machinery can make use of them.
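The point that all inference reduces to manipulating the joint probability Pr(**x**) can be made concrete with a toy example. Here is a minimal Python sketch; the two-component "disease and test" system and its numbers are invented purely for illustration:

```python
# Joint probability Pr(x) over a toy two-component state x = (disease?, test positive?).
# The numbers are made up for illustration; they sum to 1, as any valid joint must.
joint = {
    (False, False): 0.894,  # healthy, test negative
    (False, True):  0.096,  # healthy, false positive
    (True,  False): 0.001,  # diseased, false negative
    (True,  True):  0.009,  # diseased, true positive
}
assert abs(sum(joint.values()) - 1.0) < 1e-12  # usual properties satisfied

# Marginalisation: Pr(test positive) = sum over disease states
p_pos = joint[(False, True)] + joint[(True, True)]

# Conditioning (Bayes' theorem in action): Pr(disease | test positive)
p_disease_given_pos = joint[(True, True)] / p_pos
print(p_disease_given_pos)  # still small, despite the positive test
```

Both operations (marginalising and conditioning) are just arithmetic on Pr(**x**); nothing beyond the joint probability is needed.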

The freedom to choose a prior probability is an advantage, *not* a disadvantage, because it allows you freedom in your choice of model (or ensemble of alternative models). Bayesian inference then uses any relevant data to convert this *prior* probability into a *posterior* probability, which effectively updates the model (or ensemble of alternative models) in the light of the data.
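The prior-to-posterior conversion can be sketched in a few lines of Python; a beta-binomial coin model is used here purely as a convenient illustration (any prior satisfying the usual properties would do):

```python
# Conjugate beta-binomial update: a Beta(a, b) prior over a coin's bias,
# converted into a posterior by the observed data (counts of heads and tails).
def update(a, b, heads, tails):
    """Return the posterior Beta parameters after seeing the data."""
    return a + heads, b + tails

a, b = 1, 1                          # flat prior: every bias equally plausible
a, b = update(a, b, heads=7, tails=3)  # introduce the data
posterior_mean = a / (a + b)         # updated estimate of the coin's bias
print(posterior_mean)                # 8/12, i.e. about 0.667
```

The data have pulled the flat prior towards the observed frequency; with more data the prior's influence shrinks further.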

Here is my original posting on the anthropic principle and Bayesian inference, so you can see for yourself how it has been selectively quoted here. In particular, check out the penultimate paragraph (the one starting *"If the properties of the universe are correctly described by string theory (this may indeed be the case)..."*) for what I say about science, philosophy, and string theorists.

One can *aspire* to relate one's *conjectured* scientific theory to the real world, but the longer you are unable to demonstrate a strong connection between the two, the more your activity can be credibly labelled as "philosophy" rather than "science". I too would *like* to see the laws of physics derived from first principles, but I would *not* go so far as to assert that the laws of physics *had* to be derivable in this way. Whether we like it or not, the landscape is still a logical possibility, and we should at least be aware of what science would look like in that type of universe.

I also live dangerously in my own research activities, where I follow up some fairly wild ideas for long periods of time, but I always have "bread and butter" threads of research running alongside, where I dive for cover when the going gets tough. I never put all of my eggs in one basket no matter how elegant (or even how promising) it looks.

## 4 Comments:

Dear Steve,

I have read the Cox paper and other sources, be sure about it. Deriving something from some axioms does not yet make the axioms relevant for science.

The prior probability itself is a source of problems. There is no general way to calculate it, and it affects all the results.

These results are simply subjective. Why don't you just admit that we don't know something every time you need to use the Bayesian inference and uncalculable (non-frequentist) prior probabilities?

The probabilities you derive from these shaky starting points only have a value to tell you what bets you want to make, but they have no scientific value. They also have no financial value: on average, you will lose 50% of the bets, especially if others are using the same Bayesian reasoning.

The same comment applies to your eggs in different baskets. It's just your personal nature. There is nothing correct about putting eggs in different baskets. To try to learn means to attempt to figure out which basket exactly one should choose to put all her eggs into. And we have answered millions of such questions already - we have chosen the correct baskets - and we are looking for many more correct baskets.

Putting them in different baskets simply means that you do not know, and I find it irritating if someone promotes ignorance to a scientific principle that should be treated on par with standard science.

Ignorance is not yet science. If someone does not know whether there is a consistent N=1 SUGRA in ten dimensions with a gauge group whose dimension differs from 496, he may say "smart" things such as that it exists with probability 31.415926 percent - but it does not change anything about the fact that he has no idea what he is talking about.

Once again, ignorance is not science, not even if you put numbers around it.

Best

Luboš

Luboš

I'm sure you know most of what I'm about to say, but I'll say it anyway because I'm trying to clarify to myself exactly what you believe on this subject.

(Part of) the point of the Bayesian approach to probabilities is to get probabilities to do their proper job -- namely, the quantification of our ignorance and knowledge about determinate facts of the world. When someone says "X exists with probability 34%", he doesn't mean (or perhaps I should say *shouldn't* mean) that its actual existence is an indeterminate matter (like the x-spin of an electron in the z-up state). The existence of X is a determinate fact -- as you say, it is either 100% true, or 0% true. But in almost every case of interest we don't have perfect knowledge of the relevant parts of the world, and we therefore need to quantify how much our evidence warrants confidence in a fact. That's what probabilities are for (on the Bayesian view), and the Bayesian method describes how to express these 'quantified statements of ignorance' in a consistent manner.

You seem to think (although I may be misinterpreting you) that this isn't good enough, because probabilities ought to be *real* objective things. On the strongest version of the Bayesian view (see e.g. Jaynes), this idea that probabilities are real things that live 'out there in the world' is just nonsense. There are, as you say, only (finite) frequencies 'out there in the world' to be observed and measured, but these things *just aren't* probabilities. You know this already, of course -- even on the "frequentist" view, a probability is the infinite limit of a sequence of frequencies, which isn't something that genuinely exists in the world, or which we can ever observe.

You're concerned about the "subjectivity" of Bayesian probabilities, and that's a fair concern. But only very weak kinds of Bayesianism are fully "subjective" (in the sense that their 'probabilities' encode merely 'what you believe' without any constraint from evidence). If that were the only kind of Bayesianism you were familiar with, I could see how you would dismiss it as utterly unscientific -- and I would agree.

But the kind of Bayesianism I'm talking about, the kind advocated by Ed Jaynes, for example, is not "subjective" in this sense. Yes, it is most naturally described in terms of "belief" and "plausibility", but at the real *core* of the theory (when extraneous elements have been stripped away) it's about the extent to which some set of propositions "partially entails" some other set of propositions. That is, it's (intended as) an objective relationship between propositions, an extension of deductive logic to intermediate degrees of entailment.

I've gone on too long already, so I'll leave it here for now. I'm genuinely interested to know why you think this objective Bayesian view of Jaynes is still insufficient to play the role of probability in science. Perhaps you don't believe that the goal of 'probability as extended logic' can really be achieved? Or perhaps we have different views on what the role of probability in science is?

I look forward to seeing your response.

Luboš

(I wrote this reply before I saw what logopetria said above)

I agree that the assignment of a prior probability is a problem, and there has been a lot of work done on *objective* ways of assigning priors. For instance, see the slides on Objective Bayesian Analysis by James Berger.

However, throughout my postings I have avoided the issue of prior probabilities, because I have said: *Bayesian inference is about manipulating joint probabilities (i.e. inference) rather than about defining joint probabilities in the first place (i.e. prior probabilities).*

Why do I do this? The problem of assigning a prior is the *same* as the problem of conjecturing a model (or ensemble of alternative models), so the arbitrariness of the prior is a very generic problem in science. This arbitrariness *begins* to be lifted only when you introduce data to convert the prior probability into a posterior probability, or analogously to select a subset of models out of the prior ensemble of alternative models.

Even if you start with the *wrong* prior probability, it will eventually be overturned by the introduction of enough data, to eventually produce a good posterior probability (this argument is informal, but the gist is right). This is exactly the same as conjecturing a lousy ensemble of alternative models, collecting lots and lots of data, and then belatedly realising that only a small part of the original ensemble was a sensible conjecture in the first place.

What distinguishes the Bayesian approach from other approaches is *not* the assignment of a prior (*everybody* assigns priors, but only *Bayesians* call them "priors"), but rather it is the use of Bayes' theorem to do inference by manipulating joint probabilities (as described in the Cox paper). This is *not* the way that the Bayesian approach is taught in textbooks, where they go on and on about priors, and generally give the impression that priors are the *defining* property of the Bayesian approach.

I suspect that what you call "science" is the limiting case described above, where the amount (and quality) of data overwhelms the prior, so the choice of prior becomes irrelevant. OK, you *can* define "science" that way, and I think that would be a generally acceptable definition, but it would be restricted to limiting cases where there is lots of data.

More generally, the amount (and quality) of data will be limited, so there will inevitably be uncertainties which have to be dealt with in a *systematic* way. That is where the Bayesian approach gives you a uniquely rigorous and consistent framework (see the Cox paper) for representing and manipulating uncertain information, which is especially useful when the uncertainties are mutually dependent. The arbitrariness of the prior then makes itself felt, but remember that the "prior" is not a uniquely Bayesian issue, as explained above.

This broader definition of "science" (i.e. including the Bayesian handling of uncertainties) is the one that I use.

Refreshing views on the Bayesian story:

http://www.u.arizona.edu/~shahar/exchanges/Posting%20on%20Bayesianism.doc

http://www.cc.gatech.edu/~isbell/reading/papers/wang.bayesianism.pdf
