Fact and Fiction

Thoughts about a funny old world, and what is real, and what is not. Comments are welcome, but please keep them on topic.

Tuesday, January 30, 2007

Marvin Minsky bashes neuroscience

From KurzweilAI.net I learn that Marvin Minsky has given an interview to Discover magazine here. Minsky is one of the pioneers of artificial intelligence, and he is a very articulate and outspoken character. In the interview he comments on the activities of neuroscientists.

Q (Discover). Neuroscientists' quest to understand consciousness is a hot topic right now, yet you often pose things via psychology, which seems to be taken less seriously. Are you behind the curve?

A (Minsky). I don't see neuroscience as serious. What they have are nutty little theories, and they do elaborate experiments to confirm them and don't know what to do if they don't work. This book [The Emotion Machine] presents a very elaborate theory of consciousness. Consciousness is a word that confuses possibly 16 different processes. Most neurologists think everything is either conscious or not. But even Freud had several grades of consciousness. When you talk to neuroscientists, they seem so unsophisticated; they major in biology and know about potassium and calcium channels, but they don't have sophisticated psychological ideas. Neuroscientists should be asking: What phenomenon should I try to explain? Can I make a theory of it? Then, can I design an experiment to see if one of those theories is better than the others? If you don't have two theories, then you can't do an experiment. And they usually don't even have one.

I'm sure the activities of neuroscientists are well-intentioned: they adopt a reductionist approach to the analysis of a highly complex system (i.e. the brain), working upwards from the detailed behaviour of individual neurons. However, neuroscientists' theorising about AI is bound to be wildly off-target, since AI lives at a much higher level than the relatively low level where they are working. Tracing the detailed neural circuitry of small parts of the brain (or even the entire brain) will not lead to AI; discovering the underlying principles of AI (whatever those turn out to be) will lead to AI, and those principles will not necessarily need biological neurons to "live" in.

In the early 1980's I jumped on the "neural network" bandwagon that had restarted around that time. There was a lot of hype back then that this was the rigorous answer to understanding how the brain worked, and it took me a few years to convince myself that this claim was rubbish: the "neural network" bandwagon was based solely on some neat mathematical tricks that emerged around that time (e.g. back-propagation for training multi-layer networks), rather than on any better insight into information processing or AI. My rather belated response was to "rebadge" my research programme by avoiding the phrase "neural networks", instead using phrases such as "adaptive networks"; I wasn't alone in using this tactical response.
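For readers who haven't met the trick in question, here is a minimal sketch of back-propagation training a tiny two-layer network on the XOR problem. It is written in present-day Python/NumPy purely for illustration; the learning rate, network size and random seed are arbitrary choices of mine, not anything from the 1980's literature.

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

    W1 = rng.normal(size=(2, 4))  # input -> hidden weights
    W2 = rng.normal(size=(4, 1))  # hidden -> output weights

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for step in range(20000):
        h = sigmoid(X @ W1)                       # forward pass
        out = sigmoid(h @ W2)
        err_out = (out - y) * out * (1 - out)     # output-layer error
        err_hid = (err_out @ W2.T) * h * (1 - h)  # error propagated backwards
        W2 -= 0.5 * h.T @ err_out                 # gradient-descent updates
        W1 -= 0.5 * X.T @ err_hid

    print(out.round(2))  # should end up close to [[0], [1], [1], [0]]

The whole trick is the chain rule applied layer by layer; as I said above, there is no new insight into information processing hiding in it.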

Q (Discover). So as you see it, artificial intelligence is the lens through which to look at the mind and unlock the secrets of how it works?

A (Minsky). Yes, through the lens of building a simulation. If a theory is very simple, you can use mathematics to predict what it'll do. If it's very complicated, you have to do a simulation. It seems to me that for anything as complicated as the mind or brain, the only way to test a theory is to simulate it and see what it does. One problem is that often researchers won't tell us what a simulation didn't do. Right now the most popular approach in artificial intelligence is making probabilistic models. The researchers say, "Oh, we got our machine to recognize handwritten characters with a reliability of 79 percent." They don't tell us what didn't work.

This caricature of the cargo-cult science that passes itself off as genuine science made me laugh. As it happens, I use (a variant of) the probabilistic models that Minsky alludes to, and I find the literature on the subject unbelievably frustrating to read. A typical paper will contain an introduction, some theory, a computer simulation to illustrate an application of the theory, and a pathetically inadequate interpretation of what it all means. The most important part of a paper (the "take home message", if you wish) is the interpretation of the results that it reports; this comprises the new conceptual tools that I want to take away with me to apply elsewhere. Unfortunately, the emphasis is usually on presenting results from a wide variety of computer simulations and comparisons with competing techniques, which certainly fills up the journal pages but does little to advance our understanding of what is going on.
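To make the reporting complaint concrete, here is a toy sketch of the kind of probabilistic model Minsky alludes to: a simple Gaussian classifier on invented data (classes and parameters are made up for illustration only), written so that it reports which classes fail rather than a single headline accuracy figure.

    import numpy as np

    rng = np.random.default_rng(1)
    means = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]])  # three classes
    X = np.vstack([rng.normal(m, 1.0, size=(200, 2)) for m in means])
    labels = np.repeat([0, 1, 2], 200)

    # Fit: per-class mean and diagonal variance (a naive Bayes model).
    mu = np.array([X[labels == k].mean(axis=0) for k in range(3)])
    var = np.array([X[labels == k].var(axis=0) for k in range(3)])

    # Predict by maximum log-likelihood, assuming equal class priors.
    loglik = -0.5 * (((X[:, None, :] - mu) ** 2) / var + np.log(var)).sum(axis=2)
    pred = loglik.argmax(axis=1)

    conf = np.zeros((3, 3), dtype=int)
    for t, p in zip(labels, pred):
        conf[t, p] += 1
    print("accuracy:", (pred == labels).mean())
    print("confusion matrix (rows = truth):\n", conf)  # shows WHICH classes fail

The confusion matrix is the cheapest possible answer to "what didn't work"; the deeper answer, of course, is the interpretation that papers usually omit.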

Where are the conceptual tools? This is "butterfly" collecting rather than science. We need some rigorous organisational principles to help us gain a better understanding of our large collection of "butterflies", rather than taking the easy option of simply catching more "butterflies".

It seems to me that the situation in AI is analogous to, but much more difficult than, the situation in high energy physics during the 1950's and 1960's, when the "zoo" of strongly interacting particles grew to alarming proportions, and we explained what was going on only when the eightfold way and the quark model of hadrons were proposed. I wonder if there are elementary degrees of freedom (DOF) underlying AI that are analogous to the quark (and gluon) DOF in hadrons.

I'll bet that the "elementary" DOF of AI involve the complicated (strong?) mutual interaction of many neurons, just as the "elementary" DOF in strong interactions are not actually elementary quarks but are composite entities built out of quarks (and gluons). I'll also bet that we won't guess what the "elementary" DOF of AI are by observing the behaviour of individual neurons (or even small sets of neurons), but we will postdict (rather than predict) these DOF after someone (luckily) observes interesting information processing happening in the collective behaviour of large sets of neurons, or if someone (even more luckily) has a deep insight into the theory of information processing in large networks of interacting processing units.

10 Comments:

At 9 February 2007 at 19:33, Blogger marvin Minsky said...

Thanks, Steve. You really understood what I was getting at!

Unlike most interviews, the Discover editor didn't let me read what they would print. (I especially regret that sentence with 'nutty' in it—extracted from what must have been a half-hour of interview text.)

Marvin

 
At 9 February 2007 at 22:44, Blogger Stephen Luttrell said...

Thanks for your support. It's refreshing not to be called a crackpot when I dissent from the "party line".

 
At 10 February 2007 at 03:43, Blogger marvin Minsky said...

I like your notion that one should "postdict" instead of predict. If what you are looking at is complex, you cannot get far by just looking at data; you need to first make a hypothesis. (In fact, you really need to make 2 or 3 -- and then design an experiment that will help to distinguish them.)

In particular, I also agree that one cannot get far by just looking at neurons, because they are almost the same in both lobsters and people.

More likely the best place to look is at somewhat higher levels of groups of cells (such as small sets of cortical columns), because these must have been important steps toward making us able to do serial processes, while insulating higher-level events from the properties of those ancient chemicals. (Unfortunately, for this, MRIs are still too blurry to help.)

 
At 11 February 2007 at 14:16, Blogger Stephen Luttrell said...

The nearest analogy in physics is the idea of an "effective theory", where the degrees of freedom are not the underlying fundamental DOF, but are chosen to be better suited to real-world phenomena. They are macroscopic rather than microscopic DOF. The analogy I used with microscopic quarks+gluons giving rise to macroscopic hadrons was an example of this.

Robert Laughlin has written a very readable book on effective theories called A Different Universe: Reinventing Physics from the Bottom Down; it's not written for experts, but I find his style entertaining.

In my work on information processing I have tried to find simple algorithms (i.e. behaviour of elementary DOF) that have useful emergent properties (i.e. behaviour of emergent DOF), such as discovering subtle ways of encoding low-level data DOF to automatically extract useful high-level DOF.

When I apply this approach to correlated images from a pair of sensors, ocular dominance and orientation maps come out for free, and hypercolumnar processing comes out for free when I use a multilayer version of the approach. With luck this might turn out to be a formal (rather than biological) model of hypercolumnar neural information processing.
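To give a feel for the general flavour of such algorithms, here is a sketch of a standard Kohonen-style self-organising map. I stress that this is only a familiar textbook example of topographic order emerging from a purely local update rule; it is not the algorithm from my own work, whose details I haven't given here, and the map size, learning rate and neighbourhood schedule are arbitrary.

    import numpy as np

    rng = np.random.default_rng(2)
    n_units = 20
    W = rng.uniform(size=(n_units, 2))   # one 2-D weight vector per map unit
    positions = np.arange(n_units)       # 1-D map coordinates of the units

    for step in range(5000):
        x = rng.uniform(size=2)          # random 2-D input
        winner = np.argmin(((W - x) ** 2).sum(axis=1))
        # Neighbourhood width shrinks over time; units near the winner
        # (in map coordinates) move towards the input along with it.
        sigma = 3.0 * np.exp(-step / 2000.0) + 0.5
        h = np.exp(-((positions - winner) ** 2) / (2 * sigma ** 2))
        W += 0.1 * h[:, None] * (x - W)

    # After training, the rows of W vary smoothly with map position, i.e.
    # neighbouring units have learned to code for neighbouring inputs.
    print(W.round(2))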

As for the serial processing that rides on top of these hypercolumns, it is not obvious to me that this needs additional information processing principles to be invoked. If the encodings used in the hypercolumns are generated by a dynamical process, then it is possible that this same dynamics could give rise to interesting serial processing. This is not guaranteed, but it is certainly possible. We need to understand the various types of emergent behaviour that are implicit in the low-level dynamics: hypercolumnar, serial processing, etc.

Like you, I would love to be able to see very high resolution (in space and time) data from a significantly large number of neurons. But I don't think statistical analysis alone will tell you what information processing is going on; you will need specific models to compare with the data, and such models need to be constructed in a principled way in order to do science rather than butterfly collecting.
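As a minimal sketch of what "principled" means here, consider scoring two candidate models of the same recorded data by held-out log-likelihood. Everything below (the Poisson spike-count data and both candidate models) is invented purely for illustration.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    counts = rng.poisson(4.0, size=500)   # pretend these are recorded spike counts
    train, test = counts[:250], counts[250:]

    # Model A: Poisson with rate fitted on the training half.
    rate = train.mean()
    ll_poisson = stats.poisson.logpmf(test, rate).sum()

    # Model B: Gaussian with mean/std fitted on the training half.
    mu, sd = train.mean(), train.std()
    ll_gauss = stats.norm.logpdf(test, mu, sd).sum()

    print(f"Poisson held-out log-likelihood:  {ll_poisson:.1f}")
    print(f"Gaussian held-out log-likelihood: {ll_gauss:.1f}")
    # The higher score is the better account of the data -- but only among
    # the models actually proposed, which is why the models must be
    # constructed in a principled way in the first place.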

 
At 17 February 2007 at 00:17, Anonymous Anonymous said...

Minsky has a distaste for biological models dating back to his misguided work on neural networks (which focused on perceptrons) some 30 years ago. At that time Minsky mistakenly concluded that neural networks were incapable of higher-order functions and, as a result, neural network research was abandoned and not resumed for about 20 years. Minsky was so influential at the time that his word was law, and few were willing to pursue research that Minsky had spoken against. Unfortunately he was very, very wrong: it turned out that while perceptrons were limited, neural networks in general were not.

Probably no good AI will be done until after Minsky dies, such is his influence. Meanwhile he continues to write everything down (and I do mean _everything_) in the hope that, if any success occurs in the field, Minsky can point to his writings and say "I told you so." (since his writings say essentially everything possible).

The sooner Minsky is seen as a writer of speculation, rather than a serious contributor, the better.

 
At 17 February 2007 at 10:04, Blogger Stephen Luttrell said...

I know about the proof that perceptrons have limited computational capabilities. As far as I know, Minsky did not extend this to neural networks generally, for the simple reason that such an extension would be false. So, I am not aware of Minsky being "very, very wrong", as you put it; please correct me if I have overlooked something here.
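For concreteness, the canonical example of that limitation is XOR, which is not linearly separable. The sketch below (a standard textbook demonstration, not anything taken from Minsky & Papert's text itself) shows the perceptron learning rule converging on AND but failing forever on XOR.

    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    targets = {"AND": np.array([0, 0, 0, 1]), "XOR": np.array([0, 1, 1, 0])}

    for name, y in targets.items():
        w, b = np.zeros(2), 0.0
        for epoch in range(1000):
            errors = 0
            for xi, yi in zip(X, y):
                pred = 1 if xi @ w + b > 0 else 0
                if pred != yi:               # classic perceptron update rule
                    w += (yi - pred) * xi
                    b += (yi - pred)
                    errors += 1
            if errors == 0:                  # a separating plane was found
                break
        print(name, "separated" if errors == 0 else "never separated",
              f"(epochs: {epoch + 1})")

A network with a hidden layer, such as the back-propagation sketch in the post above, handles XOR easily, which is exactly why the no-go result does not extend to multi-layer networks.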

The abandonment of NN research that happened (we are told) because of the perceptron no-go theorem was unfortunate. It was correct to abandon perceptron research; such simple networks should be left for the signal processing community to work on. However, it was wrong to abandon the search for more general NN approaches. In fact, some people did carry on such work, leading eventually to the resurgence of NN research.

Unfortunately, the decisions of research funding bodies about where to spend their money can be (and are) influenced by reasons that are not based on scientific merit alone. From my own experience, I sometimes wonder whether scientific judgement enters into it at all! What you get are research bandwagons that start up, run for several years, then dwindle away. This keeps people in work, and gives the appearance of lots of new results being produced, but I think that very few of those results are genuinely new and useful.

As for the subject matter of this blog posting, I stand by what I have said about parts of neuroscience being cargo cult science, where use of an apparently rigorous methodology gives the misleading impression that the underlying science is also good.

There is an unhealthy fixation on a single approach to building NNs; this is the so-called generative modelling approach. This approach is very flexible, and it has a lot of elegant mathematical machinery to power it, but that doesn't mean that it is right. I view it as one possible formal model of neural-style computation, just as I view my own work as another such formal model.

I find it ironic that users of formal models (which are at best algebraic caricatures) should criticise someone like Minsky because of his distaste for biological models.

I would hope that these formal models could eventually be used to derive the sorts of high-level AI model that Minsky uses, perhaps using an "effective field theory" kind of approach, where the collective behaviour of many low-level degrees of freedom is identified as the basic behaviour of single higher-level degrees of freedom.

 
At 21 March 2007 at 16:16, Blogger Damien said...

"Probably no good AI will be done until after Minsky dies, such is his influence."

Because as everyone knows there is only so much "science" to go around. If only Minsky weren't hogging it all we could finally make some real progress.

 
At 21 March 2007 at 18:11, Blogger Stephen Luttrell said...

One of the roles of "grand old men" of science is to fuse ideas from many areas. The range of ideas can be very broad because they are the result of a lifetime of accumulated experience, and the fusion is usually a bit loose because there is a lot to fit together correctly.

The upside is that it could be a good way of pointing less experienced researchers in a potentially fruitful direction. The downside is that the ideas might be a misguided dogma that steers the whole field off in a wrong direction. I gather that you think this latter possibility is what is happening with Minsky and AI.

I'm not sure why there is such animosity towards Minsky; maybe there are things going on that I am unaware of. Back in the 1980's, when I had just started in neural network research, I was told that Minsky & Papert's book on Perceptrons killed off neural network research because of the "perceptron no-go theorem" (or whatever you want to call it). But when I checked, I found that the book was correct, and it was actually people's reading of the book that was incorrect. It seems that few people actually check the source material, preferring to rely on hearsay.

As for the books that Minsky has produced in recent years, as long as we read them in the spirit in which (I think) they are intended to be read (i.e. musing about various ideas that Minsky has been cooking for decades), then all is OK.

Whichever way you look at it, it is not a good idea to pay too much attention to a single individual.

 
At 22 March 2007 at 10:21, Blogger PWBDecker said...

If you're interested in high-level information processing algorithms, you should check out L. Andrew Coward's Recommendation Architecture and his book Pattern Thinking, which describes this architecture in detail. Also look at the work by Jeff Hawkins on his HTM algorithm, which is basically a neural network that feeds back as well as forward (during normal operation, not training) so that possible feature data can be reintegrated into lower-level sensor processing. Both seem like very promising hypotheses for distributed and autonomous learning and dynamic control systems. I myself am interested in how to adapt such systems to operate temporally, but I have a long way to go before I reach the level of the aforementioned projects. Cheers.

 
At 23 March 2007 at 00:04, Blogger Stephen Luttrell said...

I am interested in high-level information processing algorithms to the extent that they might be emergent properties of a low-level dynamical system of interacting processing units. There are lots of high-level algorithms, but not many that are emergent.

In general, it is a hard inverse problem to deduce the low-level dynamics that underlie a high-level algorithm, and there is no solution at all for an arbitrarily chosen high-level algorithm. I guess that this is why not many people take this approach.

 
