"Understanding" is a word that is often used but hard to pin down in a technical sense. Dictionary.com offers no less than 13 definitions, the first of which reads "to perceive the meaning of; grasp the idea of; comprehend." Hmm. Interesting. Let's go to "comprehend" and see what comes up: "to understand the nature or meaning of; grasp with the mind; perceive." I see, so to understand is to comprehend, and to comprehend is to understand the nature or meaning of. You see the problem. If we instead choose to follow up on "perceive," it leads us down some more paths, none of which give us anything approaching a technical definition of understanding, one that would let us look at something that doesn't already understand what it means to understand and decide that it understands something else (if you'll forgive the bloated nature of that sentence).
Understanding is something we just do, something we know when we experience it. But this isn't English class, where everything is right, nor is it philosophy class, where everything is up for debate! There is an essence to the colloquial use of "understand" that should be possible to capture in a somewhat rigorous way - by which I merely mean defining it in terms of words less complex than itself.
Which brings me to a big point. We tend to think of an explanation of some fact or observation as a string of words that somehow reduces that fact or observation to a (perhaps longer) string of words that is in some way simpler, or more primitive, where each piece takes less explanation than the original fact. If I try to explain to you what a real number is and I say that it's any number on a continuous line, including numbers that can't be written as fractions, that makes sense. But if I look at you like an asshole for asking and tell you that it's a Dedekind cut on the rational numbers, things just get more confusing for you. Human comprehension almost always progresses towards the atomic nuggets of understanding that we all seem to magically possess and feel comfortable manipulating, rather than towards bigger ideas that are used to explain the smaller ones.
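For the curious, here's roughly what that second definition looks like written out - included to make the point, not because it helps:

```latex
% A Dedekind cut identifies a real number with a set A of rationals such that
%   (1) A is neither empty nor all of Q,
%   (2) A is downward closed: if q \in A and p < q, then p \in A,
%   (3) A has no greatest element.
% For instance, the real number \sqrt{2} "is" the cut
\[
  A_{\sqrt{2}} \;=\; \{\, q \in \mathbb{Q} : q < 0 \ \text{or}\ q^{2} < 2 \,\}
\]
```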
This is a point not taken lightly in science. When two different theories make the same predictions about the results of an experiment, whichever is deemed simpler is preferred. Now, deciding what "simple" means in the context of a complicated scientific theory is not always an easy chore; in fact, it's probably no easier than defining "understand." But in both cases, we know it when we see it, and if it looks simple, it's easier to understand. So the two are intimately connected. Many theories have in fact been overthrown on the grounds of relative simplicity, perhaps the most public case being the epicycle theories of pre-Newtonian astronomy, which for all intents and purposes amounted to Fourier series approximations of the true elliptical orbits, but with a whole lot more mathematical baggage.
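To make the epicycle remark concrete (this is a standard observation, not anything from the original astronomy): a planet riding on a deferent plus a stack of epicycles traces a sum of uniformly rotating circles, which is exactly a truncated Fourier series, and every additional epicycle is just another term.

```latex
% Planet position in the complex plane, built from a deferent (k = 0)
% and n epicycles (k = 1, ..., n), each a uniformly rotating circle:
\[
  z(t) \;=\; \sum_{k=0}^{n} r_k \, e^{\,i(\omega_k t + \phi_k)}
\]
% This has exactly the form of a truncated Fourier series in t, so piling on
% epicycles can approximate any reasonable closed orbit (including a Kepler
% ellipse) -- at the price of more and more "mathematical baggage."
```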
There are quite valid mathematical reasons to prefer simple models to complex ones. For example, suppose we have 20 points on a graph that lie more or less in a straight line (but with a bit of noise). A 19th-degree polynomial can fit these points perfectly - 20 points pin down a unique polynomial of degree at most 19. But it will rocket off to who knows where in between them. On the other hand, a first-degree polynomial, which is much simpler by any definition, can "mostly" fit the data, and appears to capture the true nature of the distribution even if it does not give an exact fit. To extrapolate this observation to all of science is a leap of faith; however, it is a leap of faith that has proven to be quite useful over the centuries.
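Here's a quick numerical sketch of that claim - the specific numbers, noise level, and use of NumPy are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# 20 points that lie more or less on a straight line, plus a bit of noise.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)

# Degree-19 polynomial: enough freedom to hit every point essentially exactly...
wiggly = np.polynomial.Polynomial.fit(x, y, deg=19)
# ...versus a first-degree polynomial, which only "mostly" fits the data.
line = np.polynomial.Polynomial.fit(x, y, deg=1)

# Look between the sample points: the straight line stays sensible,
# while the high-degree fit rockets off to who knows where.
x_between = (x[:-1] + x[1:]) / 2.0
print("degree 1,  max |prediction| between points:", np.abs(line(x_between)).max())
print("degree 19, max |prediction| between points:", np.abs(wiggly(x_between)).max())
```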
So where am I going with this? Simple: we need to be careful about defining the first sub-problems of strong AI. I think our failures run much deeper than just technical difficulties; we're studying the wrong problems at the moment, overfitting our methods to our goals. We're not going to achieve a Turing test machine in one stroke, as such a thing will necessarily be a conglomeration of many different breakthroughs in many different areas. Instead of focusing on pattern classifiers and stuff like that, we really need to figure out what the tough core of this problem is made of, and state it as simply as we possibly can. If we can define the problem in a clear enough manner, we may find that it's not that tough after all. After that, we can worry about preprocessing, which is what most current "AI" research really amounts to (the fact that something takes place in the brain does not make that something a vital piece of intelligence! As fun and important to us as pattern recognition is, it's not even close to the essence of what qualifies us as sentient).
I know that the fundamental problem has something to do with understanding the process of understanding - what is going on information-wise when I "grok" a topic? But I have no idea how to understand the process, as it is not available to introspection. All I know is that when something is understood, it becomes simple, whether that's a function of its storage method, contents, or interpretation. So in my opinion, we need to take a real hard (and perhaps not so simple!) look at simplicity, and what it means to reduce a set of observations to maximum reasonable simplicity to see where to go next. This may be a simpler goal, to start with.
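As a starting point, here is one crude way to make "reduce a set of observations to maximum reasonable simplicity" concrete: score each candidate description by how well it fits plus a charge for how complicated it is, and keep the cheapest one. The penalty weight below is an arbitrary illustration, not a claim about the right way to measure simplicity.

```python
import numpy as np

def penalized_score(x, y, degree):
    """Misfit plus a crude 'complexity' charge per coefficient (illustrative only)."""
    fit = np.polynomial.Polynomial.fit(x, y, deg=degree)
    misfit = np.mean((fit(x) - y) ** 2)
    return misfit + 0.01 * (degree + 1)   # the 0.01 weight is arbitrary

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)

best = min(range(15), key=lambda d: penalized_score(x, y, d))
print("simplest adequate model has degree", best)   # the straight line should win
```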
Tuesday, February 13, 2007
Monday, January 29, 2007
In Search Of The Prettiest Language
Okay, it's been a while since my first post. But I'm back now, and I've got a lot to cover.
In my last post, I took aim at one of the AI world's most frequent critics, John Searle. Now I'm feeling a little bit bad about it, upon reflection - I mean, the poor guy is trying to argue against the feasibility of a highly technical engineering project using nothing but pure logic. That's a trickier goal to accomplish than the one that AI researchers have to tackle!
Today I was originally going to take some potshots at Jaron Lanier, another member of the anti-AI camp (although that's a completely one-dimensional view of his stance, which seems much more reasonable to me than Searle's - Lanier explains his views more clearly here; note that he is also only peripherally interested in this issue, as opposed to Searle, whose career is almost entirely built on referring to his dualist beliefs by another name). But you know what? Since I don't for the moment care about the philosophical implications (the "can a computer ever be conscious?" questions), I shouldn't waste time arguing with the people who do. This is an engineering task at heart, so I will look at it as such.
Of course, it's an incredible bitch of a problem. There's a term floating around, "AI-hard," that is sometimes used to describe problems that are equivalent to solving the entire AI problem in one blow. The idea: once you've done one, you've done them all, and you've got something that can be as intelligent as a human. Needless to say, passing the Turing test is, practically by definition, AI-hard. So is "perfect" text compression (though I would argue that this is several steps beyond human-level AI), and so is comprehension of visual images. An argument can be made that even the simple transcription of the spoken word is AI-hard if you want near-perfect (i.e. useful) results. All of these problems are so multifaceted and difficult that we literally don't know where to begin.
So the first step was just to start thinking about these things at all. "Classical" AI research is, generally speaking, either extremely domain-specific or extremely weak. It comprises things like expert systems, Bayesian nets, and a lot of stuff that falls under the general heading of machine learning. But it's all pretty straightforward - apart from a few twists, it all boils down to hand-coding (or occasionally automating the coding of, but not the collection and interpretation of) primitive facts that are stored as meaningless symbols, then storing meta-facts for manipulating the facts, and meta-meta-facts for manipulating those, and so on. No matter how far you carry this, you're basically doing old-school AI research, since you can't just "let it loose" and watch it devour information.
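A caricature of that style, just to fix ideas - the facts, relations, and rule below are all made up for illustration:

```python
# Hand-coded "facts" stored as meaningless symbol triples.
facts = {
    ("canary", "is-a", "bird"),
    ("bird", "has", "feathers"),
}

# A hand-coded "meta-fact": a rule for manipulating the facts.
def inherit(facts):
    """If X is-a Y and Y has Z, then conclude X has Z."""
    derived = set()
    for (x, rel1, y) in facts:
        if rel1 != "is-a":
            continue
        for (a, rel2, z) in facts:
            if a == y and rel2 == "has":
                derived.add((x, "has", z))
    return facts | derived

print(inherit(facts))
# The program attaches no meaning to "canary" or "feathers"; every bit of
# "understanding" lives in whoever chose the symbols and wrote the rule.
```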
The old style had some successes, but it's pretty well understood that it's not a feasible route to "real" AI. What we want is something that we can just turn on and gawk at as it becomes smart enough to chat with us. So we make the move to "new" AI (which by now has aged considerably) - neural nets, fuzzy logic, and evolutionary methods. The idea here is that if we design algorithms that by their nature can recognize patterns and deal in more "vague" ways with data, then we're probably getting closer to how the brain does it. And if we don't have to program them by hand, even better!
And to some extent all this is true. After all, neural networks are literally modeled on the structure of the brain at the neuron level. But there's just one problem: this still doesn't tell us where to start. Neural nets became fabulously popular because of the backpropagation algorithm, which was basically an efficient way to train a time-independent feed-forward network to reproduce a specified function and make decent interpolations and generalizations. This actually did solve a lot of problems in a nice way, but there's no way to apply it to general intelligence because you need to hand-feed the net the desired output (among other reasons why such a net could never be truly intelligent). There's no way to feed it the command "Be creative and smart!" and set the algorithm running, so we're back to square one.
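For concreteness, here's a bare-bones feed-forward net trained by backpropagation to reproduce a hand-specified target (XOR); the sizes, learning rate, and target are arbitrary choices of mine. The point is the second line: the desired output T has to be supplied by hand.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)   # the desired output, fed in by hand

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)     # input -> hidden
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)     # hidden -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(20000):
    # Forward pass through the time-independent, feed-forward net.
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # Backward pass: push the error between output and target back through the net.
    dy = (y - T) * y * (1 - y)
    dh = (dy @ W2.T) * h * (1 - h)
    W2 -= 0.5 * (h.T @ dy); b2 -= 0.5 * dy.sum(axis=0)
    W1 -= 0.5 * (X.T @ dh); b1 -= 0.5 * dh.sum(axis=0)

print(y.round(2))   # typically close to the target column [0, 1, 1, 0] after training
```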
People got really excited over recurrent networks, too - these allow connections that loop back on themselves, and carry out their actions "in time" as opposed to in one pass. In theory, they are very powerful, and can have memory, and all sorts of great things. The problem? Nobody knows how to train the damn things! Yes, yes, you can get decent results in very artificial situations by using various tricks, but in general these things aren't even used very effectively for mundane pattern recognition tasks, because they're so difficult to work with.
To clear up a misconception that many people (especially pointy-haired types) have: there is nothing magic about a non-recurrent neural net. It is just another way of representing a function as a sequence of numbers, much as if you just plotted points or added sine waves in the right proportions. They can't pick out patterns that aren't there; they merely interpolate between data points in a fairly nice way. "Fairly nice" tends to mean that they are good at ignoring noisy data - for example, if you fit a high-degree polynomial to a bunch of noisy data, it would swing all over the place because it tries to pass through all the points. Neural nets can (when properly applied) smooth over some of these things, which is certainly useful. But it's far from magic!
When we look at time-dependent nets, we find a bit more magic, mainly because of the difference between a class and a function. A simple (i.e. side-effect-free) function merely takes an input and spits back an output, whereas a class has state. Time-dependent nets can form loops that may act as bits of memory, and can also be coaxed into forming NAND gates. This, of course, means that just about anything you can do with a computer you can do with a recurrent net of the right architecture. Not that it would be pretty!
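A toy version of both claims - the weights and thresholds are hand-picked for illustration: a single threshold unit computes NAND, and two of them wired into a loop hold a bit (the classic cross-coupled latch).

```python
def nand(a, b):
    # A single threshold unit: weights -2, -2, bias +3; it fires unless both inputs are on.
    return 1 if (-2 * a - 2 * b + 3) > 0 else 0

assert [nand(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [1, 1, 1, 0]

def latch_step(s, r, q, q_bar):
    # One tick of a recurrent loop: two cross-coupled NAND units (inputs are active-low).
    return nand(s, q_bar), nand(r, q)

q, q_bar = 0, 1
for _ in range(3):                       # assert "set" (s = 0) and let the loop settle
    q, q_bar = latch_step(0, 1, q, q_bar)
print(q)                                 # 1: the loop has stored a bit
for _ in range(3):                       # release the input; the stored bit persists
    q, q_bar = latch_step(1, 1, q, q_bar)
print(q)                                 # still 1: memory, from nothing but loops and NANDs
```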
Which is, now that I've gotten all this way, the essential point of this post. We do not yet have the tools to write pretty enough code to tackle strong AI. When we finally do, the problem will not be much of a problem anymore, and I suspect it will be largely because we formulate our thinking about code in an entirely different way (i.e. a different programming language or paradigm). Yes, we have Turing complete languages, so we technically have the ability to do it, but it's too hard to actually carry out. 50 or so years of AI research have gained us an understanding that all the hacks and half measures that are available to us are not enough to do this thing. From the "new" AI, we see that it's not enough just to ape the way the brain is constructed physically, because we're only able to ape it! We can't actually copy it, and without being able to do that, we don't have any idea how to put the connections in the right place, or set thresholds correctly, or figure out how exactly to strengthen connections, etc. And from the classical side of research, we've learned that we can't just imagine how the brain is organized, because all we are doing is guessing based on introspection.
Our best path is to realize that we still have no foothold here, and it's because programming is too hard at the moment. We can't manipulate big enough ideas with the ease that we will need. We simply need to build up a better bag of tricks to come at this problem with.
Note that I never covered evolutionary methods. Well, that's because, despite their magnificent promise, they kind of suck right now. Nobody is doing anything very useful with them on a large scale. Why is this? Simple - evolution in the real world works only because of two things: a huge population to work on and the genetic code. The scale we cannot hope to match. But the code is syntax-free, flexible, and powerful enough to build just about anything imaginable, and this is something we can try to mimic. Things break, and if they break in a useful way, the results are incorporated into the organism. Data is stored right along with function. Function can be co-opted for new purposes and reused as much as is needed. These things are all very important, and worth looking into. In computers, we have no language that can do all of this without major hackfests (yes, I'm including Lisp in this condemnation, even if it is better than the rest). The only language that's even close to as flexible as the genetic code is probably machine code, but machine code is too primitive to construct the kinds of algorithms we need without the scale that the real world provides for DNA to play in, and frankly, it's too hard to work with, to keep an eye on, and to train.
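To give a flavor of what those evolutionary methods look like in miniature - the target string, population size, and mutation scheme below are arbitrary toy choices, nothing like the real-world scale the previous paragraph is lamenting:

```python
import random

TARGET = "prettiest language"
ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def fitness(s):
    """How many characters already match the target."""
    return sum(a == b for a, b in zip(s, TARGET))

def mutate(s):
    """Break one character at random; selection decides whether the break was useful."""
    i = random.randrange(len(s))
    return s[:i] + random.choice(ALPHABET) + s[i + 1:]

random.seed(0)
population = ["".join(random.choice(ALPHABET) for _ in TARGET) for _ in range(50)]
for generation in range(5000):
    population.sort(key=fitness, reverse=True)
    if population[0] == TARGET:
        break
    survivors = population[:10]                                         # selection
    population = survivors + [mutate(random.choice(survivors)) for _ in range(40)]

print(generation, repr(population[0]))
```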
It is my feeling that the search for AI is not really a search for intelligence at all. Intelligence will merely be a by-product of coming up with the prettiest language ever to program in - you might even decide that your compiler has a deeper, more robust intelligence than you do, years before a computer actually passes the Turing test, if we do this job right. This will take quite some time, but the rewards will be well worth the effort. But we've got to start thinking about it now, because the pace of language change is glacial, even in a field as fast moving as ours is.