Okay, it's been a while since my first post. But I'm back now, and I've got a lot to cover.
In my last post, I took aim at one of the AI world's most frequent critics, John Searle. Now I'm feeling a little bit bad about it, upon reflection - I mean, the poor guy is trying to argue against the feasibility of a highly technical engineering project using nothing but pure logic. That's a trickier goal to accomplish than the one that AI researchers have to tackle!
Today I was originally going to take some potshots at Jaron Lanier, another member of the anti-AI camp (although that's a completely one-dimensional view of his stance, which seems much more reasonable to me than Searle's - Lanier explains his views more clearly here; note that he is also only peripherally interested in this issue, as opposed to Searle, whose career is almost entirely based upon referring to his dualist beliefs by another name). But you know what? Since I don't for the moment care about the philosophical implications (the "can a computer ever be conscious?" questions), I shouldn't even waste time arguing with the people who do. This is an engineering task at heart, so I will look at it as such.
Of course, it's an incredible bitch of a problem. There's a term floating around, "AI-hard," that is sometimes used to describe problems that are equivalent to solving the entire AI problem in one blow. The idea: once you've done one, you've done them all, and you've got something that can be as intelligent as a human. Needless to say, passing the Turing test is AI-hard practically by definition. So are "perfect" text compression (though I would argue that this is several steps ahead of human-level AI) and comprehension of visual images. An argument can be made that even the simple transcription of the spoken word is AI-hard if you want near-perfect (i.e. useful) results. All of these problems are so multifaceted and difficult that we literally don't know where to begin.
So the first step was just to start thinking about these things at all. "Classical" AI research is, generally speaking, either extremely domain specific or extremely weak. It comprises things like expert systems, Bayesian nets, and a lot of stuff that falls under the general heading of machine learning. But it's all pretty straightforward - apart from a few twists, it all boils down to hand coding (or occasionally automating the coding of, but not the collection and interpretation of) primitive facts that are stored as meaningless symbols, and then storing meta-facts for manipulating the facts, and meta-meta-facts for manipulating those, etc. No matter how far you carry this, you're basically doing old-school AI research, since you can't just "let it loose" and watch it devour information.
The old style had some successes, but it's pretty well understood that it's not a feasible route to "real" AI. What we want is something that we can just turn on and gawk at as it becomes smart enough to chat with us. So we make the move to "new" AI (which by now has aged considerably) - neural nets, fuzzy logic, and evolutionary methods. The idea here is that if we design algorithms that by their nature can recognize patterns and deal in more "vague" ways with data, then we're probably getting closer to how the brain does it. And if we don't have to program them by hand, even better!
And to some extent all this is true. After all, neural networks are literally modeled on the structure of the brain at the neuron level. But there's just one problem: this still doesn't tell us where to start. Neural nets became fabulously popular because of the backpropagation algorithm, which was basically an efficient way to train a time-independent feed-forward network to reproduce a specified function and make decent interpolations and generalizations. This actually did solve a lot of problems in a nice way, but there's no way to apply it to general intelligence because you need to hand feed the net the desired output (among other reasons that such a net could never be truly intelligent). There's no way to feed it the command "Be creative and smart!" and set the algorithm running, so we're back to square one.
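To make the "hand feed the net the desired output" point concrete, here is a minimal sketch of supervised training on the smallest possible "network": a single weight, trained by gradient descent on squared error. This is only the one-weight special case of what backpropagation does for multi-layer nets, and the function name, learning rate, and data are all made up for illustration. Notice that the update can't even be written down without a `target` for every input.

```python
def train_single_weight(samples, learning_rate=0.1, epochs=50):
    """Fit y = w * x by gradient descent on squared error.

    `samples` is a list of (x, target) pairs. Every input needs a
    hand-supplied target; without one there is no error to reduce.
    """
    w = 0.0
    for _ in range(epochs):
        for x, target in samples:
            prediction = w * x
            error = target - prediction
            # Gradient of 0.5 * error**2 with respect to w is -error * x,
            # so stepping against the gradient adds error * x.
            w += learning_rate * error * x
    return w


# Illustrative data: the "specified function" here is y = 2x.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
learned_w = train_single_weight(samples)
print(learned_w)
```

The weight settles very close to 2, but only because somebody supplied the right answers up front. There is no slot in this loop for "be creative and smart."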
People got really excited over recurrent networks, too - these allow loops in them, and carry out their actions "in time" as opposed to in one pass. In theory, they are very powerful, and can have memory, and all sorts of great things. The problem? Nobody knows how to train the damn things! Yes, yes, you can get decent results in very artificial situations by using various tricks, but generally, these things aren't even used very effectively for mundane pattern recognition tasks because they're so difficult to work with.
To clear up a misconception that many people (especially pointy-haired types) have: there is nothing magic about a non-recurrent neural net. It is just another way of representing a function as a sequence of numbers, much as if you just plotted points or added sine waves in the right proportions. Such nets can't pick out patterns that aren't there; they merely interpolate between data points in a fairly nice way. "Fairly nice" tends to mean that they are good at ignoring noisy data - for example, if you fit a polynomial to a bunch of noisy data, it would swing all over the place because it tries to pass through all the points. Neural nets can (when properly applied) smooth over some of these things, which is certainly useful. But it's far from magic!
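The polynomial complaint is easy to see in a few lines. The sketch below is my own toy setup, not a neural net: it pushes an exact-fit polynomial (Lagrange interpolation) through eleven noisy samples of a flat signal, and compares it with a crude three-point moving average standing in for "something that smooths." The data and helper names are assumptions for illustration only.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the unique polynomial through all (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def moving_average(ys, i, radius=1):
    """Average ys over a small window centred on index i."""
    window = ys[max(0, i - radius): i + radius + 1]
    return sum(window) / len(window)


# The true signal is a flat zero; the "noise" alternates +0.1 / -0.1.
xs = [float(k) for k in range(11)]
ys = [0.1 if k % 2 == 0 else -0.1 for k in range(11)]

# The exact-fit polynomial honours every noisy point...
fit_at_nodes = [lagrange_interpolate(xs, ys, x) for x in xs]
# ...and pays for it by swinging far outside the data between them.
fit_between = lagrange_interpolate(xs, ys, 0.5)
# A crude smoother stays within the noise band.
smoothed = [moving_average(ys, i) for i in range(len(ys))]

print(fit_between, max(abs(s) for s in smoothed))
```

Halfway between the first two samples the polynomial lands well outside the 0.1 noise band, while the smoother never leaves it. A well-applied net behaves more like the second curve than the first, which is useful, but it is still just curve fitting.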
When we look at time-dependent nets, we find a bit more magic, mainly because of the difference between a class and a function. A simple (i.e. side-effect-free) function merely takes an input and spits back an output, whereas a class has state. Time-dependent nets can form loops that may act as bits of memory, and can also be coaxed into forming NAND gates. Since NAND gates plus memory are enough to build any digital circuit, this means that just about anything you can do with a computer you can do with a recurrent net of the right architecture. Not that it would be pretty!
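Here is a rough sketch of both halves of that claim, with hand-picked weights rather than trained ones (nobody is pretending this was learned). A single threshold neuron computes NAND, and two of them wired into a loop behave like a one-bit SR latch. The class name, the weights, and the "run three passes to settle" rule are my own assumptions for the example.

```python
def nand_neuron(a, b):
    """A threshold unit with weights (-1, -1) and bias 1.5 computes NAND."""
    activation = -1.0 * a + -1.0 * b + 1.5
    return 1 if activation > 0 else 0


class LatchNet:
    """Two NAND neurons wired in a loop: an SR latch holding one bit.

    Inputs are active-low: set_bar=0 stores a 1, reset_bar=0 stores a 0,
    and (1, 1) means "keep whatever you had".
    """

    def __init__(self):
        self.q = 0
        self.q_bar = 1

    def step(self, set_bar, reset_bar, passes=3):
        # Let the loop run for a few time steps so the outputs settle.
        for _ in range(passes):
            self.q = nand_neuron(set_bar, self.q_bar)
            self.q_bar = nand_neuron(reset_bar, self.q)
        return self.q


latch = LatchNet()
after_set = latch.step(0, 1)     # write a 1
held_one = latch.step(1, 1)      # inputs idle: the loop remembers
after_reset = latch.step(1, 0)   # write a 0
held_zero = latch.step(1, 1)     # still remembered
print(after_set, held_one, after_reset, held_zero)
```

The function is stateless and the class is not, which is exactly the difference between the feed-forward and recurrent cases. It also hints at the "not pretty" part: this is six numbers and a loop just to store a single bit.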
Which is, now that I've gotten all this way, the essential point of this post. We do not yet have the tools to write pretty enough code to tackle strong AI. When we finally do, the problem will not be much of a problem anymore, and I suspect it will be largely because we formulate our thinking about code in an entirely different way (i.e. a different programming language or paradigm). Yes, we have Turing complete languages, so we technically have the ability to do it, but it's too hard to actually carry out. 50 or so years of AI research have gained us an understanding that all the hacks and half measures that are available to us are not enough to do this thing. From the "new" AI, we see that it's not enough just to ape the way the brain is constructed physically, because we're only able to ape it! We can't actually copy it, and without being able to do that, we don't have any idea how to put the connections in the right place, or set thresholds correctly, or figure out how exactly to strengthen connections, etc. And from the classical side of research, we've learned that we can't just imagine how the brain is organized, because all we are doing is guessing based on introspection.
Our best path is to realize that we still have no foothold here, and it's because programming is too hard at the moment. We can't manipulate big enough ideas with the ease that we will need. We simply need to build up a better bag of tricks that we can come at this problem with.
Note that I never covered evolutionary methods. Well, that's because despite their magnificent promise, they kind of suck right now. Nobody is doing anything very useful with them on a large scale. Why is this? Simple - evolution in the real world works only because of two things: a huge population to work on and the genetic code. The scale we cannot hope to match. But the code is syntax-free, flexible, and powerful enough to build just about anything imaginable, and this is something we can try to mimic. Things break, and if they break in a useful way, the results are incorporated into the being. Data is stored right along with function. Function can be co-opted for new purposes and reused as much as is needed. These things are all very important, and worth looking into. In computers, we have no language that can do all of this stuff without major hackfests (yes, I'm including Lisp in this condemnation, even if it is better than the rest). The only language that's even close to as flexible as the genetic code is probably machine code, but machine code is too primitive to construct the kinds of algorithms we need without the scale that the real world provides for DNA to play in, and frankly, it's too hard to work with to keep an eye on and train.
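For reference, this is roughly what a bare-bones evolutionary method looks like in practice. It's a toy sketch under my own assumptions (bit-string genomes, a "count the ones" fitness function, a population of twenty, a fixed random seed), not anyone's production algorithm. The mechanics of mutation and selection fit in a few lines; what's missing is everything interesting about the genetic code. The genome here is a rigid, fixed-length list that can only ever mean one thing.

```python
import random


def fitness(genome):
    """Toy objective (an assumption for illustration): count the 1 bits."""
    return sum(genome)


def mutate(genome, rng, rate=0.05):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in genome]


def evolve(population, rng, generations=50):
    """Keep the fitter half each generation and refill with mutated copies."""
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: len(population) // 2]
        children = [mutate(parent, rng) for parent in survivors]
        population = survivors + children
        history.append(max(fitness(g) for g in population))
    return population, history


rng = random.Random(0)  # fixed seed so runs are repeatable
start = [[rng.randint(0, 1) for _ in range(32)] for _ in range(20)]
initial_best = max(fitness(g) for g in start)
final_population, best_per_generation = evolve([g[:] for g in start], rng)
print(initial_best, best_per_generation[-1])
```

Because the survivors are carried over unchanged, the best score can never go down from one generation to the next. That's about all this loop guarantees, though, and nothing in it lets data turn into function or old function get reused for something new.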
It is my feeling that the search for AI is not really a search for intelligence at all. Intelligence will merely be a by-product of coming up with the prettiest language ever to program in - you might even decide that your compiler has a deeper, more robust intelligence than you do, years before a computer actually passes the Turing test, if we do this job right. This will take quite some time, but the rewards will be well worth the effort. But we've got to start thinking about it now, because the pace of language change is glacial, even in a field as fast moving as ours is.
Monday, January 29, 2007