This was a "spare" editorial from 1995, never published in PC Techniques, appearing here for the first time ever!

In a Manner of Speaking

Back in 1973, when my understanding of computing was fuzzy to say the least, I wrote a science fiction story about computing in the 2050s. In creating that imaginary future, I guessed about a lot of things, but the most noteworthy guess was something I still cringe about, 20+ years later: I guessed that voice synthesis would be hard, and real-time voice comprehension would be easy. Sure enough, five years later, in 1978, I saw the CompuTalker board running under CP/M. Now here we are, closing in on the Millennium, and I still can't dictate these damned editorials into my laptop.

Among the many promises IBM has made for their impending PowerPC systems is real-time dictation. We're supposedly on the list for a review system, and you'll certainly hear my reaction once I've had a chance to try it out.

Two-way voice is one of those nonlinear technological vectors that could completely change the shape of personal computing. Without a keyboard or a screen, a computer becomes a piece of jewelry, clipped to your lapel and always listening. Ask a question, and it answers. Mutter an idle thought, and it remembers. For a guy who gets some of his best ideas while climbing rocks or washing the dishes, this would be a good thing indeed. Later, when you sit in front of your keyboard and screen, the computer on your lapel talks to the "dumb" keyboard and screen through a fast infrared link, like those wireless stereo headphones.

There are some really knotty problems. A few weeks back I was watching a TV comedy retrospective, and saw a 30-year-old Victor Borge bit from The Ed Sullivan Show, in which Borge invents audible equivalents for punctuation, and then reads a paragraph or two from a book, with funny noises to signify punctuation. It was a major hoot, and half the fun was that I could see myself doing that, using whistles and buzzes and clicks for commas and quotes and things. What will probably happen instead (for dictation, at least) is that we'll talk the words and type the punctuation, perhaps on dedicated keypads containing nothing but punctuation keys. Anyone for hand-mouth coordination?

But as tough as it's been to implement, simple dictation is really a snap compared to real-time natural language processing. Getting the machine to convert speech to text accurately is absolutely nothing like getting the machine to understand what you're trying to say.

It's more than parsing sentences. It's about parsing context, whatever that means. In other words, if I said, "Time flies like an arrow" into the microphone, I would expect the machine to analyze the prior direction of our conversation to discover if we were talking about the nature of time, or the speed of flies. Given the breadth of human context, this could be another fifty years off.

And having philosophical conversations with computers seems a kind of silly SF notion, unless you're a bleeding-edge researcher interested in what sort of "philosophy" a machine would evolve if prodded. Me, I'd be more than happy with a machine that could create useful database queries on voice command; one that would analyze my statement of needs and then suggest refinements would be Valhalla.

And this leads to my final question, one for which I have no good answer: Should we shoot for the general solution to this problem, no matter how long it takes? Or should we start by expanding SQL to embrace a kind of "structured English" and simply add "canned contexts" as we go? The risk in the first approach is that it may be impossible; the risk in the second is that if successful it might short-circuit any truly general solution based on some very large self-referential neural network or who knows what else.

In other words, would we be cheating ourselves out of discovering what intelligence actually is, by being content with teaching machines how to discern what we want them to do?

Things like this keep SF writers awake at night. Just thought you might like to know.