From Visual Developer Magazine. Written in March, 1999, this was a "spare" editorial that was never published.
The Engine That Surpasseth All Understanding
People quote Codd's rules as though they were laws of physics. Do that long enough, and they become, for you at least, laws of physics.
I've begun work on a software project with long horizons: The rest of my life. And while I'm no spring chicken, that's still a good thirty or maybe even forty years. Of the project (to which I've attached the code name Aardmarks) I'll say more in upcoming articles. It's an outgrowth of my "Virtual Encyclopedia" idea that readers of this magazine have seen in various forms over the years. It involves a body of data that starts out big and becomes immense, and managing immensity is more or less what the software is being designed to do.
It's not like I'll ever be lacking a place to put it. The information is almost entirely textual, and you can get a 19.2GB hard drive for $379 now. My guess is that data storage will grow faster than the Aardmarks database as long as I live. (Bandwidth is the greater issue…but more on that another time.) My biggest worry is that we lack software tools to manage really immense databases. This hasn't been a problem before…but you can put half a million database records on your laptop today, and a hundred million in a few years. We'll have the disk, and we'll have the cycles; gigahertz Pentiums are in the lab. Will we have the algorithms?
There was a day when machines were so slow that relational databases were impractical. Instead, we used the network model, in which databases were persistent linked lists. The tradeoff was flexibility (restructuring data was a nightmare), but the systems were responsive. Today, relational databases are a religion, and I doubt anybody even remembers how to create network-model databases. I'm wondering, however, whether the relational model will do the job when the Aardmarks database has thirty million records. And if not relational, then…what?
This isn't a plea for a return to an antiquated, inflexible data management model. It's a plea for new ideas. Years and years ago, Michael Abrash ran a code optimization challenge in this magazine's predecessor, PC TECHNIQUES. The challenge was the Game of Life. The winner, David Stafford, now of Eclipse Entertainment, wrote an astonishing Game of Life that stretched the rules considerably: It was a program that generated a program that played an extraordinarily optimized Game of Life. (Read about David's solution in the February/March 1992 PC TECHNIQUES.)
David Stafford, not coincidentally, is in the games business, which is one of the few areas in software development where people systematically think outside the box. I've seen precious little of that in data management in recent years. People quote Codd's rules as though they were laws of physics. Do that long enough, and they become, for you at least, laws of physics.
Neither Microsoft Access nor Borland's BDE can crunch databases with more than ten or twenty thousand records. I already have 17,000 messages in my email folders, which is about a third of what I've received since the end of 1994. I'd like to put it all in a database, but I have no idea what database has the muscle to handle it all. And what will I do in 2010 when my email archives are at 75,000 messages?
My intuition is that we've gotten very stale on the database technology side. Massive disk systems, multithreading, and fast processors have not been the rule on the desktop until very recently, and with our 1990 blinders still on we may not be able to see the novel approaches that those computational riches make possible. It's time for somebody to devise a new approach to management of big databases, like those we will be managing soon. (Or now.)
Who? If I had to guess, I'd say a gamer. The game guys understand code performance like nobody else, and they approach every task as though it were a challenge standing alone. So that's my challenge: Gen up a test database with a million records, and build the Engine that Surpasseth All Understanding.
Yeah, after you've written Quake, databases are boring. But Larry Ellison's a billionaire. It's time that the database industry had another.
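For anyone inclined to take up the challenge, the first step ("gen up a test database with a million records") might look something like the sketch below. Everything in it is an assumption made for illustration: the `messages` table, its columns, the `build_test_db` and `time_lookup` names, and the choice of Python with its bundled SQLite module. SQLite is an ordinary relational engine, so it stands in here only as a baseline to measure a new engine against, not as the answer to the challenge.

```python
import random
import sqlite3
import time

# Hypothetical vocabulary used to fabricate message bodies.
WORDS = ["engine", "record", "index", "disk", "cycle", "query", "table", "life"]


def build_test_db(path=":memory:", n_records=1_000_000, seed=1999):
    """Create a made-up 'messages' table and fill it with n_records rows."""
    rng = random.Random(seed)  # fixed seed so every run builds the same data
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE messages ("
        "id INTEGER PRIMARY KEY, sender TEXT, year INTEGER, body TEXT)"
    )

    def rows():
        # Generate rows lazily so the full data set never sits in a Python list.
        for i in range(n_records):
            sender = "user%04d" % rng.randrange(5000)
            year = rng.randint(1994, 2010)
            body = " ".join(rng.choices(WORDS, k=8))
            yield (i, sender, year, body)

    with conn:  # load everything inside a single transaction
        conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", rows())
    conn.execute("CREATE INDEX idx_sender ON messages (sender)")
    return conn


def time_lookup(conn, sender):
    """Return (matching row count, elapsed seconds) for one sender lookup."""
    start = time.perf_counter()
    count = conn.execute(
        "SELECT COUNT(*) FROM messages WHERE sender = ?", (sender,)
    ).fetchone()[0]
    return count, time.perf_counter() - start
```

Calling `build_test_db()` with its defaults builds the full million rows in memory, and `time_lookup(conn, "user0042")` then gives one baseline number to beat. Passing a file path instead of `":memory:"` puts the data on disk, which is closer to the case the challenge is really about.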