The road to LINQ, Part 1

Sometimes a program doesn't know what it needs until the user asks for it. When a program's primary purpose is to find things when the user asks for them, it can generally be classified as a query tool.

Query tools are found all over the place. Practically every business in every industry has its own way of generating, describing, storing and querying data.

In years gone by the data might have been generated when someone filled out a form. The description of the data would be the form's labels and fields. Form storage was provided by file cabinets. Back in those days the query tool was the receptionist, file clerk or whoever happened to know how to track down the desired bit of information.

These days computer programs are the query tools. One problem common to pretty much every query tool is how to conduct its search in a way that will flexibly accommodate a wide variety of user requests. Put another way: how do I give you a program that lets you find what you're looking for without me having to modify the program every time you're looking for something else?

Fortunately an elegant solution to this problem was invented long ago. It's called SQL.

Unfortunately, SQL requires data to be stored in a very specific form. That form, the relational model, though powerful, takes some getting used to. So although an elegant solution to the problem of flexibly querying data has long existed, even to this day much of that data is not in a form that can benefit from SQL.

Why is SQL elegant? In essence, it is elegant because it succinctly represents a higher-level abstraction that can be applied to almost any kind of data. SQL operates at a level of abstraction closer to the way human beings formulate questions.

With SQL, the user specifies what it is they're looking for. The significance of this may not be obvious to non-programmers, but for programmers it's a radical notion. A program is a sequence of instructions that specify how to do something. Computer programmers have for decades been in the business of telling computers how to do some set of tasks.

For example, a college professor may want to know the names of all students who passed his most recent exam. A traditional program written to answer this question focuses almost entirely on the how. Pseudocode for that program might look like the following:

open grades file
loop over each grade entry
    if the grade entry is above 70, copy the entry to the passed list
end loop
print every entry in the passed list
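
For readers who would rather see this as something that runs, here is a minimal Python sketch of the same steps. The sample data, the field names and the function name passed_students are all made up for illustration, and an in-memory list stands in for the grades file.

```python
# Hypothetical sample data standing in for the "grades file".
GRADES = [
    {"firstname": "Ada", "lastname": "Lovelace", "grade": 92},
    {"firstname": "Alan", "lastname": "Turing", "grade": 70},
    {"firstname": "Grace", "lastname": "Hopper", "grade": 85},
    {"firstname": "Edgar", "lastname": "Codd", "grade": 64},
]

# A passing grade is one that is strictly higher than this value.
PASSING_THRESHOLD = 70


def passed_students(entries):
    """Spell out the 'how': walk every entry and copy the passing ones."""
    passed = []
    for entry in entries:
        if entry["grade"] > PASSING_THRESHOLD:
            passed.append(entry)
    return passed


if __name__ == "__main__":
    for entry in passed_students(GRADES):
        print(entry["firstname"], entry["lastname"])
```

Every line of the function describes a step the computer must take; nothing in it states the question itself.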

While that's perfectly comprehensible to almost any programmer, it's quite far removed from the way the professor himself might express his desire. In SQL this might be:

select firstname, lastname from grades where grade > 70

This is clearly much closer to what the professor had in mind. He wants the names of every student who passed his exam. A passing grade on the exam is one that is higher than 70.

The how is entirely missing from the SQL version. The cost of this convenience is that the grades have to be stored in relational form because SQL can only query data in relational form.
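
To make that trade-off concrete, here is a rough Python sketch that uses the standard library's sqlite3 module. The sample rows, the column types and the function name names_of_passing_students are assumptions made for illustration. Note that the grades must first be loaded into a relational table before the one-line query can be asked.

```python
import sqlite3

# Hypothetical sample rows: (firstname, lastname, grade).
ROWS = [
    ("Ada", "Lovelace", 92),
    ("Alan", "Turing", 70),
    ("Grace", "Hopper", 85),
    ("Edgar", "Codd", 64),
]


def names_of_passing_students(rows):
    """Load the rows into an in-memory table, then ask the 'what' in SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        # The cost of the convenience: the data has to be in relational form.
        conn.execute(
            "CREATE TABLE grades (firstname TEXT, lastname TEXT, grade INTEGER)"
        )
        conn.executemany("INSERT INTO grades VALUES (?, ?, ?)", rows)
        # The query states what is wanted; the database decides how to find it.
        cursor = conn.execute(
            "SELECT firstname, lastname FROM grades WHERE grade > 70"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for firstname, lastname in names_of_passing_students(ROWS):
        print(firstname, lastname)
```

The query has no ORDER BY clause, so the order of the returned names should not be relied upon.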

As the information age steamrolls on, the fact that most of the world's data is not in relational form becomes a growing problem. The example above suggests that writing the how is more difficult than writing the what, and more difficult means more errors will occur.

Another problem inhibiting the use of SQL to tame the onslaught of data that characterizes the information age is that it represents a paradigm shift for generations of computer programmers. If you got a degree in CS before 1995, there's a good chance you've never even heard of SQL.
