The road to LINQ, Part 1

Sometimes a program doesn't know what it needs until the user asks for it. When a program's primary purpose is to find things when the user asks for them, it can generally be classified as a query tool.

Query tools are found all over the place. Practically every business in every industry has its own way of generating, describing, storing and querying data.

In years gone by the data might have been generated when someone filled out a form. The description of the data would be the form's labels and fields. Form storage was provided by file cabinets. Back in those days the query tool was the receptionist, file clerk or whoever happened to know how to track down the desired bit of information.

These days computer programs are the query tools. One problem common to pretty much every query tool is how to conduct its search in a way that will flexibly accommodate a wide variety of user requests. Put another way: how do I give you a program that lets you find what you're looking for without me having to modify the program every time you're looking for something else?

Fortunately an elegant solution to this problem was invented long ago. It's called SQL.

Unfortunately, SQL requires data to be stored in a very specific form. That form, the relational model, though powerful, takes some getting used to. So although an elegant solution to the problem of flexibly querying data has long existed, even to this day much of that data is not in a form that can benefit from SQL.

Why is SQL elegant? In essence, it is elegant because it succinctly represents a higher level abstraction that can be applied to almost any kind of data. SQL operates at a level of abstraction closer to the way human beings formulate questions.

With SQL the user specifies what it is they're looking for. The significance of this may not be obvious to non-computer programmers but for programmers it's a radical notion. A program is a sequence of instructions that specify how to do something. Computer programmers have for decades been in the business of telling computers how to do some set of tasks.

For example, a college professor may want to know the names of all students that passed his most recent exam. A traditional program written to answer this question focuses almost entirely on the how. Pseudocode for that program might look like the following:

open grades file
loop over each grade entry
if the grade entry is above 70 copy the entry to the passed list
end loop
print every entry in the passed list

While that's perfectly comprehensible to pretty much any programmer, it's pretty far removed from the way the professor himself might express his desire. In SQL this might be:

select firstname, lastname from grades where grade > 70

This is clearly much closer to what the professor had in mind. He wants the names of every student that passed his exam. A passing grade on the exam is one that is higher than 70.

The how is entirely missing from the SQL version. The cost of this convenience is that the grades have to be stored in relational form because SQL can only query data in relational form.
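Looking ahead to where this series is headed, the same contrast can be sketched in C# over plain in-memory objects. This is only an illustration: the GradeEntry type and both method names are hypothetical, invented here to mirror the firstname, lastname and grade columns in the SQL above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type mirroring the columns used in the SQL example.
public class GradeEntry
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int Grade { get; set; }
}

public static class GradeQueries
{
    // The "how": loop over each entry, test it, copy it to the passed list.
    public static List<string> PassedImperative(IEnumerable<GradeEntry> grades)
    {
        var passed = new List<string>();
        foreach (GradeEntry entry in grades)
        {
            if (entry.Grade > 70)
                passed.Add(entry.FirstName + " " + entry.LastName);
        }
        return passed;
    }

    // The "what": a LINQ query expression that reads much like the SQL.
    public static List<string> PassedDeclarative(IEnumerable<GradeEntry> grades)
    {
        var query = from entry in grades
                    where entry.Grade > 70
                    select entry.FirstName + " " + entry.LastName;
        return query.ToList();
    }
}
```

Both methods return the same names for the same input; the second simply leaves the looping and copying to the library, and the data never had to be moved into a relational database first.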

As the information age steamrolls on, the fact that most of the world's data is not in relational form becomes a growing problem. From the example above it's obvious that writing the how is more difficult than writing the what. More difficult means more errors will occur.

Another problem inhibiting use of SQL to tame the onslaught of data that characterizes the information age is that it represents a paradigm shift for generations of computer programmers. If you got a degree in CS before 1995 there's a good chance you've never even heard of SQL.

Remotely debugging managed code

So, with hopes of debugging managed code, you're running the Visual Studio Remote Debugger Monitor (msvsmon.exe) on the remote machine. Unfortunately, it will not let you connect to it from Visual Studio running on your local machine. You've tried the following steps to no avail:
  • You're running it as an administrator (or member of the administrators group) on an XP-SP* machine.
  • Your firewall has an exception for msvsmon.exe.
  • msvsmon.exe is using the default authentication setting (Windows Authentication).
In other words, you've read the documentation and followed the steps contained therein.

Despite these valiant efforts every time you try to connect to the remote debugger monitor from Visual Studio (ctrl+alt+p, transport=default, host set to the remote machine's WINS name) a wonderfully unhelpful error dialog pops up informing you that it can't find the remote debugger monitor.

A quick way to work around this is to:
  1. Create an account on the remote machine with the same username and password as the account on your local machine that is running Visual Studio.
  2. Put the account into the Administrators group on the remote machine.
  3. Run msvsmon.exe (it can be on a remote share) using the account you just created. This can be done by right clicking msvsmon.exe, choosing "Run As" then changing the logon and password.
As far as I can tell, the problem is that msvsmon.exe will only let you connect to it (with the transport that supports debugging managed code) using the credentials of the user under which it is running. Although it allows you to debug other users' processes once you've connected, it will not let you connect unless it is running as you.

Yet another reason to love Google Reader

Over the past year or so more and more of the reading I do on the web is being done through Google Reader.

This has been entirely unintentional. It just happens to be very handy to have a single destination that gets updated frequently throughout the day.

Part of this accidental dependence is the result of good design. Google Reader devotes most of its space to the intended content (news, blogs, headlines, etc.).

It makes liberal use of AJAX to make the UI incredibly responsive.

It even has keyboard shortcuts though those haven't grown on me yet.

They've also got several nice bundles of content that make it easy to find interesting articles.

It even plays nice with mobile phones that are not the iPhone. It is hard to overstate how amazing it is that Google Reader works with Pocket Internet Explorer 5! Nothing works with Pocket IE5 except sites that assiduously restrict themselves to whatever meager fraction of WAP-friendly HTML is grokked by Pocket IE5.

Not only does it play nice with Pocket IE5, it even makes other sites play nice with Pocket IE5 by stripping away all of the HTML goodness that PIE5 can't deal with.

I'm not sure I could go back to standing in line without Google Reader on Windows Mobile (or something like it).

Today I discovered that they didn't forget their "raison d'etre". The convenient search bar at the top of the Reader homepage searches, among other things, any of the articles you've ever read in Google Reader! So it's a great way to track down that handy link about something or other you read a while ago but forgot about.

Achtung Baby! (Warnings are your friend)

I'm busy with a release this week so the posts will be slim.

As a release nears, the time available to get anything done decreases and the temptation to ignore "little things" increases. Compiler warnings tend to get ranked lower on the "need to fix" scale when deadlines loom large.

During a recent build I got a reminder of why compiler warnings matter even when time is short. While developing a class I realized that it needed a few more properties. So I added the properties to the class, added parameters for the properties to the constructor and completely forgot to set the properties in the constructor.

C# dutifully warned me that "parameterX is not being used". Unfortunately this streamed by in the output window and was lost amidst the flurry of other warnings that have crept into the source code over the past few years. This resulted in several minutes wasted scratching my head over why the property was never changed from its default value even though it was clearly being passed a new value in the constructor.

Gotta take advantage of the freebies that life gives you. Compiler warnings are freebies - it's foolish to ignore them.

When you need to write lots of code in a short amount of time

I needed to port a fairly sizable VB6 app to WinForms/C# in a short amount of time (~2 months). One of the tricks to writing this much code quickly was to:
  1. Port 3 - 5 menu elements that are representative of most of the menu elements.
  2. Watch for the common patterns that emerge, usually by the time you get to the 3rd menu element.
  3. Put those common patterns into a code snippet and reuse the code snippet for the remaining menu elements.
  4. Repeat steps 1 - 3 for each "class" of menu element (where a class includes all menu elements that execute similarly).

Exception handling in event handlers was low hanging fruit. Sometimes an exception occurs at a location where I have information that can make the error message more useful. For example, an IOException that occurs in one part of a multistep process makes more sense if the multistep process is named in the error message.

So I made a code snippet with a try/catch (ApplicationException)/catch (Exception). If the exception can be made more useful then it gets wrapped in an ApplicationException-derived exception; otherwise it is just left to bubble up.

This code snippet would be the first thing I'd include in any new menu event handler.
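A minimal sketch of what such a snippet might look like follows. The names StepFailedException, RunStep and HandleMenuClick are hypothetical, and the handler returns the message text instead of showing a dialog purely so the example stays self-contained; the original snippet is not shown in this post, so treat this as one plausible shape rather than the actual code.

```csharp
using System;
using System.IO;

// Hypothetical ApplicationException-derived type that carries extra context.
public class StepFailedException : ApplicationException
{
    public StepFailedException(string message, Exception inner)
        : base(message, inner) { }
}

public static class MenuHandlerSketch
{
    // One step of a multistep process. An IOException is wrapped so the
    // message names the step; anything else bubbles up untouched.
    public static void RunStep(string stepName, Action step)
    {
        try
        {
            step();
        }
        catch (IOException ex)
        {
            throw new StepFailedException(
                "Step '" + stepName + "' failed: " + ex.Message, ex);
        }
    }

    // Shape of the try/catch pasted into each menu event handler.
    // Returns the text to show the user, or null when the work succeeded.
    public static string HandleMenuClick(Action work)
    {
        try
        {
            work();
            return null;
        }
        catch (ApplicationException ex)
        {
            // Already carries useful context, so report it as-is.
            return ex.Message;
        }
        catch (Exception ex)
        {
            // Unexpected failure with no added context.
            return "Unexpected error: " + ex.Message;
        }
    }
}
```

In a real WinForms handler the returned text would presumably go to a message box or status bar instead.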

Another trick was taking advantage of the DynamicInvoke() method of delegates. Most of the menu event handlers opened a dialog window, retrieved input from the user then passed that input to a method call in another library. The library would create a new output file as a result. There were 40 - 60 of these kinds of dialog boxes.

Instead of separately doing this input setup/marshalling and output handling it was easier to have each of these dialogs package their input along with a delegate to the underlying library call to a single method. This method would execute the call (via DynamicInvoke()) and return the results to the caller.

Turns out that this "wrapper around DynamicInvoke()" ended up being a natural place for adding context to error messages. This context makes it much easier to debug when you receive a screenshot of the exception message from a client in the field.
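A rough sketch of such a wrapper is below. LibraryCallRunner, Run and the description parameter are hypothetical names chosen for illustration; the real method likely also dealt with the output file, which is left out here. The one detail worth knowing is that Delegate.DynamicInvoke() reports an exception thrown by the target as a TargetInvocationException, so the wrapper unwraps it before adding context.

```csharp
using System;
using System.Reflection;

public static class LibraryCallRunner
{
    // Executes a library call through its delegate and adds context to any
    // failure. "description" is a hypothetical label supplied by the dialog.
    public static object Run(string description, Delegate libraryCall, params object[] args)
    {
        try
        {
            return libraryCall.DynamicInvoke(args);
        }
        catch (TargetInvocationException ex)
        {
            // DynamicInvoke wraps whatever the target threw; dig out the cause.
            Exception cause = ex.InnerException ?? ex;
            throw new ApplicationException(
                description + " failed: " + cause.Message, cause);
        }
    }
}
```

A dialog would then call something like LibraryCallRunner.Run("Convert image", convertDelegate, inputPath, outputPath), where the delegate and arguments are whatever that dialog collected from the user.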

A final trick was to make use of Visual Inheritance. Since so many of the menu elements open a dialog window, get input from the user, execute 1 or more steps, etc... some of this functionality was combined into a base form from which all other dialogs derived. The base form took care of positioning the OK and Cancel buttons, a descriptive paragraph near the top of the form and incorporating progress in the status bar of the dialog window. Each of the 40 - 60 dialog windows then derived from the base form.

Calling methods on an ActiveX control from Managed Code/.NET

.NET 2.0 introduced BackgroundWorker. I have come to love and rely on this little gem to maintain UI responsiveness while a long running task executes in the background.

What happens if the long running task that needs to execute in the background must run in a Single Threaded Apartment? This might happen if you're doing image processing and the library that handles the heavy lifting is an ActiveX control.

Since it has a visual representation it must run in a Single Threaded Apartment. But BackgroundWorker does its background work on a thread that does not run in a Single Threaded Apartment.

To address this I wrote a shameless rip of BackgroundWorker. The only significant difference is that it does its background work on a thread that runs in a single threaded apartment.

Hopefully the .NET devs won't mind. Imitation is the sincerest form of flattery.

A colorized and formatted version of the code below for StaBackgroundWorker can be found here.
using System;
using System.ComponentModel;
using System.Threading;
using System.Windows.Forms;

/// <summary>
/// Similar to BackgroundWorker except that it does its work on a Single Threaded Apartment thread.
/// </summary>
public class StaBackgroundWorker
{
    public event System.ComponentModel.DoWorkEventHandler DoWork;
    public event System.ComponentModel.ProgressChangedEventHandler ProgressChanged;
    public event System.ComponentModel.RunWorkerCompletedEventHandler RunWorkerCompleted;

    // Control whose creating (UI) thread receives the progress and completion callbacks.
    private Control creatorControl;

    public StaBackgroundWorker(Control creatorControl)
    {
        this.creatorControl = creatorControl;
    }

    public void RunWorkerAsync()
    {
        RunWorkerAsync(null);
    }

    public void RunWorkerAsync(object userState)
    {
        // The apartment state must be set before the thread is started.
        Thread staThread = new Thread(new ParameterizedThreadStart(RunWorkerAsyncThreadFunc));
        staThread.SetApartmentState(ApartmentState.STA);
        staThread.Start(userState);
    }

    private void RunWorkerAsyncThreadFunc(object userState)
    {
        DoWorkEventArgs doWorkEventArgs = new DoWorkEventArgs(userState);
        Exception doWorkException = null;

        try
        {
            OnDoWork(doWorkEventArgs);
        }
        catch (Exception ex)
        {
            // Hand the exception to RunWorkerCompleted instead of losing it on this thread.
            doWorkException = ex;
        }

        RunWorkerCompletedEventArgs workerCompletedEventArgs =
            new RunWorkerCompletedEventArgs(doWorkEventArgs.Result, doWorkException, doWorkEventArgs.Cancel);

        // Raise the completion event on the thread that created the control.
        creatorControl.Invoke(new MethodInvoker(delegate() { OnRunWorkerCompleted(workerCompletedEventArgs); }));
    }

    protected virtual void OnDoWork(DoWorkEventArgs e)
    {
        if (DoWork != null)
            DoWork(this, e);
    }

    // Written by the UI thread and read by the worker thread, hence volatile.
    private volatile bool cancellationPending;
    public bool CancellationPending
    {
        get { return cancellationPending; }
    }

    public void CancelAsync()
    {
        cancellationPending = true;
    }

    public void ReportProgress(int percentComplete, object userState)
    {
        ProgressChangedEventArgs e = new ProgressChangedEventArgs(percentComplete, userState);
        // marshal this call onto the thread that created the control that created us
        creatorControl.Invoke(new MethodInvoker(delegate() { OnProgressChanged(e); }));
    }

    protected virtual void OnProgressChanged(ProgressChangedEventArgs e)
    {
        if (ProgressChanged != null)
            ProgressChanged(this, e);
    }

    protected virtual void OnRunWorkerCompleted(RunWorkerCompletedEventArgs e)
    {
        if (RunWorkerCompleted != null)
            RunWorkerCompleted(this, e);
    }
}



Hackers, Sculptors, Instincts and Design

Although there are many things that a Hacker does well, design is not one of them.

The key question is, why not?

Partly it's because design doesn't lend itself to the Hacker style of working. For one, design is explicitly creational. The Hacker, on the other hand, enjoys probing an existing system for faults. The goal of design is to create systems that produce the desired output without those faults. Since the job is creating the system there's no system to probe (yet), so the Hacker's favorite activity has nothing on which to operate.

For another, design often involves choosing what to avoid. Put another way, sometimes good design is about choosing what to not do. Is it worth optimizing a given process or is it better to go with the naive implementation? The Hacker's instincts run so strongly counter to choosing to not do something that he will often not recognize that such a choice is at hand (let alone make that choice). The consequences of his "hack" live on in infamy until it's undone.

During the design process you inevitably have to make choices. Do you optimize for space or speed? Do you design for caller simplicity or callee simplicity?

When the Hacker encounters such a decision his instincts work against making the design tradeoff in exchange for a more coherent system. The Hacker specializes in discovering system faults and working around them no matter how much the workaround is at odds with the manifest intent in the design of the system. The choice, to the Hacker, appears to be a problem in need of fixing and he's got tried and true methods for fixing problems.

The Sculptor is less likely to see the choice as a problem requiring a workaround. To him it's a chance to make the right tradeoff for the particular problem the system is intended to solve. To the Sculptor it goes without saying that all systems have limitations. He's more interested in having the system do whatever it was intended to do well than in having it do anything that can be done.

For the Sculptor the discovery of a system fault is somewhat less pleasant than it is to the Hacker. The system fault might mean somewhere along the line he made a bad design choice. Many design choices, once made, are very hard (or impossible) to undo once a system is in use. The discovery of a bad design choice is also unpleasant to the Sculptor because he enjoys making good design choices.

Sometimes it isn't obvious which tradeoff is the correct one. This is often a clue that the Sculptor doesn't understand some aspect of the problem domain. Ideally he'll be able to get a better understanding, whether by talking with the client or by referring to existing documentation, Wikipedia, reference books, etc. Sometimes this isn't possible - the client may not be available, there may be no existing documentation, nothing else to go on. The Sculptor has to make a call. However, because the Sculptor has gotten into the habit of getting to know more about the underlying problem domain, there's a good chance that he's run across something in the problem domain that points to the right decision.

Sometimes the Sculptor has a pretty good idea of the relative costs of the choice but doesn't know how significant either decision is to the client. A choice that's many times more expensive can often be discarded because it turns out that the client is more than happy with either result. The longer the Sculptor has been sculpting the better he gets at recognizing these "decision moments".

For all these reasons, the Hacker's instincts work against good design. Where good design is found there isn't as much need for hacks. Changing an existing system becomes less about probing, trial and error or making the system work in ways it wasn't intended, and more about finding the right pieces in the existing system to assemble or extend just-so to add the needed functionality.

As a code base increases in size, good design can mean the difference between shipping a new, nearly bug-free feature in 6 weeks and shipping it in 6 months. It can mean the difference between a new employee becoming productive in a few weeks and becoming productive in a few months. Well-designed systems tend to survive and thrive long after the designer is no longer working on them.

Systems designed by Hackers, on the other hand, are plagued by bugs. Some easily repeatable, some seemingly non-deterministic. They're fixed quickly but seem to crop up again a few months later. The Hacker isn't (usually) intentionally creating a buggy system so that he has something to do; it's just that he isn't as good at design. He doesn't enjoy it nearly as much as he enjoys hacking and so puts commensurately less effort into design.

Systems designed by Hackers tend to accumulate hacks at several levels. At the level of the user interface there may be major discrepancies both internally and with respect to other, similar products. This is the manifestation of the Hacker's instinct to hack the system. He may have come across a user interface need that wasn't directly met by the available user interface widgets. So he baked something entirely from scratch. Ironically, the desire to hack the system often acts as a disincentive to taking advantage of the built-in functionality in a given system; why bother to figure out what's available when all you need is enough to start and you'll hack out the rest?

At the level of coding constructs systems designed by Hackers will often have a lot of copy/paste re-use. Somewhat more experienced hackers may make use of libraries but they'll often have strange quirks. Maybe they throw lots of exceptions during execution. There will be lots of exception handling, even in places where it doesn't make sense. The Hacker may think of this as "defensive programming" but in reality it's another manifestation of the Hacker instinct: try a bunch of things and hope something works. Oddly enough, the Hacker is often proud that he doesn't really understand exactly why his fix works as long as it appears to work (most of the time).

Hacker vs Sculptor Redux

In the previous post on the dichotomy between the 2 dominant modes of creating software I labeled these modes "The Hacker" vs "The Professional". After thinking about this it strikes me that it unintentionally implies that Hackers can't be professional. So the nomenclature needs some refactoring.

Hackers can be, and often are, professionals. For example, security researchers, those that discover (and sometimes fix) exploits in existing systems, must spend a lot of their time hacking to be successful in their job.

Quality Assurance is another profession that lends itself to the Hacker mindset. QA Engineers have to probe a system in expected and unexpected ways to find out whether it performs as intended.

As a final piece of evidence that I believe that the Hacker mindset is compatible with professionalism, articles such as "Under the Hood: Apple iPhone 3G exposed" clearly require a significant degree of technical expertise, discipline, knowledge of standards and standard techniques all of which are the hallmarks of a profession.

The Hacker vs Sculptor split isn't limited to Software Development. If it, as I believe, is truly an expression of fundamental personality preferences, then it isn't surprising that other professions find practitioners aligning along similar boundaries.

Take the music business. Many years ago a friend of mine asked, "So are you a technician that has an ear for music or a musician that knows how to push a few buttons?" He was an aspiring music producer who placed himself squarely in the former camp.

Back in the domain of software, the founder of PlentyOfFish.com evidences both his own hacker tendencies and the existence of the dichotomy when he says, "I was able to go in and see where errors were and fix them, which no one else could do. Everyone else could program, but didn't particularly know how to fix things. I was not particularly great at building stuff but I could fix things and make them work really, really well."

During a recent blog episode the commentators (Joel Spolsky and Jeff Atwood) discuss another expression of this dichotomy when talking about the differences between a good tester and a good programmer. Perhaps more importantly Joel alludes to the significance of the issue: being a good programmer probably means that you're not going to be a very good tester.

In the same vein, being a good Hacker probably means you will not be a good Sculptor. By Sculptor I mean designer of libraries, interfaces, systems and the like. The Hacker's instincts are very similar to the QA Engineer's instincts in that he is far more excited by and interested in how, where and when the system breaks.

The Hacker will gravitate towards this "quest for system faults" even when it doesn't directly bear on the task he finds before him. He genuinely enjoys it. He basically can't help it. Give a Hacker a system that isn't working and he is thrilled at the chance to get it to work, by hook or by crook.

Next time we'll focus on the Sculptor and what it is that makes him tick.

Visual Studio Regular Expressions and C Preprocessor Macros

Ever run across a need to replace a series of macros (or function calls for that matter) with something almost identical except for a minuscule change?

For example, suppose you've got a debugging macro that takes a format string followed by format variables. Suppose also that this macro doesn't append newlines to the format string. After a while it becomes nearly impossible to distinguish one log message from another. I ask you to suspend disbelief with regard to the obvious question "Why didn't the programmer put a newline at the end of each log message?"

Visual Studio supports regular expressions with a few syntactically small but semantically significant differences from Perl 5 regular expressions. Perl 5 uses parentheses both to group/alternate AND to capture subexpressions. Visual Studio supports these concepts with distinct operators: parentheses for grouping/alternation and braces {} for subexpression capture. It refers to captured subexpressions as tagged expressions.

Anyway, to replace

MYDBGMACRO("the value of x=%d", x);

with

MYDBGMACRO("the value of x=%d\n", x);

using Visual Studio's "Find and Replace" dialog, the find expression is:

{MYDBGMACRO\(:b*"}{.*}{".*\);}

and the replace expression is:

\1\2\\n\3

That replace expression is a little tricky. The goal is to embed a newline in the format string without splitting up the line in the source code. Only the backslash needs to be escaped.
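For comparison, here is a sketch of the same edit done with the .NET Regex class, which uses Perl-style syntax: parentheses capture, and the replacement refers to captures as $1, $2 and $3. The class and method names are made up for the example, and [ \t]* stands in for Visual Studio's :b*. In a .NET replacement pattern only $ is special, so inside a verbatim string \n stays as the two literal characters backslash and n, which is exactly what the C source needs.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MacroNewlineFixer
{
    // Group 1: the macro name, opening parenthesis and opening quote.
    // Group 2: the format string itself (greedy, so it runs to the last quote).
    // Group 3: the closing quote, the remaining arguments and ");".
    private const string FindPattern = @"(MYDBGMACRO\([ \t]*"")(.*)("".*\);)";

    // Inserts a literal backslash-n just before the closing quote.
    private const string ReplacePattern = @"$1$2\n$3";

    public static string AppendNewline(string sourceLine)
    {
        return Regex.Replace(sourceLine, FindPattern, ReplacePattern);
    }
}
```

Lines that don't contain the macro pass through unchanged, so the method can be run over every line of a source file.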