Taskbar not Grouping Buttons

I can’t recall exactly when, but at some point in the past year my Taskbar stopped grouping similar buttons.

In the past few months I’ve reinstalled XP at least once.  I also started docking my Taskbar against the left edge of the screen.  Being an efficiency junkie, I noticed that most monitors have more horizontal space than vertical; docking the Taskbar to the left seemed a natural response. :)

But undoing this micro-space-optimization didn’t correct the problem.  Similar Taskbar buttons were still not being grouped together.

For a while I turned on “open each explorer window in a separate process” to help debug a rogue shell extension: an in-process server that never seemed to want to unload.  Since that was fixed a while ago I disabled “open each explorer window in a separate process”.  Alas, this had no effect.  Similar Taskbar buttons were still crowding the Taskbar instead of grouping.

This might not be a big deal for most people but I like to have a lot of windows open at once.  20 is a minimum, 30 – 40 is pretty typical.  Once the existing Taskbar space is exhausted the button size gets halved.  When docked to the left edge of the screen this makes it all but impossible to determine what any given Taskbar button represents (is that Windows Explorer opened to c:\documents and settings\<username>\Local Settings\Temp?  Or c:\documents and settings\<username>\My Documents\someproj\somelogs.txt?).

Clearly this is monumentally important stuff.  Fortunately there’s a workaround.  It turns out that the Taskbar, by default, tries to figure out when it should group similar buttons based partly on estimates of how much space a button needs.  Perhaps this estimate doesn’t work well with my non-standard, docked-to-the-left-edge, Taskbar.  Serves me right for using a non-standard configuration.  Still, there’s a registry fix documented here that allows you to override this and force grouping whenever 3 or more instances of a Taskbar button appear.
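For completeness, here’s a sketch of applying the fix programmatically.  The value name and location (TaskbarGroupSize under HKCU\Software\Microsoft\Windows\CurrentVersion\Explorer\Advanced, with 3 meaning “group once 3 or more buttons appear”) are as I recall them from the documentation linked above, so double-check before relying on this:

#include <windows.h>

int main(void)
{
    HKEY  hKey;
    DWORD groupSize = 3;  /* group once 3 or more similar buttons appear (my recollection of the documented values) */

    if (RegOpenKeyExA(HKEY_CURRENT_USER,
                      "Software\\Microsoft\\Windows\\CurrentVersion\\Explorer\\Advanced",
                      0, KEY_SET_VALUE, &hKey) == ERROR_SUCCESS)
    {
        RegSetValueExA(hKey, "TaskbarGroupSize", 0, REG_DWORD,
                       (const BYTE *)&groupSize, sizeof(groupSize));
        RegCloseKey(hKey);
    }
    return 0;  /* restart explorer.exe (or log off) for the change to take effect */
}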

Welcome back – Taskbar buttons, let’s group hug!

Image Processing in the Spatial Domain

Although there’s a rich set of techniques for processing images in the frequency domain, spatial domain techniques are in widespread use.

For one thing, they’re easier to implement.

Spatial processing techniques can be broadly classified as either point processing or mask processing.

Point processing modifies each pixel based solely on that pixel’s own value (for 8 bit images the value is sometimes called a gray level).  Examples include (a small sketch follows the list):

  • Negation - each pixel value is subtracted from the maximum pixel value.
  • Contrast Stretching - a function is used to enhance/intensify certain ranges of gray levels over others.
  • Thresholding - pixels with values exceeding the threshold are set to the maximum value; all others are zeroed out.
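Here’s a minimal sketch of the first and third of these, assuming a plain row-major buffer of 8-bit pixels (the layout is just for illustration, not from any particular library):

#define MAX_LEVEL 255

/* Negation: each pixel value is subtracted from the maximum level. */
void negate(unsigned char *pixels, int count)
{
    for (int i = 0; i < count; ++i)
        pixels[i] = MAX_LEVEL - pixels[i];
}

/* Thresholding: values above t become MAX_LEVEL, the rest become 0. */
void threshold(unsigned char *pixels, int count, unsigned char t)
{
    for (int i = 0; i < count; ++i)
        pixels[i] = (pixels[i] > t) ? MAX_LEVEL : 0;
}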

Mask processing modifies the value of each pixel based on the values of all pixels in the neighborhood surrounding the pixel.  The neighborhood is usually a square or rectangle of odd dimensions.

Image Averaging is a kind of mask processing whereby each pixel is replaced by a weighted average of its neighbors.  This kind of processing is used to reduce some kinds of noise.  The downside is that it tends to blur sharp edges; usually sharp edges represent features of interest.

Since border pixels can’t be surrounded entirely by the mask (aka window or filter) the only way to get a perfectly filtered image is to accept a slightly smaller image as output.  Unfortunately this is usually unacceptable so various methods for padding are employed.

Averaging tends to suppress features that are smaller than the size of the mask.
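Pulling the last few paragraphs together, here’s a minimal 3x3 averaging sketch.  It takes the “slightly smaller output” route and simply skips the border pixels; the row-major, 8-bit grayscale layout is again an assumption for illustration:

/* Replace each interior pixel with the equally-weighted average of its
   3x3 neighborhood.  src and dst must be distinct width*height buffers. */
void box_filter_3x3(const unsigned char *src, unsigned char *dst,
                    int width, int height)
{
    for (int y = 1; y < height - 1; ++y)
    {
        for (int x = 1; x < width - 1; ++x)
        {
            unsigned int sum = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    sum += src[(y + dy) * width + (x + dx)];
            dst[y * width + x] = (unsigned char)(sum / 9);
        }
    }
}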

Order Statistic Filters are non-linear filters that determine the value of a pixel based on some order statistic about its neighboring pixels.  The median is an order statistic (middle-most in rank order).

The median filter tends to suppress salt-and-pepper like noise without the side effect of blurring sharp edges.
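A sketch of a 3x3 median filter under the same assumed layout.  It leans on qsort for clarity; the opt_med5 routine in the median-of-5 post further down shows how to handle a 5-element window with just a handful of comparisons:

#include <stdlib.h>

static int cmp_uchar(const void *a, const void *b)
{
    return (int)*(const unsigned char *)a - (int)*(const unsigned char *)b;
}

/* Replace each interior pixel with the median of its 3x3 neighborhood. */
void median_filter_3x3(const unsigned char *src, unsigned char *dst,
                       int width, int height)
{
    for (int y = 1; y < height - 1; ++y)
    {
        for (int x = 1; x < width - 1; ++x)
        {
            unsigned char window[9];
            int k = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    window[k++] = src[(y + dy) * width + (x + dx)];
            qsort(window, 9, sizeof(window[0]), cmp_uchar);
            dst[y * width + x] = window[4];  /* 5th of 9 in rank order */
        }
    }
}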

Convolution confusion

So suppose you have a function that you’d like to modify.  Maybe you don’t like the way the function treats certain inputs.  Maybe you wish the function emphasized some aspect of its input over other aspects.  Maybe you just don’t think the function is aesthetically pleasing.

So you modify the function.  One way to modify the function is to add a constant value at every point.  Since the Fourier transform of a function is based on integrating sinusoidal basis functions, and linearity is preserved in integration, you can make this modification in either the time domain OR the frequency domain (in the frequency domain, adding a constant only touches the zero-frequency, or DC, component).

Suppose adding a constant to the function at every point doesn’t modify the function to your liking.  It’s a pretty broad brush change.  Instead of enhancing a particular aspect of the function it merely shifts the whole thing up or down, at least in the time domain.

Another way to modify the function would be to multiply it by a constant value.  This scaling will also survive the translation to the frequency domain because Integral(a * f(x) dx) is equal to a * Integral(f(x) dx).  This modification is similar to the first modification (adding a constant) in that it’s not selective at all.  It’s very broad brush.
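Both of these broad brush changes survive the trip to the frequency domain for the same underlying reason – the Fourier transform is linear:

\mathcal{F}\{\, a\, f_1(x) + b\, f_2(x) \,\} = a\, F_1(u) + b\, F_2(u)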

What you really want to be able to do is to modify the input function selectively.  That is, you’d like to use another function to modify the original function.

Enter convolution.  Through some rather heroic mathematics this wonderful operation allows you to use another function, call it h(x), to modify your original function, call it f(x), to produce your output function, call it g(x).

Convolution is a binary operation; thanks to the convolution theorem, it can be implemented by multiplying the Fourier transforms of the two operands.

If f(x) and h(x) are the original function and the modifying function in the time domain then F(u) and H(u) are their corresponding Fourier transforms (we call this the frequency domain).

Convolving f(x) with h(x) can be performed by multiplying F(u) * H(u) and then taking the inverse Fourier transform to produce g(x) – the modified version of the original input.  H(u) has many names, one of which is the transfer function.
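In symbols (the continuous, one-dimensional case; this is just the standard statement of the convolution theorem):

g(x) = (f * h)(x) = \int_{-\infty}^{\infty} f(\tau)\, h(x - \tau)\, d\tau
\quad\Longleftrightarrow\quad
G(u) = F(u)\, H(u)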

Once we’re in the frequency domain, we can make H(u) pretty much whatever we want.  The simplest non-trivial choice is an H(u) that multiplies F(u) by 1 when the frequency u is below a certain cutoff and by 0 when it is above.  This is an ideal low pass filter.  By ideal it is meant that there is a sharp cutoff – the signal is passed entirely inside the cutoff and not passed at all outside it.
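Written out, the ideal low pass filter with cutoff u_0 is just an indicator function on the frequency axis:

H(u) =
\begin{cases}
1, & |u| \le u_0 \\
0, & |u| > u_0
\end{cases}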

A key point to keep in mind is that H(u) operates in the frequency domain.  That is, we’re modifying frequencies, which will ultimately produce a modification in the time domain.  I believe the art of using the Fourier Transform is figuring out how to modify the function in the frequency domain in a way that produces the desired result in the time domain.

So, for instance, we know that sharp edges in the time/spatial domain require high frequency components in the frequency domain.  To enhance sharp edges via convolution requires that we filter out low frequency components.  To smooth/blur an image (that is, reduce sharp edges) requires the exact opposite (filtering out high frequency components instead).

Sharing and Storing Visual Studio Help Favorites

Tucked away inside the wonderful Tools –> “Import and Export Settings” dialog is the ability to export your current Help Favorites.  You do have Visual Studio’s Help (aka MSDN) installed locally, right? :)  There are thousands of precious milliseconds being wasted each time you click on an MSDN online link and wait for the page to reload.  With the local help the load time is perceptually instantaneous.

Help Favorites are one of the many settings that you can include in a settings export.  If it’s the only setting you include it’s just like exporting a list of favorites (bookmarks for you old-timers) to a file.

I think I’ll also end up using this to store sets of related help.  After a while my help favorites list becomes too long to be useful as a quick reference.  As a rule of thumb once a list has more than a dozen or so elements it takes longer to find the desired item in the list than to search for it.  Saving the help favorites from time to time and having them automatically indexed by Windows Search combines the best of both worlds.

Notes on Fourier Series

Stanford’s course on the Fourier Transform has, for the first 6 lectures, been almost entirely about Fourier Series.

Fourier Series can be used to represent any periodic phenomenon.  Phenomena are called functions in math and signals in engineering.

Periodic phenomena are broadly classified as either periodic in space (e.g., a ring) or periodic in time (e.g., a wave).  Periodicity arises from symmetry inherent in some property of the periodic phenomenon.

Prof Osgood is a genius.  His enthusiasm is surpassed only by his insight into mathematics.

The Fourier series is based on linear combinations of sine and cosine.  These are in turn based on the unit circle.  I mention the unit circle because it isn’t the only possible basis for trigonometric-style functions.  For example, the hyperbolic sine is based on a hyperbola.

Any (sufficiently well-behaved) periodic function can be expressed as a linear combination (a sum) of these trigonometric basis functions.  The hard part is figuring out the coefficients to apply to each component of the sum.
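For a function with period T the trigonometric form looks like this (finding the a_n and b_n is the hard part just mentioned):

f(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty}\left( a_n \cos\frac{2\pi n t}{T} + b_n \sin\frac{2\pi n t}{T} \right)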

For mathematical convenience, the exponential form of the sine and cosine are often used when deriving the coefficients of the Fourier series for a particular function.  There’s a lot of calculus involved here but, at least up to lecture 7, it’s all pretty basic plug and chug integration and differentiation.  Despite that, it makes me yearn for a refresher in things like Integration by Parts.  Thank God for Wikipedia and Schaum’s Outlines!
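In exponential form the series and its coefficients are especially compact, which is why the derivations tend to use it:

f(t) = \sum_{n=-\infty}^{\infty} c_n\, e^{2\pi i n t / T},
\qquad
c_n = \frac{1}{T}\int_{0}^{T} f(t)\, e^{-2\pi i n t / T}\, dt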

Apparently the key breakthrough in the theoretical justification for Fourier Series occurred when mathematicians gave up on trying to prove that the Fourier series converges exactly, point by point, to the function being represented.  Instead, they were able to prove that the mean squared error between the function and the partial sums of its Fourier series approaches zero in the limit.
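In symbols, with S_N denoting the N-th partial sum of the series over one period:

\lim_{N \to \infty} \int_{0}^{T} \left| f(t) - S_N(t) \right|^{2} dt = 0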

Sine and Cosine are both continuous and smooth (infinitely differentiable).  Because of this, edges, which are either jump discontinuities or sharp changes in the derivative, require higher and higher frequency components to express.  I visualize this as higher and higher frequency sinusoids “bunching up” together to produce a sharp change in the value of the function at any given point.  The more such points in a function (e.g., the more edges), the more these high frequency sinusoids are needed to represent the function.

I believe it takes an infinite number of such high frequency components to perfectly reproduce any discontinuous function since the sum of a finite number of continuous functions is a continuous function.

In lecture 6 Prof Osgood formally bridges Fourier Series to the Fourier Transform.  The Fourier Transform is a limiting case of the Fourier Series: the period is allowed to tend to infinity, so the phenomenon no longer needs to be periodic.

Median of 5 numbers in 6 comparisons

Back in 2003 while studying for the Algorithms portion of the PhD Qualifying Examination in Computer Science at Georgia State, I ran across an algorithm so useful it cried out for implementation.

So, dredged from the archives of usenet, the code is below.  It’s vanilla C w/ preprocessor macros but should easily port to any other language.

// Change the type of pixelValue to suit your needs.
typedef int pixelValue;

#define OPT_SORT(a,b) { if ((a)<(b)) OPT_SWAP((a),(b)); }   /* after: a >= b */
#define OPT_SWAP(a,b) { pixelValue temp=(a); (a)=(b); (b)=temp; }

/*-----------------------------------------------------------------------
 * Function : opt_med5()
 * In       : A pointer to an array containing 5 pixel values
 * Out      : The median pixel value of the input array
 * Job      : Fast computation of the median of 5 pixel values in
 *            6 comparisons.
 * Notice   : The input array is modified (partially reordered).
 *            No bound checking to gain time, it is up to the caller
 *            to make sure arrays are allocated.
 *---------------------------------------------------------------------*/
pixelValue opt_med5(pixelValue *p)
{
    OPT_SORT(p[0], p[1]);               /* p0 >= p1 */
    OPT_SORT(p[2], p[3]);               /* p2 >= p3 */

    if (p[0] < p[2])                    /* put the larger pair-maximum in p0 */
    {
        OPT_SWAP(p[0], p[2]);
        OPT_SWAP(p[1], p[3]);
    }
    /* now p0 >= p1, p0 >= p2 >= p3, so p0 cannot be the median */

    OPT_SORT(p[1], p[4]);               /* p1 >= p4 */

    /* the median of all 5 is the 2nd largest of the remaining 4 */
    if (p[1] > p[2])
    {
        if (p[2] > p[4]) return p[2];
        else             return p[4];
    }
    else
    {
        if (p[1] > p[3]) return p[1];
        else             return p[3];
    }
}

#undef OPT_SWAP
#undef OPT_SORT
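A quick sanity check (not part of the original usenet posting):

#include <stdio.h>

int main(void)
{
    pixelValue window[5] = { 40, 250, 12, 7, 99 };
    printf("median = %d\n", opt_med5(window));  /* prints: median = 40 */
    return 0;
}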



RAID-0 vs a Standalone Hard Drive

I’ve been looking for a way to speed up secondary storage access.  Over the past few months I’ve looked at Solid State Drives, faster RPM spin drives (e.g., Western Digital’s VelociRaptor), bigger on-board cache (16MB vs 8MB), support for Native Command Queuing via the Advanced Host Controller Interface (AHCI) and, lastly, RAID-0.

Solid State Drives would probably offer the fastest overall read times.  Since one of my goals has been to shorten the boot cycle this is clearly a win.  Unfortunately they’re still a bit too pricey.

On the spin drive front, faster RPMs (assuming I’m not interested in SCSI) usually means going for a drive that operates at either 7,200 RPM or 10,000 RPM.  Western Digital’s VelociRaptor spins at 10,000 RPM and blows away every other drive in its class in benchmarks.  It’s not too pricey either, coming in at $160 for the 150GB model at the local Frys.

Native Command Queuing (NCQ) places requests in a queue (up to ~32 requests) and allows them to be re-ordered to minimize the amount of repositioning necessary by the drive heads to satisfy all of the requests.  Mainstream CPUs use a similar optimization but call it out-of-order execution.  NCQ provides the biggest benefit when the workload results in HD I/O requests that are widely distributed across the disk.  While I think this would probably increase overall throughput, even in a desktop scenario, I didn’t think it would provide nearly as much bang as RAID-0 striping.  Throw in regular fragmentation and the locality of reference principle and NCQ looks less compelling.

Which brings me to the scenario I opted to try (RAID-0).  Why RAID-0?  RAID-0 spreads the data across each of the drives in the volume.  In theory this should yield a linear increase in throughput since writes and reads can occur simultaneously, minus some overhead, across the drives.
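To make the “spreads the data” part concrete, here’s a toy sketch of the address arithmetic behind striping – not how the Intel controller actually lays things out, just the round-robin idea, using my 3-drive / 32k-stripe system volume as the assumed configuration:

#include <stdio.h>

#define STRIPE_SIZE (32 * 1024)   /* 32k, matching my system volume */
#define NUM_DRIVES  3

/* Map a logical byte offset in the striped volume to a (drive, offset) pair. */
void locate(unsigned long long logicalByte,
            int *drive, unsigned long long *offsetOnDrive)
{
    unsigned long long stripe = logicalByte / STRIPE_SIZE;
    *drive         = (int)(stripe % NUM_DRIVES);
    *offsetOnDrive = (stripe / NUM_DRIVES) * STRIPE_SIZE
                     + logicalByte % STRIPE_SIZE;
}

int main(void)
{
    int drive;
    unsigned long long offset;
    locate(100 * 1024, &drive, &offset);         /* 100k into the volume */
    printf("drive %d, offset %llu\n", drive, offset);  /* drive 0, offset 36864 */
    return 0;
}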

Thanks to the wonderful Intel Matrix Storage technology RAID support is “baked into” the motherboard and natively supported by Windows Vista.  By natively supported I mean that Vista can be installed onto a RAID volume out-of-the-box (XP could do this but required the user to provide drivers during the setup process).

On my board, CTRL+I during the boot up sequence brings up the Intel Matrix Storage Manager Console.  This allows you to define RAID volumes.  My setup is as follows:

RAID-0 Volume 0 = 3 Western Digital 160GB Caviar Blue drives

RAID-0 Volume 1 = 2 160GB drives (1 Seagate, 1 WD)

OEM versions of the WD drives sell for $44 so this actually ends up being cheaper than a single 150GB VelociRaptor.  It also has more than 3 times the space (RAID-0).

The results?  See for yourself below.

Standalone System Drive (No RAID): [benchmark screenshot]

Three-Way RAID-0 System Drive (32k stripe size): [benchmark screenshot]

I tend to create a separate data partition to minimize contention with the system cache and program loading.  So instead of throwing out the old system and data drives I combined them into a second RAID-0 volume and use that for data.

Standalone Data Drive (No RAID): [benchmark screenshot]

Two-Way RAID-0 Data Drive (128k stripe size): [benchmark screenshot]

Unfortunately the axes on the screenshots were automatically scaled to different ranges.  The following table highlights the results.

                            Throughput (MB/s)   Access Time (ms)   Burst Rate (MB/s)   CPU Usage (%)
Standalone System Drive     54.7                18.5               68.1                2.6
3-way RAID-0 System Drive   130.1               15.9               75.7                8.9
Improvement                 75.4                2.6                7.6                 -6.3

Standalone Data Drive       56.6                17.7               61.3                2.2
2-way RAID-0 Data Drive     70.5                18.6               61.0                3.1
Improvement                 13.9                -0.9               -0.3                -0.9

For the system drive Three-Way RAID-0 is a clear winner except for CPU usage, but with 4 hyperthreaded cores who cares? :)

Two-Way RAID-0 improves throughput but barely pulls even with the standalone data drive.  This may be due in part to the differing stripe size (128k vs 32k), drive mismatch (the drives in the 3-way setup are identical, the drives in the 2-way setup are not) or other factors.

At 130 MB/s average throughput the Three-Way RAID-0 setup is pushing in the neighborhood of 1Gb/s.  SATA II has a theoretical bandwidth of 3Gb/s.  I wonder if a Nine-Way RAID-0 would be enough to nearly saturate the SATA II link while still being usable?…

As for data backup, I’m taking a few approaches: a nightly file-based backup to an external HD, and Windows Live Sync to a laptop for certain key directories.  Windows Live Sync provides nearly real-time backup to another machine, but each directory has to be configured manually and has a limit of 20,000 entries.  The theoretically less reliable Three-Way RAID-0 System Drive gets imaged nightly to an external hard drive.  Since it’s a system drive it’s set up to be rebuildable from installation media, so a loss is less critical than it would be for items on the data drive.

Learning the Fourier Transform in the 21st Century

Stanford has put several of its courses on YouTube.  MIT has done the same.  All free.  Absolutely amazing.

Since YouTube works on phones that are not the iPhone (e.g., devices running Windows Mobile) these courses can be listened to from practically anywhere.

First up, the wonderful Fourier Transform.

Concurrency: So Easy it’s Hard

There’s a seemingly perpetual undercurrent of buzz in the software world about simplifying concurrency (aka multithreading, parallelism, etc…). 

Since software makers are in the business of making software, many of these solutions focus on simplifying the software involved in the implementation of concurrency.  For example, the next version of the .NET Framework, .NET 4.0, adds parallel extensions.  Old mainstays such as PVM and MPI are message-passing libraries for C (and Fortran) designed to simplify concurrency.  Java shipped with its own threading API.  There are many others that can be added to this list (e.g., Stackless Python).

I’d say that the problem of creating concurrency, that is, creating a system that depends on multiple sequences of instructions executing simultaneously, has been solved so well that it’s been solved too well.  It’s so easy to create concurrency that we can do it without realizing it (e.g., event driven programming).

Meanwhile, the difficulties associated with concurrency do not typically stem from lack of library (or syntax) support.  Concurrency is hard because it’s hard to model/design in a way that is both correct and maintainable.

This is not, per se, a problem of software libraries or syntax.  It’s a problem of comprehension and communicating comprehension.  The difficulty of modeling concurrent processes is to be expected given that our most basic modeling tool, the human brain, is not very good at executing multiple complex tasks simultaneously.  Sequential thinking, for complex tasks, is so rooted in our psychology that it can be used to our advantage: if you don’t want to be distracted by random thoughts then focus on a single thought, because “you can only focus on a single thing at a time”.

The solution, in my opinion, is to do what we always do when the number of mental widgets necessary to successfully handle a task exceeds the ability of most individual brains: come up with concurrency-focused abstractions that gloss over as much non-concurrency-related detail as possible while remaining useful.

In this case, since the problem is design and communication of design, the focus should be on improved notation.  We need notation that makes shared state leap off the page.  Notation that makes it clear when our designs may result in deadlocks.  Notation that calls attention to results that depend on the order in which multiple threads of execution complete (race conditions).  Perhaps most importantly we need notation that is widely used so that the complex orchestration of instruction streams we call concurrency can not only be created but modified, adapted, extended, etc… as well.
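As a tiny illustration of the kind of thing that notation should make leap off the page (POSIX threads here, purely for familiarity – any threading API would do): two threads bump a shared counter with no synchronization, and the final total depends entirely on how their reads and writes happen to interleave.

#include <pthread.h>
#include <stdio.h>

static long counter = 0;               /* shared state */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000000; ++i)
        counter++;                     /* unsynchronized read-modify-write: a race */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    /* Rarely 2000000; the deficit varies from run to run. */
    printf("counter = %ld\n", counter);
    return 0;
}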