The beginning of the end of the single-processor era

I came across a quote on CNet that stuck with me yesterday:

It’s hard to see how there’s room for single-core processors when prices for nearly half of AMD’s dual-core Athlon 64 X2 chips have crept well below the $100 mark.

I think that this sentiment is especially true nowadays (at least for conventional PC-style computers, not counting embedded things). Multiprocessing (or at least pseudo-multiprocessing, in the form of Intel’s HyperThreading) has been available on end-user computers for some time now. Furthermore, full multiprocessing, in the form of multi-core chips, is now mainstream. What I mean by that is that by now, most of the computers you’ll get from Dell, Best Buy, and the like will be MP, whether via HyperThreading or multi-core.

To give you an idea, I recently got a 4-way server (a single quad core chip) for ~$2300 or so (though it was also reasonably equipped other than in the CPU department). At work, we got an 8-way box (2x dual core chips) for under ~$3000 as well, for running VMs for our quality assurance department. Just a few years ago, getting an 8-way box “just like that” would have been unheard of (and ridiculously expensive), and yet here we are, with medium-level servers that Dell ships coming with that kind of multiprocessing “out of the box”.

Even laptops are coming with multicore chips these days, and laptops have historically not been exactly performance leaders due to size, weight, and battery life constraints. All but the most entry-level laptops Dell ships nowadays are dual core, for instance (and this is hardly limited to Dell either; Apple is shipping dual-core Intel Macs for their laptop systems as well, and has been for some time in fact).

Microsoft seems to have recognized this as well; for instance, there is no single processor kernel shipping with Windows Vista, Windows Server 2008, or future Windows versions. That doesn’t mean that Windows doesn’t support single processor systems, but just that there is no longer an optimized single processor kernel (e.g. one that replaces spinlocks with a simple KeRaiseIrql(DISPATCH_LEVEL) call). The reason is that for new systems, which are expected to be the vast majority of Vista/Server 2008 installs, multiprocessing capability is just so common that it’s not worth maintaining a separate kernel and HAL just for the odd single processor system’s benefit anymore.
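
To make the spinlock point a bit more concrete: on a single processor, raising the IRQL to DISPATCH_LEVEL is enough to keep the scheduler from switching to another thread, so nothing else can be inside the protected region. On a multiprocessor box, another processor may already be in there, so the lock needs an atomic test-and-set plus a spin loop. The following is only a rough user-mode sketch of that idea in C++; `SpinLock` and `count_with_spinlock` are made-up names for illustration, and this is not how the Windows kernel implements its spinlocks.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical user-mode spinlock for illustration only.
class SpinLock {
public:
    void lock() {
        // test_and_set returns the previous value, so keep spinning
        // for as long as some other thread already held the flag.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();  // be polite while waiting
        }
    }

    void unlock() {
        // Release ordering publishes the writes made under the lock.
        flag_.clear(std::memory_order_release);
    }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical helper: several threads bump one shared counter,
// taking the spinlock around every increment.
long count_with_spinlock(int num_threads, int increments_per_thread) {
    assert(num_threads > 0 && increments_per_thread >= 0);

    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> threads;

    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                lock.lock();
                ++counter;  // protected by the spinlock
                lock.unlock();
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return counter;
}
```

Calling something like `count_with_spinlock(4, 10000)` should always yield the full count, because the increments are serialized by the lock. The atomic operation and the spinning are exactly the overhead that a dedicated single processor kernel could skip.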

What all this means is that if, as a developer, you haven’t really been paying attention to the multiprocessor scene, now’s the time to start: it’s a good bet that within a few years, even on very low end systems, single processor boxes are going to become very rare. For intensive applications, the ability to take advantage of MP is going to become a defining point, especially as chip makers have realized that they can’t just indefinitely increase clock rates and have accordingly begun to transition to multiprocessing as an alternative way to increase performance.
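
As a small taste of what “taking advantage of MP” can look like for CPU-bound work, here is a minimal sketch in C++ that splits a summation across worker threads, one contiguous slice per thread. The function name `parallel_sum` and the slicing scheme are my own assumptions for illustration; a real program would also want to think about work granularity and whether the data is large enough to be worth the thread overhead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: sums `data` using `num_threads` workers.
// Each worker writes only to its own slot in `partial`, so no
// locking is needed until the final single-threaded combine step.
long long parallel_sum(const std::vector<int>& data, unsigned num_threads) {
    assert(num_threads > 0);

    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> workers;

    // Round up so that every element lands in some slice.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &partial, t, begin, end] {
            partial[t] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0LL);
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

A natural thread count to pass is `std::thread::hardware_concurrency()`, though that call is allowed to return 0 when the value can’t be determined, so a caller would need to fall back to 1 in that case.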

Microsoft isn’t the only company that’s taking notice of MP becoming mainstream, either. For instance, VMware now fully supports multiprocessor virtual machines (even on its free VMware Server product), as a way to boost performance on machines with true multiprocessing capability. (And to their credit, it’s actually not half-bad as long as you aren’t loading the processors down completely, at which point it seems to turn into a slowdown, perhaps due to VMs competing with each other for scheduling while waiting on spinlocks, though I didn’t dig in deeper.)

(Sorry if I sound a bit like Steve when talking about MP, but it really is here, now, and now’s the time to start modifying your programs to take advantage of it. That’s not to say that we’re about to see 100+ core computers becoming mainstream tomorrow, but small-scale multiprocessing is very rapidly becoming the standard in all but the most low cost systems.)

5 Responses to “The beginning of the end of the single-processor era”

  1. flet says:

    An unpleasant “feature” of this transition to true multiprocessing as opposed to multithreading on a single core is that any latent bugs in multithreaded code tend to become much more serious.

    That, and the locking and synchronizing one must do (in a low level language) becomes that much more critical/onerous because now two or more threads may really really run at the same time (to say nothing of how these threads may “see” various supposedly global cpu/io state differently due to caching).

    I suppose the only thing worse (from a multithreaded program reliability viewpoint) would be if the Itanium had become the standard instead of the x86-64, with its weak cache coherency model…

  2. Skywing says:

    That’s true to a certain extent, although I would submit that for most user mode code, you would probably be running into synchronization bugs on single proc machines anyway (albeit not as quickly as on a true multiprocessor system).

    The IA64 memory model definitely throws more things for a loop though, I certainly agree with that. Having not written IA64-specific code myself (but having read documentation on it), it certainly seems like that architecture has gotten to the point where most people are just going to get things fundamentally wrong in terms of memory coherency. Even on the comparatively lax x86 model, it’s *still* tricky in some cases to get things perfectly correct. is worth a read on that point – at least a couple of the examples in that paper would have tripped me up. The architecture outlined by that paper is supposed to combine IA64 and IA32 in terms of memory models into one that will work on both.

  3. dispensa says:

    I don’t see why it has to be such a hard thing; if you use synch APIs, you get memory coherence for free. Interlocked APIs are full memory barriers, and if you want a lighter-weight barrier, there are intrinsics and SDK functions for acquire/release only.

    IA-64 is my dark horse pick in the CPU race, unless it just gets totally swamped. Moving intelligence into the compiler (and getting rid of a bunch of “guessing” by the CPU at runtime) seems like an idea whose time has come.

  4. […] Ken just wrote about this issue a couple of weeks […]

  5. […] Ken covered this a while ago regarding a similar decision by […]