A few days ago I read a blog entry by chromatic about the kinds of things that people learn in Computer Science programs these days. He was talking about the difference between theoretical computer science and the skills you need to make it as a professional programmer, two things that are somewhat related but nowhere near identical. That dichotomy is interesting, but it's not what I want to talk about today. I want to talk about the million and one bits of information that have come out of the CS world but that, for one reason or another, you don't really learn in a CS degree.
Every CS major back at RPI had to write an implementation of strstr at some point, but I don't remember ever learning about better ways to do it than the naive algorithm. At some point between then and now I heard the term "Boyer-Moore algorithm", but until today I had never actually read the paper and seen how it works. If I hadn't been poking around in the mod_include source code a few weeks ago I'd never have heard of the BNDM search algorithm. Both of these are really cool ways to do a faster string search, but my CS degree never mentioned either one. Fortunately, I tend to hang out in the kind of places where you hear about such things, but do we really want people to learn about this sort of thing by accident?
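To give a flavor of what "better than naive" looks like, here is a minimal sketch of the Horspool simplification of Boyer-Moore. It is not the full algorithm from the paper, it is not BNDM, and it is not what mod_include does. The function name `horspool_find` is made up for illustration. The idea is to compare the pattern right to left and, on a mismatch, use a precomputed table to slide the window by more than one character when possible.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not from any library): returns the index of the first
// occurrence of `needle` in `haystack`, or std::string_view::npos if absent.
std::size_t horspool_find(std::string_view haystack, std::string_view needle) {
    const std::size_t n = haystack.size();
    const std::size_t m = needle.size();
    if (m == 0) return 0;                        // empty pattern matches at 0
    if (m > n) return std::string_view::npos;

    // shift[c] is how far the window may slide when its last byte is c.
    // Bytes that do not occur in the needle (ignoring its final byte)
    // allow a full slide of m positions.
    std::array<std::size_t, 256> shift;
    shift.fill(m);
    for (std::size_t i = 0; i + 1 < m; ++i) {
        shift[static_cast<unsigned char>(needle[i])] = m - 1 - i;
    }

    std::size_t pos = 0;
    while (pos <= n - m) {
        // Compare the window against the needle from right to left.
        std::size_t j = m;
        while (j > 0 && haystack[pos + j - 1] == needle[j - 1]) {
            --j;
        }
        if (j == 0) return pos;                  // every byte matched
        // Slide according to the last byte of the current window.
        pos += shift[static_cast<unsigned char>(haystack[pos + m - 1])];
    }
    return std::string_view::npos;
}
```

On typical text this skips most of the haystack without ever looking at it, which is where the speedup over the naive loop comes from. In real code you probably don't want to hand-roll this: since C++17 the standard library ships `std::boyer_moore_searcher` and `std::boyer_moore_horspool_searcher` in `<functional>` for use with `std::search`.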
How about techniques for multithreaded programming? If I hadn't read the documentation for the Boost threads library I wouldn't have learned the trick of deciding lock order by comparing the addresses of the two mutexes, and if I hadn't been following Herb Sutter's writing I wouldn't have realized just how many ways that solution can break down. If I hadn't read the FreeBSD mailing lists religiously for years I wouldn't have learned about the importance of lock ordering at all, despite the fact that I recall at least two classes back in school that taught the basics of multithreading. This isn't like the string searching stuff, where we're just talking about getting the answer faster. This is about how to write multithreaded programs that work at all.
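For anyone who hasn't seen the address trick, a minimal sketch follows. It uses standard C++ mutexes rather than Boost, and the `Account` type and `transfer` function are made up for illustration; this is not code from the Boost documentation. The point is that when two threads need the same pair of locks, both take the one at the lower address first, so neither can end up holding one lock while waiting on the other.

```cpp
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical type for illustration: some data guarded by its own mutex.
struct Account {
    std::mutex m;
    long balance = 0;
};

// Hypothetical function: moves `amount` between two accounts, always taking
// the two locks in address order regardless of argument order.
void transfer(Account& from, Account& to, long amount) {
    if (&from == &to) return;  // same object: locking its mutex twice would deadlock

    std::mutex* first = &from.m;
    std::mutex* second = &to.m;
    // std::less gives a well-defined total order over pointers, which a bare
    // `<` does not guarantee for pointers to unrelated objects.
    if (std::less<std::mutex*>{}(second, first)) {
        std::swap(first, second);
    }

    std::lock_guard<std::mutex> guard_first(*first);
    std::lock_guard<std::mutex> guard_second(*second);

    from.balance -= amount;
    to.balance += amount;
}
```

With this, `transfer(a, b, x)` on one thread and `transfer(b, a, y)` on another acquire the mutexes in the same order. It only helps if every piece of code that takes both locks follows the same rule, and it says nothing about locks taken by whatever you call while holding these two. Since C++17, `std::scoped_lock` can acquire several mutexes at once with built-in deadlock avoidance, which is usually the easier route.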
If there's this much more to the topics that I did learn in school, what about the topics I never learned there? It seems like it's way too easy to think you're doing things the right way and be completely and totally wrong. Where should programmers be expected to learn this sort of thing? What really worries me is that most of them just aren't learning it, and as a result they're going to spend most of their careers screwing up stuff that collectively we've already figured out, only nobody got around to telling them about it.