Tuesday, September 27, 2005

When Should I Have Learned This?

A few days ago I read a blog entry by chromatic about the kind of things that people learn in Computer Science programs these days. He was talking about the difference between theoretical computer science and the kind of skills you need to make it as a professional programmer, two things that are somewhat related but nowhere near identical. That dichotomy is interesting, but it's not what I want to talk about today. I want to talk about the million and one bits of information that have come out of the CS world but that for one reason or another you don't really learn in a CS degree.

Every CS major back at RPI had to write an implementation of strstr at some point, but I don't remember ever learning about better ways to do it than the naive algorithm. At some point between then and now I heard the term "Boyer-Moore algorithm", but until today I had never actually read the paper and seen how it worked. If I hadn't been poking around in the mod_include source code a few weeks ago I'd never have heard of the BNDM search algorithm. Both of these are really cool ways to do a faster string search, but my CS degree never mentioned either of them. Fortunately, I tend to hang out in the kind of places where you hear about such things, but do we really want people to learn about this sort of thing by accident?
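
For anyone who hasn't seen the idea, here's a minimal sketch of it. Note that this is Horspool's simplification of Boyer-Moore (bad-character shift table only), not the full algorithm from the paper and not what mod_include does; the name horspool_find is made up for illustration. The point is that comparing from the right-hand end of the window lets you skip ahead several characters at a time instead of one.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Horspool's simplification of Boyer-Moore (assumption: 8-bit characters).
// Returns the index of the first match, or std::string::npos if none.
std::size_t horspool_find(const std::string& haystack, const std::string& needle) {
    const std::size_t n = haystack.size();
    const std::size_t m = needle.size();
    if (m == 0) return 0;                    // empty needle matches at 0
    if (m > n) return std::string::npos;

    // shift[c]: how far the window can slide when its last character is c.
    // Characters that don't occur in the needle allow a full jump of m.
    std::array<std::size_t, 256> shift;
    shift.fill(m);
    for (std::size_t i = 0; i + 1 < m; ++i) {
        shift[static_cast<unsigned char>(needle[i])] = m - 1 - i;
    }

    std::size_t pos = 0;
    while (pos + m <= n) {
        // Compare the window against the needle from right to left.
        std::size_t j = m;
        while (j > 0 && haystack[pos + j - 1] == needle[j - 1]) --j;
        if (j == 0) return pos;              // every character matched
        // Slide based on the character under the window's last position.
        pos += shift[static_cast<unsigned char>(haystack[pos + m - 1])];
    }
    return std::string::npos;
}
```

Calling horspool_find("hello world", "world") looks at only three windows before it finds the match, where the naive loop would try seven starting positions.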

How about techniques for multithreaded programming? If I hadn't read the documentation on the Boost threads library I wouldn't have learned the trick about figuring out lock ordering based on the addresses of the two mutexes, and if I hadn't been following the writing of Herb Sutter I wouldn't have realized exactly how many ways that solution can break down. If I hadn't read the FreeBSD mailing lists religiously for years I wouldn't have learned about the importance of lock ordering at all, despite the fact that I recall at least two classes back in school that taught the basics of multithreading. This isn't like the string searching stuff, where we're just talking about getting the answer faster; this is about how to write multithreaded programs that actually work at all.
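
The address trick can be sketched roughly like this. It uses std::mutex rather than the Boost types, and OrderedLockPair, Account and transfer are hypothetical names invented for the example, not anything from Boost's documentation. The idea is simply that if every thread takes the lower-addressed mutex first, two threads locking the same pair in opposite argument order can't deadlock against each other.

```cpp
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical RAII helper: locks two mutexes lowest address first.
class OrderedLockPair {
public:
    OrderedLockPair(std::mutex& a, std::mutex& b) : first_(&a), second_(&b) {
        // std::less gives a total order over pointers; plain operator< on
        // pointers to unrelated objects is not guaranteed to.
        if (std::less<std::mutex*>()(second_, first_)) std::swap(first_, second_);
        first_->lock();
        if (second_ != first_) {             // same mutex twice: lock it once
            try { second_->lock(); }
            catch (...) { first_->unlock(); throw; }
        }
    }
    ~OrderedLockPair() {
        if (second_ != first_) second_->unlock();
        first_->unlock();
    }
    OrderedLockPair(const OrderedLockPair&) = delete;
    OrderedLockPair& operator=(const OrderedLockPair&) = delete;
private:
    std::mutex* first_;
    std::mutex* second_;
};

// Hypothetical example type: an account guarded by its own mutex.
struct Account {
    explicit Account(long initial) : balance(initial) {}
    std::mutex m;
    long balance;
};

// transfer(a, b, x) and transfer(b, a, y) acquire the locks in the same order.
void transfer(Account& from, Account& to, long amount) {
    OrderedLockPair guard(from.m, to.m);
    from.balance -= amount;
    to.balance += amount;
}
```

This only holds up if every code path that takes both locks goes through the same rule; one spot that locks them by hand in a different order brings the deadlock right back. The standard library's std::lock also acquires several mutexes without deadlocking, using a different strategy.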

If there's this much more to the topics that I did learn in school, what about the topics I never learned there? That's the thing that really worries me. It seems like it's way too easy to think you're doing things the right way and to be completely and totally wrong. Where should programmers be expected to learn this sort of thing? My fear is that most of them just aren't learning it, and as a result they're going to spend most of their careers screwing up stuff that collectively we've already figured out but that nobody got around to telling them about.


  1. I can't remember where it was first published, but Sutter's (in Win32 terms) InterlockedIncrement String implementation that didn't use CriticalSections blew a lot of our minds.
    Herb's a smart guy.
    BTW, I think programming is folklore. That's why I blog about stuff that I can't find documented clearly on the net. The truth of the matter is, even 10 years ago when I was in school, CS depts were pretty chaotic.
    My favorite CS prof was great with language theory, but didn't understand how the stack was implemented on the hardware we were using.

  2. Exactly.
    The complete disconnect between CS faculty and reality is a good starting point to address. I was fortunate to be in a Computer Engineering (CPE) program that tackled both hardware and software and how the two fit together. The typical CS graduate doesn't know what is possible with modern hardware as they are abstracted away from it by 4 layers of APIs dumbed down to almost basic Logo commands.
    Another is the simple fact that education is getting more "sales" oriented every year; meaning, you make it "too hard" and your enrollment drops off. Enrollment drops off and your revenue drops off.
    In my assembly class we were asked to put a full user interface on a matrix multiply routine. In my C class, we had to create an X-Window based oscope emulator. My sr. design project was a multi-threaded video-on-demand client and server for OS/2 (in 1994 no less). Those projects are where I learned the most; where concepts met reality.

  3. If it helps any, I got taught about Boyer-Moore when I did my CS degree. And thread locking. That's what happens when they do several dedicated courses on algorithms and operating systems.

  4. This pertains to pretty much every major or career. You just can't possibly learn everything you'll ever need to know in 4 years. The most you can hope for is to get a good foundation of knowledge and the problem solving, researching, and networking skills that will actually get you somewhere in "the real world." I learned about all kinds of stuff in college that I thought I understood, but I didn't truly understand it until I had to apply it. I've learned more by troubleshooting my own work than I did by studying for any exam. Your market value goes up with the experience you accumulate. Who'd want to enter the working world knowing that their value would never increase? How depressing.

  5. I guess what really bothers me is that it seems like the line between "stuff I should expect someone to get from a CS degree" and "stuff I should expect someone to learn later on" is too low. There are a number of things that I really consider basic that I don't remember being covered at all back in school, and I think they should be. Assuming people will pick them up as they go just doesn't seem to work in a lot of cases. It's fine for the ones that are motivated to go out and learn (myself included), but for the majority of the programmers out there it just doesn't cut it, and that's a problem.

  6. Probably the best thing that Michigan ever did for me was to give me a pretty horrible undergraduate experience. By this I mean that we were required to produce a fairly decent amount of project/course work, but were given little to no good instruction from professors. The bottom line was that many days were an exercise in self-education. Give a man a fish, teach (force) a man to fish.

  7. Computer Science is not Software Engineering. I've been saying this for years, and got into arguments several times at $FORMER_COMPANY with people insisting that "the theory gives you the basics to learn what you need!". Wrong wrong wrong. As you have pointed out, "right-place, right-time" learning is incredibly dangerous if the goal is proliferation of good software development practices (and good software developers). As an undergrad I took a course, CS411 - Programming Languages. We didn't do any programming. None. But I nearly failed out of school over Context-Free Grammars and Turing Machine equivalence, neither of which helped me write a single line of code.