Tuesday, September 27, 2005

When Should I Have Learned This?

A few days ago I read a blog entry by chromatic about the kind of things that people learn in Computer Science programs these days. He was talking about the difference between theoretical computer science and the kind of skills you need to make it as a professional programmer, two things that are somewhat related but nowhere near identical. That dichotomy is interesting, but it's not what I want to talk about today. I want to talk about the million and one bits of information that have come out of the CS world but that for one reason or another you don't really learn in a CS degree.

Every CS major back at RPI had to write an implementation of strstr at some point, but I don't remember ever learning about better ways to do it than the naive algorithm. At some point between then and now I heard the term "Boyer-Moore Algorithm", but until today I never actually read the paper and saw how it worked. If I hadn't been poking around in the mod_include source code a few weeks ago I'd never have heard of the BNDM (Backward Nondeterministic DAWG Matching) search algorithm. Both of these are really cool ways to do a faster string search, but my CS degree never taught me either of them. Fortunately, I tend to hang out in the kind of places where you hear about such things, but do we really want people to learn about these sorts of things by accident?
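For the curious, here's a rough C sketch of BNDM, written from the published description by Navarro and Raffinot rather than copied from mod_include, so don't assume it matches that module's code. Instead of comparing byte by byte at every offset like the naive strstr, it reads each window right to left and keeps, in one machine word, the set of pattern positions the text read so far could still line up with. When that set empties it can skip ahead, often by nearly the whole pattern length. Because the set lives in a single 64-bit word, this sketch only handles patterns up to 64 bytes, and bndm_search is a made-up name for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of BNDM (Backward Nondeterministic DAWG Matching).
 * Returns the offset of the first occurrence of pat in text, or -1.
 * Limited to patterns of at most 64 bytes: one bit per pattern position. */
static long bndm_search(const char *text, size_t n, const char *pat, size_t m)
{
    uint64_t B[256] = {0};

    if (m == 0)
        return 0;                 /* like strstr, an empty pattern matches at 0 */
    if (m > 64 || m > n)
        return -1;

    /* B[c] has bit (m-1-i) set for every position i where pat[i] == c. */
    for (size_t i = 0; i < m; i++)
        B[(unsigned char)pat[i]] |= (uint64_t)1 << (m - 1 - i);

    const uint64_t top = (uint64_t)1 << (m - 1);  /* "text read is a pattern prefix" */
    size_t pos = 0;

    while (pos <= n - m) {
        size_t j = m;             /* window bytes text[pos .. pos+j-1] not yet read */
        size_t last = m;          /* shift to use if no pattern prefix turns up */
        uint64_t D = ~(uint64_t)0;

        /* Scan the window right to left while the bytes read so far still
         * form a factor (substring) of the pattern. */
        while (j > 0 && D != 0) {
            D &= B[(unsigned char)text[pos + j - 1]];
            j--;
            if (D & top) {
                if (j > 0)
                    last = j;     /* suffix read so far is a pattern prefix */
                else
                    return (long)pos;  /* whole window matched */
            }
            D <<= 1;
        }
        pos += last;
    }
    return -1;
}
```

The shift is safe because last always records the longest window suffix that is also a pattern prefix, so no occurrence can start inside the skipped region.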

How about techniques for multithreaded programming? If I hadn't read the documentation for the Boost threads library I wouldn't have learned the trick of deriving a lock order from the addresses of the two mutexes, and if I hadn't been following Herb Sutter's writing I wouldn't have realized exactly how many ways that solution can break down. If I hadn't read the FreeBSD mailing lists religiously for years I wouldn't have learned about the importance of lock ordering at all, despite the fact that I recall at least two classes back in school that taught the basics of multithreading. This isn't like the string searching stuff, where we're just talking about getting the answer faster; this is about how to write multithreaded programs that actually work at all.
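Here's a minimal sketch of the address-ordering trick, using plain POSIX threads rather than Boost. The names lock_pair, unlock_pair, struct account, and transfer are all hypothetical. If one thread calls transfer(&x, &y, ...) while another calls transfer(&y, &x, ...), naive code would lock in opposite orders and could deadlock; sorting the two mutexes by address gives every thread the same order.

```c
#include <pthread.h>
#include <stdint.h>

/* Lock two mutexes in a single global order (lowest address first), so two
 * threads locking the same pair with the arguments swapped can't deadlock.
 * Pointers are compared as uintptr_t because C leaves relational comparison
 * of pointers to unrelated objects undefined. */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if (a == b) {                 /* same mutex: lock it only once */
        pthread_mutex_lock(a);
        return;
    }
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    /* Unlock order doesn't matter for deadlock avoidance. */
    pthread_mutex_unlock(a);
    if (a != b)
        pthread_mutex_unlock(b);
}

/* Hypothetical example: moving money between two accounts, each with its own lock. */
struct account {
    pthread_mutex_t lock;
    long balance;
};

static void transfer(struct account *from, struct account *to, long amount)
{
    lock_pair(&from->lock, &to->lock);
    from->balance -= amount;
    to->balance += amount;
    unlock_pair(&from->lock, &to->lock);
}
```

This only helps if every code path that holds both locks goes through lock_pair. A single caller that locks one mutex directly and then the other brings the deadlock right back.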

If there's this much more to the topics that I did learn in school, what about the topics I never learned there? That's the thing that really worries me. It seems way too easy to think you're doing things the right way and be completely and totally wrong. Where should programmers be expected to learn this sort of thing? My fear is that most of them just aren't learning it, and as a result are going to spend most of their careers screwing up problems that collectively we've already figured out, simply because nobody got around to telling them.

Tuesday, September 20, 2005

Thoughts on Threading

So this week I've been thinking a lot about threading.

Specifically I've been thinking about large, complex multithreaded servers that are difficult to write, to test, and most importantly to modify later on, especially if you are not the person who actually designed and implemented them in the first place.

Among the better papers I've seen on the subject is "Reasoning about SMP in FreeBSD", by Jeffrey Hsu, who did a lot of work on locking the FreeBSD network stack as part of its SMP project. Interestingly enough, Hsu is now working on DragonFlyBSD, which has taken a rather different approach to multithreading than FreeBSD did. Makes you wonder if the more traditional techniques used in FreeBSD (fine-grained mutexes, à la the Solaris kernel) are really the right way to go...

Go read the paper, it's good.

Anyway, the big idea of that paper is that to effectively think about a multithreaded system you really need to understand what it's doing. You can't effectively add locking to one part of the system without totally understanding all the interactions it has with the rest of the system. You really need to see the big picture in order to know how to make the individual decisions.
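One common way to write part of that big picture down is a global lock hierarchy: every lock gets a rank, and code may only acquire locks in increasing rank order. FreeBSD's WITNESS does a far more thorough version of this checking inside the kernel. The sketch below is a hypothetical user-space illustration (struct ranked_mutex, ranked_lock, and ranked_unlock are made-up names) that refuses an out-of-order acquisition instead of letting it deadlock later. It assumes each lock has a unique rank and that a thread holds at most 16 locks at once.

```c
#include <pthread.h>

/* Hypothetical "ranked" mutex: each lock is assigned a fixed rank in a
 * system-wide hierarchy, and a thread may only acquire locks in strictly
 * increasing rank order. Ranks are assumed unique per lock. */
struct ranked_mutex {
    pthread_mutex_t mtx;
    int rank;
};

#define MAX_HELD 16

/* Per-thread stack of ranks currently held, kept in increasing order. */
static _Thread_local int held[MAX_HELD];
static _Thread_local int nheld = 0;

/* Returns 0 after locking, or -1 (without locking) if acquiring m now would
 * violate the hierarchy or overflow the per-thread record. */
static int ranked_lock(struct ranked_mutex *m)
{
    if (nheld > 0 && m->rank <= held[nheld - 1])
        return -1;                /* out of order: a potential deadlock */
    if (nheld == MAX_HELD)
        return -1;
    pthread_mutex_lock(&m->mtx);
    held[nheld++] = m->rank;
    return 0;
}

static void ranked_unlock(struct ranked_mutex *m)
{
    pthread_mutex_unlock(&m->mtx);
    /* Drop m's rank from the held stack; it's usually, but not always, on top. */
    for (int i = nheld - 1; i >= 0; i--) {
        if (held[i] == m->rank) {
            for (int k = i; k < nheld - 1; k++)
                held[k] = held[k + 1];
            nheld--;
            break;
        }
    }
}
```

The check is cheap, but picking the ranks is exactly the part that requires understanding how the whole system fits together.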

The problem is that I've been having trouble seeing the big picture.

Planning Holiday Cheer

So I know I've already complained about how much it sucks to have to buy plane tickets and fly across the country for all those family gatherings and social events I'm expected to be at back on the east coast...

But I haven't had to do the "travel over the holidays" thing yet, so I'm sure that all my complaining is for naught, because I haven't experienced real pain yet...

Don't worry though! It's coming! I just booked my flight home for the holidays. It turns out that $COMPANY has a bunch of "Shutdown Day" days off between Xmas and New Year's, so I'll be home from December 23rd through January 2nd. Judging from the prices JetBlue charges, and the fact that every reasonably priced rental car is already taken, I'm sure there'll be a terrifyingly large number of people trying to get someplace other than where they live, just like me. It should be loads of fun. Now I know why Rob J doesn't travel during the holidays...

Anyway, I guess that means I'll have to figure out where the cool kids back home are going for a New Year's party...

Saturday, September 17, 2005

Fun with Email

So earlier this week I ran across Matt Sergeant's O'Reillynet article about Qpsmtpd.

I've known about Qpsmtpd for a while now (I saw a great talk about it at OSCON a few years back, and I know the ASF uses it for their mail), so I'm not exactly new to the concepts here, but one thing in the article struck me as just too damn cool. While there are special plugins for Qpsmtpd that let it deliver mail to Qmail or Postfix, the simplest way to integrate it with your existing system is to run your normal SMTP server on an alternate port, have Qpsmtpd do all the fancy filtering and spam blocking, and then forward the mail on via, you guessed it, SMTP.

Back in the Apache world, some of the HTTPD developers have been working on mod_smtpd, which lets you make the Apache HTTPD speak SMTP.

In the open source world good engineers borrow ideas and great engineers steal them, so much of the mod_smtpd design is based on Qpsmtpd. Ironically enough, the Qpsmtpd design was inspired by the modular design of the Apache HTTPD server, so now the cycle is complete...

Now mod_smtpd is just getting to the point where it's able to do useful things, but it still doesn't have a queue plugin that lets you forward mail via SMTP.

So I wrote one this morning.

It's pretty bare-bones, with a really basic SMTP implementation under the hood, but it is enough for my mod_smtpd server to be able to forward mail to my existing Postfix server, and that's the whole point. Grab the patch here, if you're interested, although honestly, if you're interested enough in mod_smtpd at this point that you'd actually want the code, you're probably on the mailing list already ;-)
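One detail that even a really basic SMTP client has to get right is the DATA phase. Per RFC 5321 (section 4.5.2), any line of the message that starts with a period must have another period prepended (dot-stuffing), and the message ends with a line containing only a period. Here's a hedged C sketch of just that step. smtp_prepare_data is a hypothetical helper, not a function from the patch, and it assumes the message already uses CRLF line endings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: turn a message into the bytes sent after the DATA
 * command. Lines that begin with '.' get an extra '.' (dot-stuffing, RFC 5321
 * section 4.5.2), and the body is terminated by a line holding just ".".
 * Assumes 'in' already uses CRLF line endings. Returns the number of bytes
 * written (not counting the NUL), or 0 if 'out' is too small. */
static size_t smtp_prepare_data(const char *in, char *out, size_t outsz)
{
    size_t o = 0;
    int at_line_start = 1;

    for (const char *p = in; *p; p++) {
        if (at_line_start && *p == '.') {
            if (o + 1 >= outsz)
                return 0;
            out[o++] = '.';       /* the extra, stuffed dot */
        }
        if (o + 1 >= outsz)
            return 0;
        out[o++] = *p;
        at_line_start = (*p == '\n');
    }

    /* End the last line with CRLF if needed, then add the "." terminator line. */
    int ends_with_crlf = (o >= 2 && out[o - 2] == '\r' && out[o - 1] == '\n');
    const char *term = (o == 0 || ends_with_crlf) ? ".\r\n" : "\r\n.\r\n";
    size_t tl = strlen(term);
    if (o + tl + 1 > outsz)
        return 0;
    memcpy(out + o, term, tl);
    o += tl;
    out[o] = '\0';
    return o;
}
```

The receiving server strips the extra dot again, so a message containing a line that is just "." can't accidentally end the DATA phase early.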

Tuesday, September 6, 2005

APR-Template 0.0.1

I just cut the first release of APR-Template, my Apache Portable Runtime based templating system. Since I already wrote a perfectly good README file that explains what it is, I'll just quote it here:

APR-Template is a minimalist template library implemented on top of APR and APR-Util. It's designed to provide a lightweight templating solution that fits easily into an Apache HTTPD module or other APR based program.

The template language looks like this:

<html>
<head>
<title>[% print title %]</title>
</head>
<body>
[% if add_header %]<hr>[% end %]

<h1>[% print title %]</h1>

<ul>
[% for e in elements %]
<li>[% print e %]</li>
[% end %]
</ul>

[% if add_footer %]<hr>[% end %]
</body>
</html>

Which produces about what you'd expect. If you've used Template-Toolkit or EZT you'll find APR-Template to be similar, although the feature set is quite stripped down, so if you find yourself trying to do a whole lot in your template it's a sign that you either need to put that logic into your C code or find a different template engine.
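To make the idea concrete, here's a toy renderer in C that understands only the [% print name %] directive and copies everything else through unchanged. It is not APR-Template's implementation or API (which writes to an APR bucket brigade and also handles if and for); struct tmpl_var, lookup, append, and render are hypothetical names for illustration, and output goes into a plain fixed-size buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variable binding; a real engine would also support arrays,
 * hashes, conditionals, and loops. */
struct tmpl_var {
    const char *name;
    const char *value;
};

static const char *lookup(const struct tmpl_var *vars, size_t nvars,
                          const char *name, size_t len)
{
    for (size_t i = 0; i < nvars; i++)
        if (strlen(vars[i].name) == len && strncmp(vars[i].name, name, len) == 0)
            return vars[i].value;
    return "";                    /* unknown variables render as empty */
}

/* Append n bytes of s to out, keeping it NUL-terminated; -1 if it won't fit. */
static int append(char *out, size_t outsz, size_t *o, const char *s, size_t n)
{
    if (*o + n + 1 > outsz)
        return -1;
    memcpy(out + *o, s, n);
    *o += n;
    out[*o] = '\0';
    return 0;
}

/* Render a template that uses only "[% print NAME %]" directives; all other
 * text is copied through. Returns 0 on success, -1 on an unsupported or
 * malformed directive or if the output buffer is too small. */
static int render(const char *tmpl, const struct tmpl_var *vars, size_t nvars,
                  char *out, size_t outsz)
{
    size_t o = 0;
    const char *p = tmpl, *open;

    if (outsz == 0)
        return -1;
    out[0] = '\0';

    while ((open = strstr(p, "[%")) != NULL) {
        if (append(out, outsz, &o, p, (size_t)(open - p)) != 0)
            return -1;
        const char *close = strstr(open + 2, "%]");
        if (close == NULL)
            return -1;

        /* Trim spaces inside the directive, then expect "print NAME". */
        const char *s = open + 2, *e = close;
        while (s < e && *s == ' ')
            s++;
        while (e > s && e[-1] == ' ')
            e--;
        if ((size_t)(e - s) < 6 || strncmp(s, "print ", 6) != 0)
            return -1;
        s += 6;
        while (s < e && *s == ' ')
            s++;

        const char *value = lookup(vars, nvars, s, (size_t)(e - s));
        if (append(out, outsz, &o, value, strlen(value)) != 0)
            return -1;
        p = close + 2;
    }
    return append(out, outsz, &o, p, strlen(p));
}
```

Even this much beats building the page with printf, since the markup lives in the template and the C code only supplies values.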

For examples of how to use APR-Template in a program please refer to the tmpl command line tool (in src/cmdline/tmpl.c) and the mod_template_example Apache module (in examples/mod_template_example).

There's a bunch of stuff I still want to do with it (see the TODO file in the tarball for details), but the current version has just enough functionality to be useful, so I decided it was a good point to cut a release. Get it while it's hot: http://electricjellyfish.net/garrett/apr-template/apr-template-0.0.1.tar.gz

Monday, September 5, 2005

Illogical Travel Plans

So Kim posted a couple of pictures from our fraternity in her LiveJournal today, which of course resulted in me spending way too long going through the online photo albums looking at the past 8 or 9 years' worth of Psi U history.

This had predictable results. I've been overwhelmed with a desire to play quarters and flick bottle caps at the light fixture in the living room. Oh, and to see some old friends.

So I just bought my plane tickets for 3 to 3. I'll be taking the totally insane route, flying out late Friday night and arriving at about noon on Saturday (I'll miss the alumni association meeting, bonus!), then a little more than 24 hours later I'll fly out again. It seems likely that I'll be pretty out of it the whole time, but hopefully I'll be able to get some sleep on the plane so I can at least remain awake for the 12 scheduled hours of party ;-)

Anyway, I just felt the need to mention it, on the off chance that anyone else was planning on not going this year. I mean, if I can fly across the whole damn country to be there, doesn't that make your excuse seem pretty lame?

Sunday, September 4, 2005

Fun With Templates

My friend Greg wrote this template engine (ezt.py) in Python. It's small (less than 600 lines of code), and useful, and under a very permissive license, and thus has made its way into a variety of different programs (the Subversion build system, ViewCVS, Edna, the new ASF workflow tool, probably lots more I don't know about). If you're writing Python code and need a template engine, and you don't feel like embedding one of the heavier weight solutions, you should really give it a shot.

Lately though, I've been playing around with Apache modules a lot. I guess that's one of the dangers of spending so much time sitting 10 feet from Paul.

Considering that Apache happens to be a web server, lots of Apache modules expose some information via web pages, either diagnostics, control panels of some sort, or even the primary user interface of the program.

There isn't really a C equivalent of EZT though, so you either end up embedding a scripting language of some sort and using a template engine implemented in that, or you use something like ClearSilver, which is both ugly and overkill for the kind of thing we're talking about here. Or, more likely, you end up doing what most Apache modules do and just spit out HTML via the moral equivalent of printf.

I think it's pretty sad that we're this far into the evolution of the web and some of the core portions of the most commonly used web server on the planet are generating their user interface via printf...

So I'm working on a better solution.

It's called APR-Template, because I can't come up with a better name, and it's similar in spirit to EZT, but implemented in C using the Apache Portable Runtime. It even writes output to an APR bucket brigade, so it's really easy to plug it into an Apache module. So far it supports conditionals, loops, scalars, arrays, and hashes, and it weighs in at just under 1000 lines of C code.

Look for the first public release real soon now, as soon as I clean up some of the public APIs a little bit and write some more documentation.