Wednesday, June 4, 2003

It's A Bug Hunt

so i tracked down a fun bug today, and i thought i'd share it, since the error is fairly generic and anyone coding in c++ could run into it.

so i'm working on a unit test framework (porting it actually), and like several other c++ unit test frameworks, there is a 'test registry' that's really a global object, and when you create a test you use a macro to add it to the global registry. the macro, under the hood, creates another global object, which in its constructor calls a method of the registry to add your test to it.
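for anyone who hasn't seen this pattern, here is a minimal sketch of it. this is not the actual framework: TestRegistry, TestRegistrar, REGISTER_TEST, g_registry and g_tests_run are all names invented for illustration.

```cpp
#include <cstddef>
#include <vector>

// All names here are hypothetical; this only illustrates the pattern.
typedef void (*TestFn)();

class TestRegistry {
public:
    void add(TestFn fn) { tests_.push_back(fn); }
    std::size_t size() const { return tests_.size(); }
    void run_all() const {
        for (std::size_t i = 0; i < tests_.size(); ++i) tests_[i]();
    }
private:
    std::vector<TestFn> tests_;
};

// The global registry. It is defined before the registrars below, in the
// same source file, so here it is guaranteed to be constructed first.
TestRegistry g_registry;

// Helper object whose constructor adds a test to the global registry.
struct TestRegistrar {
    explicit TestRegistrar(TestFn fn) { g_registry.add(fn); }
};

// Declares the test function, creates a global registrar for it,
// then opens the function definition.
#define REGISTER_TEST(name)                        \
    static void name();                            \
    static TestRegistrar registrar_##name(&name);  \
    static void name()

int g_tests_run = 0;  // counts tests whose check passed

REGISTER_TEST(addition_works) {
    if (1 + 1 == 2) ++g_tests_run;
}

REGISTER_TEST(empty_vector_is_empty) {
    if (std::vector<int>().empty()) ++g_tests_run;
}
```

in this sketch everything lives in one source file, so the ordering is safe. in a real framework the registry and the tests usually sit in different source files, and that is where the trouble starts.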

a week ago, the test suite was working fine, i had ported it over and all was going according to plan. then i got distracted and worked on something else for a while (until yesterday actually). yesterday, i had a free second (i was frustrated with something else i was debugging actually), so i picked up the unit test framework and started porting some existing tests to it. oddly, when i got things compiling and linking, it seemed that the tests would not actually get run.

after some debugging, it turned out that the test was getting added to the registry just fine, but by the time we called the function that ran the tests, the registry was empty. at this point i was confused, so i started playing with the debugger, setting watchpoints on the internals of the registry (the front and back pointers of the underlying vector actually), and it turned out that what was happening was this: the test was being added to the vector before the vector's constructor was called. it turns out that on this platform a vector (after its constructor is called) has member variables that are all zeroed out, and in this particular case the vector's memory was in that same zeroed state before its constructor was called (storage for global objects is zero-filled before any constructors run). this means that the push_back call that added the test to the registry worked, even though the vector had not been initialized. then the vector's constructor was called and its member variables were zeroed out again, so it seemed as if nothing had ever been added to it.

c++ does not guarantee any particular initialization order for global objects defined in different source files (translation units, to use the fancier word), even when they end up in the same library or executable. so it's perfectly valid for the two constructors to be called in this 'inconvenient' order. to avoid the problem, you can move one of the objects to a dynamically linked library. this will allow the system's runtime linker to resolve the problem for you (this is linker behavior, not something the c++ standard promises). the first time the macro instantiates its object, it will try to reference a symbol in the library, so the dynamic linker will automatically run the constructor of the registry object as part of the process of linking it in to the running executable.
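a different and commonly used workaround, which is not the one described above, is the 'construct on first use' idiom: hide the registry behind a function that holds it as a function-local static. the sketch below assumes the same made-up names as before (TestRegistry, TestRegistrar), plus a hypothetical registry() accessor.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical names, for illustration only.
typedef void (*TestFn)();

class TestRegistry {
public:
    void add(TestFn fn) { tests_.push_back(fn); }
    std::size_t size() const { return tests_.size(); }
private:
    std::vector<TestFn> tests_;
};

// A function-local static is constructed the first time control passes
// through its definition, so the registry always exists before anyone
// can call add() on it, whichever source file the caller lives in.
TestRegistry& registry() {
    static TestRegistry instance;
    return instance;
}

// The registrar goes through registry() instead of touching a global.
struct TestRegistrar {
    explicit TestRegistrar(TestFn fn) { registry().add(fn); }
};

static void sample_test() {}

// A global registrar: its constructor runs during static initialization
// and triggers construction of the registry on demand.
static TestRegistrar sample_registrar(&sample_test);
```

the trade-off is that every access goes through a function call, and before c++11 the initialization of a function-local static was not guaranteed to be thread-safe.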

and no, before you ask (since at least two of you are still reading at this point), i did not figure this out all on my own. i got up to the 'damn, the constructor is happening after the add' point, noticed the constructors were happening in the wrong order, and went to ask for help from one of our resident c++ gurus, who explained how the problem was solved when the unit testing library was first written.