7 January 2007
Yesterday I was walking to work and stopped at a red light. I pushed the pedestrian crosswalk button, waited for the signal, then proceeded to walk across the five-lane street. Less than ten seconds later a wall of traffic was bearing down on me and I had to sprint to get out of the way. Why was I nearly run over?
The answer stems from an incident in Illinois in 1995. In the town of Fox River Grove some railroad tracks crossed a road very close to an intersection. Whenever a train approached, the traffic lights would receive an override signal to give a green light to cars which might otherwise be stopped on the rails. Three seconds were allotted to giving opposing traffic their normal yellow, followed by a normal two seconds of all-red, followed by 15 seconds for any blocked cars to clear the tracks. Then the train would pass through. This was a reasonable timing sequence.
But every once in a while people would report that they were only getting around two seconds to clear the tracks. Traffic engineers and railroad engineers repeatedly looked at the system, but found nothing wrong. It was not repeatable. Test after test showed everything was in perfect working order.
Then the bug reared its head while a school bus was trapped on the tracks. Seven children died when the train crashed into it. Belatedly the engineers discovered that if the pedestrian crosswalk was activated, the normal three seconds of yellow was replaced by a normal fifteen-second pedestrian warning cycle, leaving only a couple of seconds to clear the tracks. This accident has become a classic engineering case study on how every unit test can pass, but the system as a whole can fail.
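The arithmetic is easy to sketch. The following is only a toy model, not the actual controller logic: the function and constant names are my own, and it assumes the train arrives a fixed 20 seconds (3 + 2 + 15) after the override signal, which is my reading of the timings above rather than a documented figure.

```python
# Toy model of the override sequence. All names are hypothetical.
# Assumption: the train arrives a fixed 3 + 2 + 15 = 20 seconds after
# the override signal is sent to the traffic lights.
TRAIN_LEAD_TIME = 3 + 2 + 15

YELLOW = 3               # normal yellow for opposing traffic
ALL_RED = 2              # normal all-red phase
PEDESTRIAN_WARNING = 15  # normal pedestrian warning cycle


def clearance_time(pedestrian_button_pressed: bool) -> int:
    """Seconds of green left for cars on the tracks before the train arrives."""
    # If the crosswalk was activated, the pedestrian warning replaces the yellow.
    first_phase = PEDESTRIAN_WARNING if pedestrian_button_pressed else YELLOW
    return TRAIN_LEAD_TIME - first_phase - ALL_RED


print(clearance_time(False))  # 15
print(clearance_time(True))   # 3
```

Each phase has a perfectly normal duration when checked on its own. Only the combination eats the clearance window, and under these assumed numbers it leaves about three seconds, in line with the "couple of seconds" people reported.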
So how did they fix the problem? As I discovered yesterday, they installed an override on the offending pedestrian crossing signal. That means one can get a pedestrian walk signal, then five seconds later traffic can be right on top of you. I guess it seemed like an improvement: instead of killing a bus-load of people, you only kill the odd pedestrian. Half a minute after my near miss, the Caltrain which triggered it all roared by. On the way home from work, exactly the same scenario recurred. It is completely repeatable. The only difference was that the second time I was prepared for it and started running when I saw the sequence start.
The lesson here is that discovering a bug is one thing. Fixing it properly is quite another. A fix which merely transfers the bug to another party is not a fix. That's called Whac-A-Mole.
Update: I reported the issue to the City of Mountain View. They are looking into it.