|
Moore's Bet 117 December 2010 In June 2009 I made a bet with my friend Tancred that over the next 15 years I could improve the performance of my diff library by a factor of 1000. That means doubling performance every 18 months. It has been 18 months, so let's see how we've done in the first iteration. Below is a graph of diff speeds for each language, with tests run on the 15 June 2009 codebase (blue) and the 16 December 2010 codebase (orange). The two compiled languages (C# and Java) lead the pack with speed improvements in excess of twenty times. The interpreted languages (JavaScript and Python) improved by three or four times. It is important to note that both sets of tests were run on the same hardware/software stack, so the actual speed increase is even more significant when one factors in processor upgrades, smarter compilers and better browsers. Small print: Lua is a new addition, so there's no data for 2009. C++ does not currently have a speed testing harness, so there's no data. C# was compiled with Mono. Python is version 2.6. All tests were run on the same computer on the same day, so Moore's Law is not reflected in this data. The diff algorithm is definitely on the right track. Over the coming 18 months I'm intending to make it multi-threaded. Thus on my current quad-core hardware all versions would receive a four-times speedup. The bad news is that despite being ahead of the curve I'm probably not going to win the pizza. A closer examination of the bet reveals it didn't specify a 1000x speedup, it specified a 1000x increase in text length. Since diff is an O(nd) operation, the relationship is far from linear. As a result of this round of improvements I'm now able for the first time to successfully complete the prescribed 200kb diff without running out of memory. It currently takes 48 minutes -- the goal is 1 millisecond. This means to win the bet I need to increase the speed by a factor of five every iteration. That's not something I think is doable. But I'll try. Sunset in Golden Gate Park:
|