26 June 2007
Diff is notorious for returning results which, while technically correct, don't make sense. Take this example:
The sensible diff would be this:
However, the following is also a possibility and may be returned if the diff algorithm feels so inclined:
It is possible to transform a bad diff into a good diff. The trick is to slide an edit (an insertion or deletion) sideways whenever it sits next to an equality and the whole text of that equality matches the letters at the opposite end of the edit. To illustrate:
Thus the silly three-edit diff has been reduced to a completely equivalent single-edit diff.
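The sliding step can be sketched roughly as follows. This is only an illustration, not the library's actual code: it assumes a diff is a list of (operation, text) tuples, and the names `slide_edits` and `merge_adjacent` are made up for this example. An edit flanked by two equalities is slid over a neighbouring equality when its far end matches that equality, and any edits that end up side by side are merged.

```python
# Assumed representation: a diff is a list of (op, text) tuples.
DELETE, EQUAL, INSERT = -1, 0, 1


def merge_adjacent(diffs):
    """Join neighbouring entries that share an operation; drop empty ones."""
    merged = []
    for op, text in diffs:
        if not text:
            continue
        if merged and merged[-1][0] == op:
            merged[-1] = (op, merged[-1][1] + text)
        else:
            merged.append((op, text))
    return merged


def slide_edits(diffs):
    """Slide single edits over neighbouring equalities where possible."""
    diffs = merge_adjacent(diffs)
    changed = True
    while changed:
        changed = False
        for i in range(1, len(diffs) - 1):
            (p_op, prev), (op, edit), (n_op, nxt) = diffs[i - 1:i + 2]
            if op == EQUAL or p_op != EQUAL or n_op != EQUAL:
                continue  # only handle an edit flanked by two equalities
            if edit.endswith(prev):
                # Slide left: the edit swallows the previous equality,
                # and that text reappears at the start of the next one.
                diffs[i - 1:i + 2] = [(op, prev + edit[:-len(prev)]),
                                      (EQUAL, prev + nxt)]
                changed = True
                break
            if edit.startswith(nxt):
                # Slide right: the mirror image of the case above.
                diffs[i - 1:i + 2] = [(EQUAL, prev + nxt),
                                      (op, edit[len(nxt):] + nxt)]
                changed = True
                break
        if changed:
            diffs = merge_adjacent(diffs)
    return diffs


print(slide_edits([(EQUAL, "a"), (INSERT, "ba"), (EQUAL, "c")]))
# [(1, 'ab'), (0, 'ac')]
```

Each slide removes one entry from the list, so the loop always terminates. Both the input and the output above describe the same change from "ac" to "abac".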
However, this is still not ideal. To increase the diff's human readability, a further transposition to line the edits up with word boundaries would be preferable.
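One possible way to do this, sketched under assumptions of my own: take a single edit and the equalities on either side of it, try every position the edit can slide to without changing the result, and keep the position whose two boundaries best line up with whitespace. The names `boundary_score` and `align_to_words` are hypothetical, and the scoring is deliberately crude (a real version would probably also weigh punctuation and line breaks).

```python
def boundary_score(left, right):
    """Rate the seam between two strings: higher is a better place to cut."""
    if not left or not right:
        return 2  # the very start or end of the text
    if left[-1].isspace() or right[0].isspace():
        return 1  # whitespace on one side of the seam
    return 0      # the seam falls in the middle of a word


def align_to_words(prev, edit, nxt):
    """Slide `edit` between the equalities `prev` and `nxt` to the best seam."""
    # First slide the edit as far left as it will go.
    while prev and edit and prev[-1] == edit[-1]:
        ch = prev[-1]
        prev, edit, nxt = prev[:-1], ch + edit[:-1], ch + nxt
    best = (prev, edit, nxt)
    best_score = boundary_score(prev, edit) + boundary_score(edit, nxt)
    # Then step right one character at a time, remembering the best spot.
    while edit and nxt and edit[0] == nxt[0]:
        ch = edit[0]
        prev, edit, nxt = prev + ch, edit[1:] + ch, nxt[1:]
        score = boundary_score(prev, edit) + boundary_score(edit, nxt)
        if score >= best_score:  # on a tie, prefer the later position
            best, best_score = (prev, edit, nxt), score
    return best


print(align_to_words("The c", "at c", "ame."))
# ('The ', 'cat ', 'came.')
```

Every candidate position yields the same text when the three pieces are joined, so the choice between them is purely cosmetic.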
The first type of post-diff transposition (Diff 3) has been added to the Diff, Match and Patch libraries. The second type (Diff 4) will follow soon.
I've also made a large number of updates to my Diff Strategies paper. One of the intended changes was to move from hard-coded syntax highlighting of sample code to client-side rendering. Unfortunately, the best library I could find was way too buggy and its developer appears to be unresponsive, so I reverted all that work.
Someone Dugg my image-to-HTML converter, resulting in a panicked shutdown of the script by Digital Routes when the load average spiked. I quickly reprogrammed the script to cache the common case of converting Tux and installed a rate-limiter for non-cached conversions.