Neil's News

Transposing Diffs

26 June 2007

Diff is notorious for returning results that, while technically correct, don't make sense. Take this example:

Text 1: The cat.
Text 2: The cow and the cat.

The sensible diff would be this:

Good Diff: The cow and the cat.

However, the following is also a possibility and may be returned if the diff algorithm feels so inclined:

Bad Diff: The cow and the cat.

It is possible to transform a bad diff into a good diff. The trick is to slide an edit (an insertion or a deletion) sideways whenever it sits next to an equality and the entire text of that equality matches the letters at the opposite end of the edit. To illustrate:

Diff 1: The cow and the cat.
Diff 2: The cow and the cat.
Diff 3: The cow and the cat.

Thus the silly three-edit diff has been reduced to a completely equivalent single-edit diff.
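The sliding trick can be sketched in a few lines of Python. This is only an illustration, not the code that went into the libraries: the `(op, text)` tuple representation, the names `merge_adjacent` and `slide_edits`, and the particular bad diff in the usage note are all assumptions made for the example.

```python
EQUAL, INSERT, DELETE = "=", "+", "-"  # hypothetical operation markers


def merge_adjacent(diff):
    """Join neighbouring pieces that share an operation and drop empty ones."""
    merged = []
    for op, text in diff:
        if not text:
            continue
        if merged and merged[-1][0] == op:
            merged[-1] = (op, merged[-1][1] + text)
        else:
            merged.append((op, text))
    return merged


def slide_edits(diff):
    """Slide each edit over a neighbouring equality when the texts line up."""
    diff = merge_adjacent(diff)
    changed = True
    while changed:
        changed = False
        for i in range(1, len(diff) - 1):
            (prev_op, prev), (op, edit), (next_op, nxt) = diff[i - 1 : i + 2]
            # Only consider an edit that sits between two equalities.
            if op == EQUAL or prev_op != EQUAL or next_op != EQUAL:
                continue
            if edit.endswith(prev):
                # Slide left: the previous equality reappears after the edit.
                diff[i - 1 : i + 2] = [
                    (op, prev + edit[: -len(prev)]),
                    (EQUAL, prev + nxt),
                ]
                changed = True
                break
            if edit.startswith(nxt):
                # Slide right: the next equality reappears before the edit.
                diff[i - 1 : i + 2] = [
                    (EQUAL, prev + nxt),
                    (op, edit[len(nxt) :] + nxt),
                ]
                changed = True
                break
        # Edits that now touch each other collapse into a single edit.
        diff = merge_adjacent(diff)
    return diff
```

As a usage example, take one possible fragmented diff of the two texts above: `slide_edits([("=", "The"), ("+", " cow and"), ("=", " "), ("+", "the "), ("=", "cat.")])` returns `[("=", "The "), ("+", "cow and the "), ("=", "cat.")]`. Each slide removes one piece from the list, so the loop always terminates.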

However, this is still not ideal. To increase the diff's human readability, a further transposition to line the edits up with word boundaries would be preferable.

Diff 4: The cow and the cat.
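One way to approach this second transposition is to try every position the edit can slide to and keep the one whose edges look most like word boundaries. The Python sketch below assumes a single edit sitting between two equalities. The function names and the crude whitespace-based scoring are illustrative assumptions, not the implementation planned for the libraries.

```python
def boundary_score(left, right):
    """Rate the seam between two strings; higher means a more natural break."""
    if not left or not right:
        return 2  # The seam is at the very start or end of the text.
    if left[-1].isspace() or right[0].isspace():
        return 1  # Whitespace on at least one side of the seam.
    return 0      # The seam falls in the middle of a word.


def align_to_words(before, edit, after):
    """Slide an edit between two equalities so its edges fall on word breaks."""
    # First slide the edit as far left as it will go.
    while before and edit and before[-1] == edit[-1]:
        ch = before[-1]
        before, edit, after = before[:-1], ch + edit[:-1], ch + after

    best = (before, edit, after)
    best_score = boundary_score(before, edit) + boundary_score(edit, after)

    # Then step right one character at a time, remembering the best position.
    while after and edit and edit[0] == after[0]:
        ch = after[0]
        before, edit, after = before + ch, edit[1:] + ch, after[1:]
        score = boundary_score(before, edit) + boundary_score(edit, after)
        # >= prefers the rightmost of equally good positions, which leaves
        # whitespace trailing the edit rather than leading it.
        if score >= best_score:
            best, best_score = (before, edit, after), score
    return best
```

For instance, `align_to_words("The c", "ow and the c", "at.")` returns `("The ", "cow and the ", "cat.")`. Every position visited is an equivalent diff, so the choice among them is purely about readability.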

The first type of post-diff transposition (Diff 3) has been added to the Diff, Match and Patch libraries. The second type (Diff 4) will follow soon.

I've also made a large number of updates to my Diff Strategies paper. One of the intended changes was to move from hard-coded syntax highlighting of sample code to client-side rendering. Unfortunately, the best library I could find was way too buggy and its developer appears to be unresponsive, so I reverted all that work.


Someone Digged my image-to-HTML converter, resulting in a panicked shutdown of the script by Digital Routes when the load average spiked. I quickly reprogrammed the script to cache the common case of converting Tux and installed a rate-limiter for non-cached conversions.


While cycling up Shepherd Canyon in Oakland on Sunday, I came across this humorously modified sign. The view from the top was pretty sweet too.

