by Neil Fraser, August 2009
Collaborative editors have to combine input coming from the local user and from remote users. One of the significant user interface challenges is cursor behaviour. When a remote user's change is merged into the document, what happens to the local user's cursor? Likewise, what happens to a selection which the local user has made?
Failure to preserve the cursor and selection can result in the cursor jumping to the top of the document and selections becoming unselected every time a remote change occurs. This behaviour can make such an editor completely unusable.
First, let's define the related concepts of cursors and selections. A text cursor indicates one point in the document where user keystrokes will be inserted. A selection indicates the highlighted range between two points in the document. Under most (but not all) environments there can be at most one cursor and one selection. Furthermore, the cursor would normally be locked to either the left or the right side of the selection (if there is one). Thus the cursor and selection can be completely described with two points (the selection start and finish) and a Boolean (which end the cursor is located at). If there is no selection, then the start and end points are the same (often described as 'collapsed') and the Boolean is not relevant.
The most common cursor preservation technique is to perform absolute referencing. Under this model the start and finish locations are recorded in terms of character offsets from the document start, then the remote changes are applied while keeping track of deltas on both points. If an insertion is made ahead of one of these points, then the point's location is incremented by the length of the insertion. Likewise if a deletion is made ahead of one of these points, then the point's location is decremented by the length of the deletion. If a deletion is made which encompasses the point, then the point's location is decremented by the distance between it and the start of the deletion. Insertions and deletions made behind a point have no effect on the point's location.
Here is an example of absolute referencing. The cursor is currently at offset 24, just before the word "slithy":
`Twas brillig, and the ^slithy toves
The following edits arrive from a remote user (strike-through represents a deletion, underline represents an insertion):
`Twas brillig,and& the slithy toves
Three characters were deleted and one was added. If there were no deltas made to the cursor offset, the cursor would shift by two characters:
`Twas brillig, & the sl^ithy toves
By subtracting three and adding one to the cursor location, the cursor is moved to the expected location:
`Twas brillig, & the ^slithy toves
Here's another example involving a selection:
`Twas brillig, and the slithy toves
The same two edits arrive:
`Twas brillig,and& the slithy toves
Without any deltas, the six-character selection would slip sideways:
`Twas brillig, & the slithy toves
The start of the selection was inside a deleted chunk, so its location gets decremented by the number of characters between the start of the deletion and the selection start. The following insertion of one character occurs after the selection start thus has no effect:
`Twas brillig, & the slithy toves
Finally the end of the selection is moved back by three and forwards by one, making a total delta of negative two:
`Twas brillig, & the slithy toves
Had the same edits arrived in the opposite order:
`Twas brillig, &andthe slithy toves
Then the final result would have been different, but equally acceptable:
`Twas brillig, & the slithy toves
This technique is relatively simple and produces a reasonably good user experience. Google Wave is an editor which performs absolute referencing.
While good (and depending on the application, potentially "good enough"), absolute referencing is not perfect. Consider the following case:
`Twas brillig, and the slithy toves
Did gyre and ^gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe.
If the middle two lines were swapped, the difference might look like this:
`Twas brillig, and the slithy tovesDid gyre and gimble in the wabe:All mimsy were the borogoves, Did gyre and gimble in the wabe: And the mome raths outgrabe.
Given that the original position of the cursor was in the middle of line 2, and that line 2 was completely deleted, then absolute referencing would set the new location of the cursor at the beginning of line 2:
`Twas brillig, and the slithy toves
^All mimsy were the borogoves,
Did gyre and gimble in the wabe:
And the mome raths outgrabe.
However the user would expect their cursor to appear in the middle of line 3.
A potential solution would be to extend the differencing algorithm to include not only insertions and deletions, but also moves. Thus if a block containing a cursor or selection point is moved to a new location, then that point is moved along with it. However this solution fails to address the case of moves with modifications:
`Twas brillig, and the slithy tovesDid gyre and gimble in the wabe:All mimsy were the borogoves, Did Gyre & Gimble in the Wabe: And the mome raths outgrabe.
No known differencing algorithm would interpret this as a move plus some edits.
A completely different approach to restoring cursor and selection locations is to use fuzzy matching. For each point of interest a number of characters before and after the point are recorded, as well as the absolute offset. Once the remote edits are applied, a fuzzy matching algorithm is used to search for the best match for the previously recorded context, weighted around the area where the result is expected. If such a match is found, then the point's new location has been established.
In the above case the prefix might be "yre and " while the suffix might be "gimble i" (in practice the context would be longer, but this example only uses eight characters for brevity). The offset from the start of the document would be 52. After the edits are applied, the matching algorithm start looking for "yre and gimble i" at around position 52. The best match would be "yre & Gimble i" (a Levenshtein distance of 4) at position 80 (an offset distance of 28).
Once a match is obtained, an exact position for the point needs to be determined. The original point was in the middle of the original context string, but where is it located in the newly matched string? To find out, a detailed difference between the two strings is computed:
yreand&gGimble i
Using the difference as framework, we can now map the old cursor location to the new string:
`Twas brillig, and the slithy toves
All mimsy were the borogoves,
Did Gyre & ^Gimble in the Wabe:
And the mome raths outgrabe.
One issue with context matching is that a point at the start of the document will not obtain a contextual prefix. Likewise a point at the end of the document will not obtain a contextual suffix. The result is that the following matching effort will be significantly weaker since there is less context to match against. This is unfortunate since it is extremely common for a cursor to be at the start or end of a document. One solution is to increase the size of the context on the opposite side such that the total amount of context remains constant. Another solution is to generate a short string of random characters (preferably unprintable Unicode) and append it to the beginning and end of the text before acquiring and restoring the points, such that the match algorithm has something to match against.
Context matching relies on a rather complicated fuzzy matching algorithm. Good results have been obtained from the efficient Bitap algorithm, modified to weight results by both the Levenshtein distance and the offset distance. Google Documents is an editor which performs context matching.
While context matching robustly solves the move (with potential modification) case which absolute referencing failed at, context matching has an issue of its own.
Consider the case of repeated strings:
Did gyre and gimble in the wabe:
Did gyre and ^gimble in the wabe:
In this case there are two lines, both identical with the cursor is in the middle of the second line. Next, a remote edit arrives adding a new line to the start:
`Twas brillig, and the slithy toves Did gyre and gimble in the wabe: Did gyre and gimble in the wabe:
The matching algorithm starts looking for "yre and gimble i" at around position 51. And it quickly finds an excellent match with Levenshtein distance of 0 and an offset distance of just 1:
`Twas brillig, and the slithy toves
Did gyre and ^gimble in the wabe:
Did gyre and gimble in the wabe:
The result is that the user's cursor jumped up one line. Had the matching algorithm been given a larger context (e.g. "abe:\n Did gyre and gimble in the wabe:\0") there would have been sufficient information to make the second match yield a higher score despite being further away. However, increasing the length of the context not only reduces the efficiency of the matching algorithm, it also reaches a point where accuracy is reduced too -- such as an inability to detect moves like the one which the absolute referencing algorithm failed at above.
Merging the two algorithms allows each to solve the other's problems. The strategy is conceptually simple:
Injecting the absolute referencing algorithm in the middle of the context matching algorithm corrects the offsets such that matches are targeted to the expected location in the new text rather than the previous location in the old text.
Consider an escalation of the previous case:
Did gyre and gimble in the wabe:
Did gyre and ^gimble in the wabe:
The context recorded as "yre and gimble i" and the offset is 51. If a first line is added and the last line is edited, the difference would be:
`Twas brillig, and the slithy toves Did gyre and gimble in the wabe: DidgGyreand&gGimble in thewWabe:
The absolute referencing algorithm would add 36 to the offset to compensate for the insertion of the first line, would both subtract and add 1 for the case change on the letter 'g', and would subtract 3 and add 1 for the change of 'and'. The result would be an offset of 85.
Next the fuzzy matching locates two potential matches. One is "yre and gimble i" at offset 52 (a Levenshtein distance of 0 and a delta offset of 33). The other is "yre & Gimble i" at offset 85 (a Levenshtein distance of 3 and a delta offset of 0).
Based on how the scoring formula in the matching algorithm is tuned, either answer might be chosen. If the score ranks a small Levenshtein distance over a small offset distance, then the first match would be chosen, otherwise the second match would be chosen. Indeed the expected result in this case is unclear. Thus the tuning of the scoring formula becomes a matter of usability studies.
The combined algorithm is more complex, but significantly more accurate than either algorithm alone. MobWrite is an editor which uses context matching with absolute referenced offsets.
Once the cursor has been properly restored, the next challenge is restoration of the scrollbar. It is not useful for the cursor to be restored to the correct location if the scrollbar snaps the to the top each time an update is made.
Successful scrollbar restoration involves setting the scroll position such that the text which the user is looking at does not move. Left-right movement is unavoidable in a line-wrapped document, but vertical movement is controllable in the event that the document is longer than the widget it is being displayed in. The challenge is to identify what the user is looking at. In the absence of eye-trackers, there are a set of heuristics which provide decent usability:
The horizontal scrollbar (if there is one) ought to be treated in the same manner.