23 December 2011
Last week I ported my Diff Match Patch library to Dart. (Although the port is complete, it won't be released until after the holidays, since all my code reviewers are with their families.) One of the major stumbling blocks I ran into was Dart's lack of encodeURI/decodeURI functions for turning text into characters that are safe to use in a URL. So I had to write my own functions. No big deal. How hard can it be to look up the character code for '@', convert it to hexadecimal and return '%40'? It turns out that the format is much more complicated than this.
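Two hex digits (one byte) only get you to character 127, since UTF-8 requires the top-most bit of a single-byte character to be 0. Beyond this, one needs to switch to four- and six-digit codes (two- and three-byte UTF-8 sequences, such as '%C3%A9' for 'é') to reach the 65,536 characters of the Basic Multilingual Plane, all of which must be packed into UTF-8's bit scheme. Beyond even this, the string's UTF-16 surrogate pairs must be combined into a single code point and written as eight-digit codes (four UTF-8 bytes) to reach roughly one million additional Unicode characters.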
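To make the byte layouts above concrete, here is a minimal sketch in Python (not the Dart code I submitted). It walks a string as UTF-16 code units, the way a Dart or JavaScript string stores them, joins surrogate pairs into one code point, and emits one, two, three or four UTF-8 bytes as two, four, six or eight hex digits. The names `encode_uri` and `utf16_units` are hypothetical, and the set of characters left unescaped is an assumption (RFC 3986's "unreserved" characters), so it escapes '@' like the example above and behaves more like encodeURIComponent than encodeURI.

```python
import struct

# Assumption: leave only RFC 3986 "unreserved" characters unescaped.
SAFE = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def utf16_units(text):
    """Return the UTF-16 code units of text (lone surrogates pass through)."""
    data = text.encode("utf-16-le", "surrogatepass")
    return list(struct.unpack(f"<{len(data) // 2}H", data))


def encode_uri(text):
    """Hypothetical percent-encoder working on UTF-16 code units."""
    units = utf16_units(text)
    out = []
    i = 0
    while i < len(units):
        unit = units[i]
        i += 1
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: combine with the following low surrogate
            # into a code point in the range U+10000..U+10FFFF.
            if i >= len(units) or not 0xDC00 <= units[i] <= 0xDFFF:
                raise ValueError("unpaired high surrogate")
            code = 0x10000 + ((unit - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
        elif 0xDC00 <= unit <= 0xDFFF:
            raise ValueError("unpaired low surrogate")
        else:
            code = unit

        if code < 0x80:
            if chr(code) in SAFE:
                out.append(chr(code))
                continue
            # 1 byte: 0xxxxxxx -> two hex digits.
            utf8 = [code]
        elif code < 0x800:
            # 2 bytes: 110xxxxx 10xxxxxx -> four hex digits.
            utf8 = [0xC0 | (code >> 6), 0x80 | (code & 0x3F)]
        elif code < 0x10000:
            # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx -> six hex digits.
            utf8 = [0xE0 | (code >> 12),
                    0x80 | ((code >> 6) & 0x3F),
                    0x80 | (code & 0x3F)]
        else:
            # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx -> eight hex digits.
            utf8 = [0xF0 | (code >> 18),
                    0x80 | ((code >> 12) & 0x3F),
                    0x80 | ((code >> 6) & 0x3F),
                    0x80 | (code & 0x3F)]
        out.extend(f"%{b:02X}" for b in utf8)
    return "".join(out)


print(encode_uri("@"))   # %40
print(encode_uri("é"))   # %C3%A9
print(encode_uri("😀"))  # %F0%9F%98%80 (from the surrogate pair D83D DE00)
```

Rejecting unpaired surrogates mirrors what JavaScript's encodeURI does (it throws a URIError), since a lone surrogate has no valid UTF-8 encoding.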
After I showed my solution to the Dart team, they asked me to submit it to their core library so that nobody else has to go through what I went through. Dart is shaping up to be a really great language. Although it's not ready for serious use right now, it is great to be able to help shape a language that's going to be one of the pillars of web programming within a few years.
Update: encodeUri/decodeUri are now included with Dart. Use the Uri class (for example, Uri.encodeFull and Uri.decodeFull, or Uri.encodeComponent and Uri.decodeComponent).