Neil's News

EncodeURI

23 December 2011

Last week I ported my Diff Match Patch library to Dart. (Although the port is complete, it won't get released until after the holidays because all my code reviewers are with their families.) One of the major stumbling blocks I ran into was Dart's lack of encodeURI/decodeURI functions to turn text into characters that are safe to use in a URL. Obviously I had to write my own functions. No big deal. How hard can it be to look up the character code for '@', do a hexadecimal conversion and return '%40'? Turns out that the format is much more complicated than this.
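To make the naive idea concrete, here is a minimal Python sketch of it (Python rather than Dart, and not the code from the port; `naive_encode` is a hypothetical name used only for illustration). It works for '@' but produces something a URL decoder cannot interpret once the character code needs more than two hex digits.

```python
def naive_encode(ch: str) -> str:
    # Hypothetical helper: take the character code, convert to hex, prefix '%'.
    return "%" + format(ord(ch), "X")


print(naive_encode("@"))   # %40
print(naive_encode("う"))  # %3046: a decoder reads this as %30 ('0') followed by "46"
```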

Two-digit hex codes only get you to character 127 (the top-most bit must be 0). Beyond this, one needs to switch to four- and six-digit codes (two and three bytes) to reach the roughly 65k characters of Unicode's Basic Multilingual Plane, all of which must be encoded in UTF-8's bit scheme. Beyond even this, one needs eight-digit codes (four bytes) to reach an additional one million Unicode characters, each of which decodes to a surrogate pair in a UTF-16 string.

  • %40 → 0x40 → @
  • %DA%80 → 0x680 → ڀ
  • %E3%81%86 → 0x3046 → う
  • %F0%A0%A4%80 → 0xD842,0xDD00 → 𠤀
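The four examples above can be reproduced with a short Python sketch of UTF-8's bit scheme. This is an illustration under stated assumptions, not the Dart code from the port: the function names are hypothetical, and the sketch escapes every character it is given, whereas a real encodeURI leaves a set of URL-safe characters untouched.

```python
from typing import List, Tuple


def utf8_bytes(code_point: int) -> List[int]:
    """Encode one Unicode code point using UTF-8's bit layout."""
    if code_point < 0x80:        # 0xxxxxxx
        return [code_point]
    if code_point < 0x800:       # 110xxxxx 10xxxxxx
        return [0xC0 | (code_point >> 6),
                0x80 | (code_point & 0x3F)]
    if code_point < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return [0xE0 | (code_point >> 12),
                0x80 | ((code_point >> 6) & 0x3F),
                0x80 | (code_point & 0x3F)]
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return [0xF0 | (code_point >> 18),
            0x80 | ((code_point >> 12) & 0x3F),
            0x80 | ((code_point >> 6) & 0x3F),
            0x80 | (code_point & 0x3F)]


def percent_encode_char(ch: str) -> str:
    """Percent-encode a single character as two hex digits per UTF-8 byte."""
    return "".join("%%%02X" % b for b in utf8_bytes(ord(ch)))


def surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above 0xFFFF into its UTF-16 surrogate pair."""
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


for ch in ["@", "\u0680", "\u3046", "\U00020900"]:
    print(percent_encode_char(ch))
# %40
# %DA%80
# %E3%81%86
# %F0%A0%A4%80

print([hex(unit) for unit in surrogate_pair(0x20900)])  # ['0xd842', '0xdd00']
```

Decoding runs the same steps in reverse: read the leading byte to learn how many continuation bytes follow, strip the marker bits, and reassemble the code point.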

Here's the resulting code and the unit tests:
  EncodeDecode.dart
  EncodeDecodeTest.dart

After I showed my solution to the Dart team, they asked me to submit it to their core library so that nobody else has to go through what I went through. Dart is shaping up to be a really great language. Although it's not ready for serious use right now, it is great to be able to help shape a language that's going to be one of the pillars of web programming within a few years.

Update: encodeUri/decodeUri are now included with Dart. Use the Uri class.


 
-------------------------------------
Legal yada yada: My views do not necessarily represent those of my employer or my goldfish.