Neil's News

Python JSON

8 December 2009

JSON is a great data interchange format that complements XML quite well. But whereas there exists an entire industry devoted to XML parsers, encoders, validators, transformers and activists, JSON does not get as much attention. [As evidence, despite this post being about JSON, check out what's being advertised in the right-hand column.] My requirements called for me to merge JSON blocks from several untrusted sources and publish them as a single JSON block. The problem is that if even one source provides syntactically invalid JSON, the merged block becomes unusable by everyone. Thus a JSON validator was needed.

Python has a standard library called (big surprise) json. It offers two methods, read() and write(), which convert JSON to Python and Python to JSON respectively. Passing illegal JSON (such as a missing bracket) raises an exception:

>>> import json
>>> json.read('[1, 2')
StopIteration

This is good, but insufficient as a validator. Consider the following case:

>>> import json
>>> json.read('123xx')
123

The trailing garbage is silently ignored, yet when passed to JavaScript's eval() function, '123xx' throws a SyntaxError.
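
To make the gap concrete, here is a minimal sketch of the check that was missing: parse one value, then confirm nothing is left over. It assumes Python 3's built-in json module rather than the library used in the sessions above, and parses_completely() is a hypothetical name of my own. (Python 3's json.loads() already rejects trailing data by itself; the sketch just spells the idea out.)

```python
import json

JSON_WHITESPACE = " \t\n\r"  # the only whitespace the JSON grammar allows


def parses_completely(text):
    """Return True only if the whole string is exactly one JSON value.

    Assumption: Python 3's built-in json module, not the library
    shown in the post. Hypothetical helper for illustration.
    """
    stripped = text.strip(JSON_WHITESPACE)
    decoder = json.JSONDecoder()
    try:
        # raw_decode() returns the parsed value and the index at which
        # parsing stopped, so leftover characters can be detected.
        _value, end = decoder.raw_decode(stripped)
    except ValueError:
        return False
    # Anything remaining after the value is trailing garbage.
    return end == len(stripped)
```

With this, '[1, 2]' passes, while both '[1, 2' and '123xx' are rejected.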

A more aggressive option is to round-trip the JSON to Python and back. Thus:

>>> import json
>>> json.write(json.read('123xx'))
'123'

Not only is this approach excessively CPU-intensive, but it also introduces new errors, such as lost floating-point precision:

>>> import json
>>> json.write(json.read('0.1234567'))
'0.123457'

Moreover, Python's JSON library does not follow the specification laid out in RFC 4627. Amongst other things, the top-level JSON value is supposed to be either an array or an object.
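
Both complaints can be addressed by validating without re-serialising: parse the text, check the top-level type, and hand back the original string untouched. The following is only a sketch under assumptions. It uses Python 3's built-in json module (not the library above), validate_rfc4627() is a hypothetical name, and it is not a complete RFC 4627 validator.

```python
import json


def _reject_constant(name):
    # The built-in parser accepts NaN and Infinity by default,
    # neither of which is legal JSON.
    raise ValueError("non-standard constant: " + name)


def validate_rfc4627(text):
    """Return the original text if it is a JSON array or object, else None.

    Hypothetical helper; assumes Python 3's built-in json module.
    """
    try:
        value = json.loads(text, parse_constant=_reject_constant)
    except ValueError:
        return None
    # RFC 4627: a JSON text is a serialized object or array.
    if not isinstance(value, (list, dict)):
        return None
    # Pass the input through unchanged rather than re-encoding it.
    return text


# Re-encoding still rewrites numbers, even in current Python:
print(json.dumps(json.loads('[1e2, 0.10]')))  # prints [100.0, 0.1]
```

Returning the input string instead of the re-encoded value means a valid block is published exactly as its source wrote it.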

Thus I found myself writing a JSON validator from scratch. Efficiency and portability were key considerations.
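
For illustration only, the merge step that motivated all this might look like the sketch below. This is not the from-scratch validator; it leans on Python 3's built-in json module for the validity check, and it assumes (my guess, not something stated above) that the merged block is a JSON array holding each source's block. merge_blocks() is a hypothetical name.

```python
import json


def merge_blocks(blocks):
    """Combine JSON texts from several sources into one JSON array text.

    Assumptions: Python 3's built-in json module does the checking, and
    the merged format is a plain array of the source blocks. Invalid
    blocks are skipped so one bad source cannot break the whole output.
    """
    good = []
    for text in blocks:
        try:
            json.loads(text)  # parse only to check validity
        except ValueError:
            continue  # drop the offending source, keep the rest
        # Keep the source's own text rather than a re-encoded copy.
        good.append(text.strip())
    return "[" + ",".join(good) + "]"


merged = merge_blocks(['{"a": 1}', '[1, 2', '[3]'])
print(merged)  # prints [{"a": 1},[3]]
```

The broken middle block is dropped and the result still parses as a single JSON text.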


Obviously I had no choice but to take this photograph.
