Last night I had the privilege of giving a short talk at the Oxford Geek Night about the Open Data Definition.

It's been a while since I have done anything remotely like public speaking, so I was rather glad that the event had plenty of “Free as in Beer” beer.

For those of you who are interested, I have uploaded a PDF version of my slides here.

ODD has generated a fair amount of comment. One commenter suggested that we were re-inventing the wheel somewhat.

Maybe that is partially true – there are other data portability formats available, RDF for example or SIOC (apparently pronounced “Shock”, although I’m not entirely sure how). The point we are making with ODD is that powerful as many of these formats are, they are just too complicated and in many cases ambiguous, and for those reasons are not going to see widespread adoption.

Tracked vehicles are very powerful and versatile, but sometimes you just need a bike.

RSS is a good example of what I’m talking about. RSS is nice and simple, and as a result has seen widespread industry adoption. Crucially too there are many applications that consume RSS as well as just produce it, which is something not many other formats can boast.

Our view is that while many of these formats are academically brilliant and conceptually very clever, they are just too complicated.

Data portability is a bit of a hot topic at the moment, and a recent article in the Economist illustrates that this is becoming seen as an issue outside the technical blogging crowd.

So it seems like a good time for me to blog about one of the funky things I’ve been working on recently, the Open Data Definition (ODD).

So, what is ODD?

ODD is an XML-based data exchange format designed to be simple to implement and use. It consists of a framework and an extension format defining keywords.

The development of ODD fell out of a need for import/export functionality in the new version of Elgg. Import/export was one of the most requested features for previous Elgg incarnations, but we quickly realised that it could be converted into something that had many more uses.

As covered in the Economist article, data silos don’t cut it anymore. Users want to be able to move their accounts between social networks and have friends on different networks without having thousands of accounts (an issue we looked at solving in a slightly different way with explode).

When looking for a solution we did look at adapting one or more of the existing data portability solutions, but to say that none of them seemed suitable was somewhat of an understatement. Many seemed to fall foul of a common problem: they are just too damn complicated for widespread adoption!

ODD, as mentioned previously, is XML-based. When making it I used Elgg 1’s object model as a guide and reduced things down to their lowest common denominators, so we have three main components – Entities, Metadata and Relationships.

These components are atomic, and the format itself has virtually no nesting. This is slightly unconventional, but it makes the format easy to parse, supports partial import/export and makes it easy to extend the format to support the live pinging of updates.

This gives us:

Entity

Entities are “things”, for example a web log post or a user account. The entity has a “class” attribute to specify what type of entity it is and can be subclassed.

All entities are identified by a UUID; this is important and I’ll get on to that later.

Metadata

Metadata provides information about an entity as a name/value pair. Optionally, you can give a type to specify the type of metadata – e.g. attribute or annotation.

Relationship

As the name suggests, a relationship defines the relationship between two entities. To do this it uses a “verb” (as defined in the extension format mentioned above). Doing it this way permits setting and un-setting operations – for example, friend & unfriend, join & leave.
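To make the set/un-set idea concrete, here is a minimal Python sketch of how a consumer might fold a stream of relationship verbs into the set of relationships that are currently active. It is only an illustration under assumptions: the tuple layout, the `UNSET_VERBS` table and the `bar.com` and `/export/90` URLs are all hypothetical and are not taken from the ODD extension format itself.

```python
# Hypothetical verb table: each "un-set" verb names the "set" verb it cancels.
# The verb pairs come from the examples in the text; the table itself is an assumption.
UNSET_VERBS = {"unfriend": "friend", "leave": "join"}


def apply_relationships(events):
    """Fold (subject_uuid, verb, object_uuid) tuples into the set of active relationships."""
    active = set()
    for subject, verb, obj in events:
        if verb in UNSET_VERBS:
            # An un-set verb removes the matching relationship, if it exists.
            active.discard((subject, UNSET_VERBS[verb], obj))
        else:
            active.add((subject, verb, obj))
    return active


# Made-up events: user 24 friends someone, joins a group, then unfriends again.
events = [
    ("http://foo.com/export/24", "friend", "http://bar.com/export/7"),
    ("http://foo.com/export/24", "join", "http://foo.com/export/90"),
    ("http://foo.com/export/24", "unfriend", "http://bar.com/export/7"),
]

active = apply_relationships(events)
print(active)
```

After the three events only the “join” relationship remains, because the “unfriend” cancels the earlier “friend”.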

The UUID

An important concept in all this is the UUID.

The UUID is a URL which must point to an ODD representation of the thing it represents. I think this is quite a powerful concept since it permits truly distributed networks to be built.

An example

To give you an idea of how this might look, here’s an example ODD document.

<odd>
<header version="1.0" extension="SN:1.0" generated="...." />

<entity uuid="http://foo.com/export/34/" class="object" subclass="blog" />

<metadata entity_uuid="http://foo.com/export/34/" uuid="http://foo.com/export/34/metadata/45" name="owner">http://foo.com/export/24</metadata>

<metadata entity_uuid="http://foo.com/export/34/" uuid="http://foo.com/export/34/metadata/49" name="content">I saw Sindy at the mall today, she thinks she's all that, but she's not all that... I'm going to cry now and listen to Emo music.</metadata>

<metadata entity_uuid="http://foo.com/export/34/" uuid="http://foo.com/export/34/metadata/4" name="tag">angst</metadata>

<metadata entity_uuid="http://foo.com/export/34/" uuid="http://foo.com/export/34/metadata/5" name="tag">emo</metadata>
</odd>

Pretty simple, I hope you’ll agree!

The only extra thing to note is the header element, which simply gives some version information about the framework and extension being used.
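Because the elements are flat and atomic, reading a document like the one above takes very little code. The following is a rough Python sketch using only the standard library; `parse_odd` is a hypothetical helper name, the embedded document is a shortened copy of the example, and nothing here is an official ODD parser.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Shortened copy of the example document (the long "content" metadata is left out).
ODD_DOC = """<odd>
<header version="1.0" extension="SN:1.0" generated="...." />
<entity uuid="http://foo.com/export/34/" class="object" subclass="blog" />
<metadata entity_uuid="http://foo.com/export/34/" uuid="http://foo.com/export/34/metadata/45" name="owner">http://foo.com/export/24</metadata>
<metadata entity_uuid="http://foo.com/export/34/" uuid="http://foo.com/export/34/metadata/4" name="tag">angst</metadata>
<metadata entity_uuid="http://foo.com/export/34/" uuid="http://foo.com/export/34/metadata/5" name="tag">emo</metadata>
</odd>"""


def parse_odd(xml_text):
    """Return a dict of entities keyed by UUID, each with its metadata grouped by name."""
    root = ET.fromstring(xml_text)
    entities = {}
    for entity in root.findall("entity"):
        entities[entity.get("uuid")] = {
            "class": entity.get("class"),
            "subclass": entity.get("subclass"),
            "metadata": defaultdict(list),
        }
    for meta in root.findall("metadata"):
        owner = entities.get(meta.get("entity_uuid"))
        if owner is None:
            # Partial export: the entity is not in this document, so skip it here.
            continue
        # A name can repeat (two "tag" elements above), so values are kept in a list.
        owner["metadata"][meta.get("name")].append(meta.text)
    return entities


entities = parse_odd(ODD_DOC)
post = entities["http://foo.com/export/34/"]
print(post["subclass"], post["metadata"]["tag"])  # prints: blog ['angst', 'emo']
```

Since no element is nested inside another, each one can be handled on its own, which is what makes partial import straightforward in this sketch.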

I will be giving a brief presentation about this at the Oxford Geek Night on the 22nd and will be answering questions after the event (and again in San Francisco on May 7th). Feel free to come along!

In the meantime, have a look at http://www.opendd.net