data portablity | Marcus Povey

Data portability is a bit of a hot topic at the moment, and a recent article in the Economist illustrates that it is increasingly seen as an issue outside the technical blogging crowd.

So it seems like a good time for me to blog about one of the funky things I’ve been working on recently, the Open Data Definition (ODD).

So, what is ODD?

ODD is an XML based data exchange format which is designed to be simple to implement and use. It consists of a framework and an extension format defining keywords.

The development of ODD fell out of a need for import/export functionality in the new version of Elgg. Import/export was one of the most requested features for previous Elgg incarnations, but we quickly realised that it could be turned into something with many more uses.

As covered in the Economist article, data silos don’t cut it anymore. Users want to be able to move their accounts between social networks and have friends on different networks without having thousands of accounts (an issue we looked at solving in a slightly different way with explode).

When looking for a solution we did look at adapting one or more of the existing data portability solutions, but to say that none of them seemed suitable would be something of an understatement. Many seemed to fall foul of a common problem: they are just too damn complicated for widespread adoption!

ODD, as mentioned previously, is XML-based. When making it I used Elgg 1’s object model as a guide and reduced things down to their lowest common denominators, so we have three main components: Entities, Metadata and Relationships.

These components are atomic, and the format itself has virtually no nesting. This is slightly unconventional, but it makes the format easy to parse, supports partial import/export and makes it easy to extend the format to support the live pinging of updates.

This gives us:

Entity

Entities are “things”, for example a web log post or a user account. The entity has a “class” attribute to specify what type of entity it is and can be subclassed.

All entities are identified by a UUID. This is important, and I’ll get on to it later.

Metadata

Metadata provides information about an entity as a name/value pair. Optionally, you can give a type to specify the type of metadata – e.g. attribute or annotation.

Relationship

As the name suggests, a relationship defines the relationship between two entities. To do this it uses a “verb” (as defined in the extension format mentioned above). Doing it this way permits setting and un-setting operations, for example friend & unfriend, or join & leave.
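To make the three components concrete, here is a minimal Python sketch of how they might be modelled in memory. It is illustrative only and not part of ODD itself: the Entity and Metadata fields mirror the attributes described in this post, but the Relationship field names, the INVERSE_VERBS table and the apply_relationships helper are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class Entity:
    uuid: str                       # identifies the entity
    entity_class: str               # the "class" attribute, e.g. "object"
    subclass: Optional[str] = None  # optional refinement, e.g. "blog"


@dataclass(frozen=True)
class Metadata:
    uuid: str
    entity_uuid: str                # the entity this name/value pair describes
    name: str
    value: str
    type: Optional[str] = None      # e.g. "attribute" or "annotation"


@dataclass(frozen=True)
class Relationship:
    # Assumed field names: the post only says a relationship links
    # two entities with a verb.
    subject_uuid: str
    verb: str
    object_uuid: str


# Hypothetical table pairing each "set" verb with its "un-set" verb.
INVERSE_VERBS = {"friend": "unfriend", "join": "leave"}


def apply_relationships(rels: Iterable[Relationship]) -> Set[Tuple[str, str, str]]:
    """Replay relationships in order and return the links still in force."""
    undo = {unset: set_ for set_, unset in INVERSE_VERBS.items()}
    active: Set[Tuple[str, str, str]] = set()
    for r in rels:
        if r.verb in undo:
            # An un-setting verb removes the matching earlier link, if any.
            active.discard((r.subject_uuid, undo[r.verb], r.object_uuid))
        else:
            active.add((r.subject_uuid, r.verb, r.object_uuid))
    return active
```

Because each record is atomic, a stream of relationships can be replayed one at a time like this, which is roughly what importing a partial export or a live update would involve.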

The UUID

An important concept in all this is the UUID.

The UUID is a URL which must point to an ODD representation of the thing it identifies. I think this is quite a powerful concept since it permits truly distributed networks to be built.
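As a rough sketch of what “the UUID is a URL” means in practice, the Python below fetches a UUID and checks that what comes back is an ODD document. The function names resolve and http_fetch are made up for this example, and the fetcher is passed in as a parameter so the idea can be tried without a live server.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def http_fetch(url: str) -> bytes:
    """Default fetcher: a plain HTTP GET of the UUID."""
    with urlopen(url) as response:
        return response.read()


def resolve(uuid: str, fetch=http_fetch) -> ET.Element:
    """Fetch the document a UUID points to and return its <odd> root element."""
    root = ET.fromstring(fetch(uuid))
    if root.tag != "odd":
        raise ValueError(f"{uuid} did not return an ODD document")
    return root
```

An importer could use something like this to follow a metadata value that is itself a UUID (such as an owner) to whichever site holds that entity.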

An example

To give you an idea of how this might look, here’s an example ODD document.

<odd>
<header version="1.0" extension="SN:1.0" generated="...." />

<entity uuid="http://foo.com/export/34/" class="object" subclass="blog" />

<metadata entity_uuid="http://foo.com/export/34/" uuid="http://foo.com/export/34/metadata/45" name="owner">http://foo.com/export/24</metadata>

<metadata entity_uuid="http://foo.com/export/34/" uuid="http://foo.com/export/34/metadata/49" name="content">I saw Sindy at the mall today, she thinks she's all that, but she's not all that... I'm going to cry now and listen to Emo music.</metadata>

<metadata entity_uuid="http://foo.com/export/34/" uuid="http://foo.com/export/34/metadata/4" name="tag">angst</metadata>

<metadata entity_uuid="http://foo.com/export/34/" uuid="http://foo.com/export/34/metadata/5" name="tag">emo</metadata>
</odd>

Pretty simple, I hope you’ll agree!

The only extra thing to note is the header element, which simply gives some version information about the framework and extension being used.
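To show how little code a flat format like this needs, here is a minimal Python sketch that reads a trimmed copy of the example document above using only the standard library. The load function and the dictionary layout it returns are my own choices for illustration, not part of ODD.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed copy of the example document (the "content" metadata is left out).
ODD_DOC = """<odd>
<header version="1.0" extension="SN:1.0" generated="...." />
<entity uuid="http://foo.com/export/34/" class="object" subclass="blog" />
<metadata entity_uuid="http://foo.com/export/34/" uuid="http://foo.com/export/34/metadata/45" name="owner">http://foo.com/export/24</metadata>
<metadata entity_uuid="http://foo.com/export/34/" uuid="http://foo.com/export/34/metadata/4" name="tag">angst</metadata>
<metadata entity_uuid="http://foo.com/export/34/" uuid="http://foo.com/export/34/metadata/5" name="tag">emo</metadata>
</odd>"""


def load(xml_text):
    """Return (header attributes, entities keyed by UUID) for an ODD document."""
    root = ET.fromstring(xml_text)
    header = root.find("header")
    header_info = dict(header.attrib) if header is not None else {}

    # First pass: collect the entities.
    entities = {}
    for el in root.findall("entity"):
        entities[el.get("uuid")] = {
            "class": el.get("class"),
            "subclass": el.get("subclass"),
            "metadata": defaultdict(list),  # a name may repeat, e.g. "tag"
        }

    # Second pass: attach each metadata item to the entity it names.
    for el in root.findall("metadata"):
        target = entities.get(el.get("entity_uuid"))
        if target is None:
            continue  # entity not in this document; skipped in this sketch
        target["metadata"][el.get("name")].append(el.text or "")

    return header_info, entities


header, entities = load(ODD_DOC)
blog = entities["http://foo.com/export/34/"]
print(header["extension"], blog["subclass"], blog["metadata"]["tag"])
```

Since nothing is nested, the two loops are independent of element order, and a document containing only some of the records can still be read.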

I will be giving a brief presentation about this at the Oxford Geek Night on the 22nd and will be answering questions after the event (and again in San Francisco on May 7th). Feel free to come along!

In the meantime, have a look at http://www.opendd.net

Marcus Povey

Introducing the Open Data Definition