    ODD and Import / Export

    April 30th, 2008 by Marcus Povey

    I have just uploaded the latest draft of the ODD specification to OpenDD.net, so pop over and take a look.

    Since the last release of the draft I’ve done a fair amount of work to simplify the format even further; simplifying terminology, clearing up some inconsistencies and dropping namespaces altogether.

    You’ll notice that we still don’t define any terms. As Ben touched on in a recent post, we decided not to confuse the format by trying to tie it to any one application, while keeping it as easy as possible to actually use. I’ll cover this in more detail a bit later…

    So, let’s talk about how I’m using ODD to implement full data import and export in the upcoming release of Elgg 1.

    For those who don’t already know, Elgg is an open source social networking application engine. The previous version has been downloaded over 100K times, and import and export was one of the most frequently requested enhancements.

    Export

    Export was a fairly trivial matter. The new version of Elgg employs a flexible event system, so all I had to do was trigger an “export” event.

    This event is passed a GUID – an identifier for the thing you are exporting – and elements of the system (and third-party plugins) can listen for this event and react accordingly.

    The event is essentially asking all parts of the Elgg application – core and plugins – “Tell me all you know about X”. The exporter listens to the answers and converts them into an ODD document that looks something like this:

    <odd version="1.0" generated="Wed, 30 Apr 2008 22:21:55 +0100">


    <entity uuid="http://example.com/odd/78/" class="object" subclass="blog" published="Fri, 18 Apr 2008 11:45:50 +0100" />


    <metadata uuid="http://example.com/odd/78/attr/owner_uuid/" entity_uuid="http://example.com/odd/78/" name="owner_uuid" published="Fri, 18 Apr 2008 11:45:50 +0100" >http://example.com/odd/77/</metadata>


    <metadata uuid="http://example.com/odd/78/attr/title/" entity_uuid="http://example.com/odd/78/" name="title" published="Fri, 18 Apr 2008 11:45:50 +0100" >test</metadata>


    <metadata uuid="http://example.com/odd/78/attr/description/" entity_uuid="http://example.com/odd/78/" name="post" published="Fri, 18 Apr 2008 11:45:50 +0100" >First post</metadata>


    <metadata uuid="http://example.com/odd/78/metadata/35/" entity_uuid="http://example.com/odd/78/" name="tags" type="metadata" owner_uuid="http://example.com/odd/77/" published="Fri, 18 Apr 2008 11:45:50 +0100" >wibble</metadata>


    </odd>

    Here we see an entity (in this case a blog post), and some details about it (the metadata).
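
    The export flow described above can be sketched roughly as follows. This is a minimal Python illustration, not Elgg's actual code: the registry, the `register_listener`, `trigger` and `export_odd` functions, and the two example listeners are all hypothetical names invented for this sketch. Each listener answers "what do you know about this GUID?" with a list of flat tags, and the exporter writes them out one after another.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical in-memory event registry, invented for this sketch.
_listeners = {}


def register_listener(event, callback):
    """Subscribe a callback to a named event."""
    _listeners.setdefault(event, []).append(callback)


def trigger(event, guid):
    """Ask every listener what it knows about `guid` and collect the answers."""
    answers = []
    for callback in _listeners.get(event, []):
        answers.extend(callback(guid))
    return answers


def export_odd(guid):
    """Serialise every answer for `guid` as one flat ODD document."""
    lines = ['<odd version="1.0">']
    for tag, attrs, body in trigger("export", guid):
        attr_text = "".join(
            f" {name}={quoteattr(value)}" for name, value in attrs.items()
        )
        if body is None:
            lines.append(f"<{tag}{attr_text} />")
        else:
            lines.append(f"<{tag}{attr_text}>{escape(body)}</{tag}>")
    lines.append("</odd>")
    return "\n".join(lines)


# Two example listeners: one standing in for the core, one for a plugin.
def core_listener(guid):
    uuid = f"http://example.com/odd/{guid}/"
    return [("entity", {"uuid": uuid, "class": "object", "subclass": "blog"}, None)]


def tags_plugin_listener(guid):
    uuid = f"http://example.com/odd/{guid}/"
    return [("metadata", {"entity_uuid": uuid, "name": "tags"}, "wibble")]


register_listener("export", core_listener)
register_listener("export", tags_plugin_listener)

print(export_odd(78))
```

    Because every tag is self-contained, a plugin can add its own answers without the exporter needing to know anything about them.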

    Import

    Import is traditionally the more complicated part of the equation, but ODD is trivial to parse. Each tag is atomic and represents exactly one thing. This is a big advantage for anyone implementing a reader, since it makes the whole thing pretty much stateless.

    ODD tags arrive, whether as a file to import or as a live feed, and an event is triggered. This event passes around the tag and essentially asks the question “Does anyone know how to handle this?”.

    The stateless nature of ODD of course means that you don’t have to process the entire file, which makes it a trivial matter to implement a reader using a SAX parser.
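
    As a rough idea of what such a reader might look like, here is a small sketch using Python's standard `xml.sax` module. The `OddHandler` class, the `HANDLERS` table and the two handler functions are assumptions made up for this example; the only things taken from ODD itself are the tag and attribute names shown in the export sample. Each tag is handed to a dispatcher as soon as it closes, which plays the role of the "Does anyone know how to handle this?" event.

```python
import xml.sax


class OddHandler(xml.sax.ContentHandler):
    """Hand each tag inside <odd> to a callback as soon as it closes."""

    def __init__(self, on_tag):
        super().__init__()
        self.on_tag = on_tag   # callback(name, attributes, text)
        self.current = None    # (name, attributes) of the open tag, if any
        self.text = []

    def startElement(self, name, attrs):
        if name != "odd":
            self.current = (name, dict(attrs.items()))
            self.text = []

    def characters(self, content):
        if self.current is not None:
            self.text.append(content)

    def endElement(self, name):
        if self.current is not None and name == self.current[0]:
            tag, attrs = self.current
            self.on_tag(tag, attrs, "".join(self.text))
            self.current = None


# Hypothetical handlers, keyed by tag name.
entities = {}


def handle_entity(attrs, text):
    entities[attrs["uuid"]] = {"class": attrs.get("class"), "metadata": {}}


def handle_metadata(attrs, text):
    # Simplification: metadata for an entity we have not seen is ignored.
    entity = entities.get(attrs["entity_uuid"])
    if entity is not None:
        entity["metadata"][attrs["name"]] = text


HANDLERS = {"entity": handle_entity, "metadata": handle_metadata}


def dispatch(name, attrs, text):
    """Pass the tag to whoever knows how to handle it; skip unknown tags."""
    handler = HANDLERS.get(name)
    if handler is not None:
        handler(attrs, text)


SAMPLE = b"""<odd version="1.0">
<entity uuid="http://example.com/odd/78/" class="object" subclass="blog" />
<metadata entity_uuid="http://example.com/odd/78/" name="title">test</metadata>
</odd>"""

xml.sax.parseString(SAMPLE, OddHandler(dispatch))
print(entities)
```

    The handler only ever holds one open tag at a time, so the same code works whether the tags come from a large file or arrive a few at a time.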

    That just about covers it. I’ll be posting some example code in a few days (workload permitting), so hopefully people can start getting stuck in. If you want to get involved in development, please head over to the ODD group.

    A final note: I will be in San Francisco all next week, so if you are in the Bay Area and feel like having a chat about ODD or Elgg, then please get in touch!

    ODD ZDNet article

    April 29th, 2008 by Marcus Povey

    Just a quick heads up, Ben has posted an ODD article over at ZDNet which is well worth a read.

    Ben discusses ODD and the other data portability formats in the area and explains where ODD fits in.

    Enjoy!

    ODD @ OGN

    April 23rd, 2008 by Marcus Povey

    Last night I had the privilege of giving a short talk at the Oxford Geek Night about the Open Data Definition.

    It’s been a while since I have done anything remotely like public speaking, so I was rather glad that the event had plenty of “Free as in Beer” beer.

    For those of you who are interested, I have uploaded a PDF version of my slides here.

    ODD has generated a fair amount of comment. One commenter suggested that we were re-inventing the wheel somewhat.

    Maybe that is partially true – there are other data portability formats available, RDF for example or SIOC (apparently pronounced “Shock”, although I’m not entirely sure how). The point we are making with ODD is that powerful as many of these formats are, they are just too complicated and in many cases ambiguous, and for those reasons are not going to see widespread adoption.

    Tracked vehicles are very powerful and versatile, but sometimes you just need a bike.

    RSS is a good example of what I’m talking about. RSS is nice and simple, and as a result has seen widespread industry adoption. Crucially too there are many applications that consume RSS as well as just produce it, which is something not many other formats can boast.

    Our view is that while many of these formats are academically brilliant and conceptually very clever, they are just too complicated.

    Open Data Definition Website is live!

    April 18th, 2008 by Marcus Povey

    Just a quick note to say that the official Open Data Definition website has gone live!

    Ben informed me of this just as I read this comment on an old interview he did on the subject of data portability.

    There’s your answer clappingtree, albeit a little late!

    Introducing the Open Data Definition

    April 16th, 2008 by Marcus Povey

    Data portability is a bit of a hot topic at the moment, and a recent article in the Economist illustrates that it is increasingly seen as an issue outside the technical blogging crowd.

    So it seems like a good time for me to blog about one of the funky things I’ve been working on recently, the Open Data Definition (ODD).

    So, what is ODD?

    ODD is an XML-based data exchange format which is designed to be simple to implement and use. It consists of a framework and an extension format defining keywords.

    The development of ODD fell out of a need for import/export functionality in the new version of Elgg. Import/export was one of the most requested features for previous Elgg incarnations, but we quickly realised that it could be converted into something that had many more uses.

    As covered in the Economist article, data silos don’t cut it anymore. Users want to be able to move their accounts between social networks and have friends on different networks without having thousands of accounts (an issue we looked at solving in a slightly different way with explode).

    When looking for a solution we did look at adapting one or more of the existing data portability solutions, but to say that none of them seemed suitable is something of an understatement. Many seemed to fall foul of a common problem… they are just too damn complicated for widespread adoption!

    ODD, as mentioned previously, is XML based. When making it I used Elgg 1’s object model as a guide and reduced things down to their lowest common denominators. As a result we have three main components – Entities, Metadata and Relationships.

    These components are atomic, and the format itself has virtually no nesting. This is slightly unconventional, but it makes the format easy to parse, supports partial import/export and makes it easy to extend the format to support the live pinging of updates.

    This gives us:

    Entity

    Entities are “things”, for example a web log post or a user account. The entity has a “class” attribute to specify what type of entity it is and can be subclassed.

    All entities are identified by a UUID. This is important, and I’ll get on to that later.

    Metadata

    Metadata provides information about an entity as a name/value pair. Optionally, you can give a type to specify the type of metadata – e.g. attribute or annotation.

    Relationship

    As the name suggests, a relationship defines the relationship between two entities. To do this it uses a “verb” (as defined in the extension format mentioned above). Doing it this way permits setting and un-setting operations – for example, friend & unfriend, join & leave.
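
    To make the three components a little more concrete, here is one way they could be modelled in Python. Treat this as a sketch under assumptions: the entity and metadata fields mirror the attributes used in the examples on this page, but the relationship field names (`subject_uuid`, `verb`, `object_uuid`), the `apply_relationships` helper and the table of inverse verbs are hypothetical and are not taken from the ODD draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    uuid: str
    entity_class: str               # the "class" attribute, e.g. "object"
    subclass: Optional[str] = None  # e.g. "blog"


@dataclass(frozen=True)
class Metadata:
    uuid: str
    entity_uuid: str
    name: str
    value: str
    type: Optional[str] = None      # e.g. "attribute" or "annotation"


@dataclass(frozen=True)
class Relationship:
    # Field names are assumptions made for this sketch.
    subject_uuid: str
    verb: str
    object_uuid: str


def apply_relationships(relationships, inverse_verbs):
    """Replay verbs in order; an inverse verb removes the link its partner set."""
    active = set()
    for rel in relationships:
        if rel.verb in inverse_verbs:
            setting_verb = inverse_verbs[rel.verb]
            active.discard((rel.subject_uuid, setting_verb, rel.object_uuid))
        else:
            active.add((rel.subject_uuid, rel.verb, rel.object_uuid))
    return active


alice = "http://example.com/odd/77/"
bob = "http://example.com/odd/79/"
group = "http://example.com/odd/80/"

history = [
    Relationship(alice, "friend", bob),
    Relationship(alice, "join", group),
    Relationship(alice, "unfriend", bob),
]

# Only the group membership is still in force after replaying the history.
print(apply_relationships(history, {"unfriend": "friend", "leave": "join"}))
```

    Replaying verbs like this is one way to read "setting and un-setting": the friend link is set and later removed, while the join is never undone.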

    The UUID

    An important concept in all this is the UUID.

    The UUID is a URL which must point to an ODD representation of the thing it represents. I think this is quite a powerful concept since it permits truly distributed networks to be built.

    An example

    To give you an idea of how this might look, here’s an example ODD document.

    <odd>
    <header version="1.0" extension="SN:1.0" generated="...." />

    <entity uuid="http://foo.com/export/34/" class="object" subclass="blog" />

    <metadata entity_uuid="http://foo.com/export/34/" uuid="http://foo.com/export/34/metadata/45" name="owner">http://foo.com/export/24</metadata>

    <metadata entity_uuid="http://foo.com/export/34/" uuid="http://foo.com/export/34/metadata/49" name="content">I saw Sindy at the mall today, she thinks she's all that, but she's not all that... I'm going to cry now and listen to Emo music.</metadata>

    <metadata entity_uuid="http://foo.com/export/34/" uuid="http://foo.com/export/34/metadata/4" name="tag">angst</metadata>

    <metadata entity_uuid="http://foo.com/export/34/" uuid="http://foo.com/export/34/metadata/5" name="tag">emo</metadata>
    </odd>

    Pretty simple, I hope you’ll agree!

    The only extra thing to note is the header element, which simply gives some version information about the framework and extension being used.
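
    For anyone who wants to poke at a document like this, the sketch below reads a shortened copy of the example (typed with plain ASCII quotes, the elided `generated` value left as it is) using Python's standard `xml.etree.ElementTree`. The `group_metadata` function is a hypothetical helper written for this illustration. It collects the metadata for each entity and keeps repeated names, such as the two `tag` items, as lists.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

ODD_DOC = """<odd>
<header version="1.0" extension="SN:1.0" generated="...." />
<entity uuid="http://foo.com/export/34/" class="object" subclass="blog" />
<metadata entity_uuid="http://foo.com/export/34/" uuid="http://foo.com/export/34/metadata/45" name="owner">http://foo.com/export/24</metadata>
<metadata entity_uuid="http://foo.com/export/34/" uuid="http://foo.com/export/34/metadata/4" name="tag">angst</metadata>
<metadata entity_uuid="http://foo.com/export/34/" uuid="http://foo.com/export/34/metadata/5" name="tag">emo</metadata>
</odd>"""


def group_metadata(document):
    """Map entity UUID -> metadata name -> list of values, in document order."""
    grouped = defaultdict(lambda: defaultdict(list))
    for element in ET.fromstring(document).iter("metadata"):
        entity_uuid = element.get("entity_uuid")
        grouped[entity_uuid][element.get("name")].append(element.text)
    return {uuid: dict(names) for uuid, names in grouped.items()}


post = group_metadata(ODD_DOC)["http://foo.com/export/34/"]
print(post["owner"])
print(post["tag"])
```

    Since nothing is nested, the grouping is driven entirely by the `entity_uuid` attribute rather than by where a tag sits in the file.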

    I will be giving a brief presentation about this at the Oxford Geek Night on the 22nd and will be answering questions after the event (and again in San Francisco on May 7th). Feel free to come along!

    In the meantime, have a look at http://www.opendd.net

    All content is © Copyright Marcus Povey 2008-2009 unless otherwise stated.