This article got me pondering how one might start building a distributed “related article” network without relying on a centralised silo.

Suggesting related articles on the same site is largely a solved problem, but at the moment doing the same thing across multiple sites requires a centralised service. Centralisation is bad, as we’ve discussed before, so how could you build a federated network of sites, all referring readers to one another in an automated but meaningful way?

My current thinking is to leverage PuSH: Alice lists the sites from which she’d like to receive related articles – these could be individual sites or even a centralised aggregator. Alice’s site then subscribes to the relevant PuSH hubs and starts receiving updates; when an update arrives it is passed through to whatever comparison algorithm you’re using – I’m thinking of adapting the WordPress one for this blog.
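As a very rough sketch of the receiving end (get_local_posts(), related_score() and save_related_suggestion() are hypothetical placeholders for whatever storage and comparison algorithm you end up using, not part of any existing plugin):

    <?php
    // Hypothetical sketch: a PuSH callback that takes pushed content and scores
    // it against local posts. All of the named helper functions are placeholders.

    $payload = file_get_contents('php://input');   // content pushed by the hub

    foreach (get_local_posts() as $post) {
        $score = related_score($post['content'], $payload);  // e.g. term overlap
        if ($score > 0.5) {                                   // arbitrary threshold
            save_related_suggestion($post['id'], $payload);
        }
    }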

Should be fairly straightforward to implement, and would provide a simple way to federate content within a group of individuals.

Anyone working on something like this, or shall I drop this into my todo list?

This post, requested by Ben Werdmuller, pulls together a number of earlier posts in order to better document the federated, cross-platform friend/follow and signon mechanisms I’ve been hacking on recently. It’ll summarise those posts together with my latest thoughts, although I do encourage you to read the originals as well, since there’s a fair amount of detail there.

Federated/distributed social networking is something I (and many other people) have been kicking around for a little while. When working on Elgg, I was involved in a bunch of conversations where we explored getting the various Elgg sites to talk to each other, but it never really got anywhere at the time.

Times move on, and now I think we have a chance to really get somewhere; kicking around the Known code has given me a nice experimental platform to play with, and there are now some distributed social tools and protocols seeing wide adoption (PuSH, MF2, etc.), which is going to be very helpful.

Post-Snowden, of course, it is now clear that target dispersal, combined with widespread encryption, is required to keep our private lives safe from being spied on. Getting our everyday social interactions out of a centralised data-mining facility is now a basic requirement for safeguarding our essential liberties.

Initial requirements

Going into this, then, I wanted to start building the parts of a distributed social network, and to set some loose guidelines for what I’d like to see.

  • Distributed: There should be no central server anywhere in the ecosystem. Ideally transactions should occur peer to peer between nodes, rather than be orchestrated by a central body.
  • Cross platform: I don’t want to mandate the use of one specific platform. You can’t call yourself a distributed/federated social network if you can only federate between nodes running the same software! That’s a monoculture, and we know those are bad.
  • Simple, open protocols: I don’t want to spend days building this, and if necessary I want to be able to test using the command line and CURL.
  • URLs, from a UX standpoint, are a bad way to identify people (lessons learnt from OpenID). I may need to reference user profiles by URL, but every time you force someone to type one in, God kills a kitten.

Friending and profile discovery

Original posts here and here.

The first step towards building a social network of any kind is to have the ability to add your friends to your network, and a distributed network is no different.

Here, and in my reference implementations on Known and Elgg, I am adopting the uni-directional “follow” idea of friendship (like a Twitter follow) rather than the bi-directional, transactional Facebook model, since this was the minimum I needed to make this work and, in my mind at least, it better fits how “friending” works in the real world.

So then, friending works by having an endpoint on your site which is passed the URL you want to add as a friend. To make this easy, and to avoid typing URLs, both my reference Known and Elgg implementations contain a bookmarklet which you can add to your browser button bar.

Alice visits Bob’s website or profile and clicks on the button. Alice’s site then retrieves Bob’s site and parses it for whatever user details can be found on the page – looking for name, profile picture and the URL of their profile. This is made possible through the use of Microformats, especially MF2.

Microformats are simple bits of markup that are invisible to someone who just looks at a webpage, but which allow a computer to understand the meaning of things on the page – for example, that a certain bit of text is a person’s name, that one link points to their profile picture and another to their profile URL. Additionally, since this is just text on a page, there is no requirement for that page to be “special” in any way: it could just be a static page, with no special headers and no script generating it.

Here is an example of how a user might be marked up (a representative h-card; the class names are what matter, the values here are placeholders):
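    <div class="h-card">
        <img class="u-photo" src="https://example.com/bob/photo.jpg" alt="" />
        <a class="p-name u-url" href="https://example.com/bob">Bob Smith</a>
        <a class="u-email" href="mailto:bob@example.com">bob@example.com</a>
    </div>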


This markup can then be easily processed using one of the many libraries out there; if you’re using PHP I highly recommend Barnaby Walters’ php-mf2 library. In the above example I create a block which I declare identifies a person (an h-card), then give their photo, full name, email address and a URL relating to them. This is probably enough information to be getting on with, but you can of course extract more profile information if the markup is there.
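A minimal sketch of the parsing step, assuming the mf2/mf2 Composer package (the profile URL is a placeholder):

    <?php
    // Minimal sketch: parse Bob's profile page and pull out h-card details.
    // Assumes the mf2/mf2 package has been installed via Composer.
    require 'vendor/autoload.php';

    $profile = 'https://example.com/bob';        // placeholder URL
    $html    = file_get_contents($profile);
    $mf      = Mf2\parse($html, $profile);       // second argument resolves relative URLs

    foreach ($mf['items'] as $item) {
        if (!in_array('h-card', $item['type'])) {
            continue;
        }
        $p     = $item['properties'];
        $name  = isset($p['name'][0])  ? $p['name'][0]  : null;
        $photo = isset($p['photo'][0]) ? $p['photo'][0] : null;
        $url   = isset($p['url'][0])   ? $p['url'][0]   : null;
        $email = isset($p['email'][0]) ? $p['email'][0] : null;  // usually a mailto: URL
        // ...add this person to the list of friend candidates
    }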

Since a given page may contain multiple marked-up people (especially if Alice clicks the “add friend” button while on a news feed), my reference implementations present a list of users who may be added, after first removing any duplicates (based on the URL of their profile), and you are also given the opportunity to fill in or amend any scraped information. If more than one URL is given for an entry, you need some way to reconcile this – I just render a dropdown giving Alice the choice of which is Bob’s primary profile, but I’m sure there’s a cleverer way.
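A sketch of that deduplication step, keyed on a crudely normalised profile URL (normalise_url() here is purely illustrative):

    <?php
    // Sketch: keep one h-card per person, keyed on a normalised profile URL.
    function normalise_url($url) {
        $url = strtolower(trim($url));
        $url = preg_replace('#^https?://#', '', $url);  // treat http/https as equivalent
        $url = preg_replace('#^www\.#', '', $url);      // ignore a leading www.
        return rtrim($url, '/');                        // and trailing slashes
    }

    $unique = array();
    foreach ($candidates as $card) {     // $candidates: h-cards scraped as above
        $url = isset($card['url']) ? $card['url'] : '';
        $key = normalise_url($url);
        if ($key !== '' && !isset($unique[$key])) {
            $unique[$key] = $card;
        }
    }
    // $unique now holds one candidate per person for Alice to confirm or amend.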

Once Alice is happy, she can add Bob as a friend, and her site can do any post-friending work – subscribing to Bob’s PuSH endpoint (if one is specified), or generating access credentials for Bob.

So, in summary, distributed friending works like this:

  1. Alice sends Bob’s page URL to her magic friending endpoint (using a browser bookmarklet)
  2. Alice’s site retrieves that URL and examines it for MF2-marked-up h-card entries
  3. Alice is presented with a unique list of h-card entries (where uniqueness is defined on normalised profile URLs)
  4. Alice adds Bob as a friend and triggers any post-friending events

Listening to Bob

After Alice adds Bob as a friend, she wants to be told when Bob updates his site. In Known this is accomplished by performing a PubSubHubbub discovery and subscription when the “friend” event is triggered (step 4 above).

I won’t go into too much detail about how a PuSH subscription handshake works, since there is more complete implementation information in the spec, but in summary, when Alice successfully adds Bob as a friend, her site does the following (a rough subscription sketch follows the list):

  1. Alice’s site looks for a feed URL on Bob’s site.
  2. Her site retrieves that URL and looks in it for a “self” link (the canonical permalink for the feed of Bob’s updates).
  3. Her site then fetches this canonical URL and looks for any declared PuSH hubs to subscribe to.
  4. If one is found, her site places a marker in memory noting that she is subscribing to this hub, then makes a subscription request.
  5. Bob’s hub will, at some point later, ping Alice’s PuSH endpoint with a success or failure message.
  6. Alice’s PuSH endpoint matches this request against the list of requests she has made, and if the security tokens match up she can consider herself subscribed.
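A rough sketch of the subscription request in step 4 (parameter names per the PubSubHubbub 0.3/0.4 specs; remember_pending_subscription() is a placeholder for the “marker in memory”):

    <?php
    // Rough sketch: POST a subscription request to Bob's declared hub.
    // $hub, $topic (the canonical "self" feed URL) and $callback (Alice's PuSH
    // endpoint) come from the discovery steps above.

    $token = bin2hex(openssl_random_pseudo_bytes(16));
    remember_pending_subscription($topic, $token);   // placeholder: the pending marker

    $ch = curl_init($hub);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array(
        'hub.mode'         => 'subscribe',
        'hub.topic'        => $topic,
        'hub.callback'     => $callback,
        'hub.verify'       => 'async',   // the hub verifies by pinging the callback later
        'hub.verify_token' => $token,
    )));
    curl_exec($ch);
    curl_close($ch);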

Once subscribed, Alice’s endpoint will be pinged by Bob’s hub every time he makes an update. Alice’s site can then decide what to do with that information – perhaps use it to maintain a news feed, or send out an email update, whatever fits.

Friend only/private posts & friend signon

Original posts here and here, here and finally here.

So far, all we’ve really done is create a fancy RSS reader. The next step in creating a truly distributed social network is to have the ability to create posts which only your friends (or a selected subset of your friends) can see, but that the wider internet can not.

On centralised social networks this is trivial, since all users are local and can be identified in one of the many time-honoured and straightforward ways, and once identified, content they’re not permitted to access can easily be hidden. On a distributed social network, this becomes much more difficult.

Fundamentally, it’s a problem of credential exchange.

There are many techniques you could deploy to solve this problem, and most of them are not mutually exclusive. One approach might be for Alice’s site to generate a random password and email it to Bob (since we likely have Bob’s email address from his h-card). Personally, I don’t think this is terribly clean.

So, I humbly put forward my thoughts on using OpenPGP keys as an identity mechanism.

OpenPGP signin

My spec for this can be found in these two posts, but in short it works as follows:

  • Bob generates or adds a PGP key pair to his profile, and publicises his public key in one or more of the following ways *(discussion: Bob’s site needs access to a private key in order to generate signatures, therefore this key material should be kept secure. It may be that it’s best to generate a new keypair for exclusive use by Bob’s site, but I do kind of like tying together Bob’s profile and Bob’s email and identifying both cryptographically with the same key)*
    1. Via an HTTP Link header with a rel of “key”, e.g. Link: <https://example.com/bob/pubkey.asc>; rel="key"
    2. Via a <link> tag in the HTML head with a rel of “key”, e.g. <link href="https://example.com/bob/pubkey.asc" rel="key" />
    3. Via an anchor tag within the page body with rel="key", e.g. <a href="https://example.com/bob/pubkey.asc" rel="key">My Key</a>
    4. By pasting the key into the body of the page, and giving it a class of “key”, e.g.

      <pre class="key">
      -----BEGIN PGP PUBLIC KEY BLOCK-----
      ....
      -----END PGP PUBLIC KEY BLOCK-----
      </pre>

  • When Alice successfully adds Bob as a friend, her site attempts to extract the public key from his page (a discovery sketch follows below). If found, her site saves the public key against Bob’s newly created user.
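A discovery sketch covering those options, using curl and DOMDocument (the rel matching is deliberately naive; a real implementation should handle multi-valued rel attributes and relative URLs):

    <?php
    // Sketch: try the Link header first, then rel="key" elements in the page,
    // then an inline <pre class="key"> block. Returns the armoured key or null.
    function discover_public_key($profile_url) {
        $ch = curl_init($profile_url);
        curl_setopt_array($ch, array(
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_HEADER         => true,
            CURLOPT_FOLLOWLOCATION => true,
        ));
        $response    = curl_exec($ch);
        $header_size = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
        curl_close($ch);

        $headers = substr($response, 0, $header_size);
        $html    = substr($response, $header_size);

        // 1. HTTP Link header: Link: <https://...>; rel="key"
        if (preg_match('/^Link:\s*<([^>]+)>\s*;\s*rel="key"/mi', $headers, $m)) {
            return file_get_contents($m[1]);
        }

        $doc = new DOMDocument();
        @$doc->loadHTML($html);

        // 2 & 3. <link>, <meta> or <a> elements carrying rel="key" and an href
        foreach (array('link', 'meta', 'a') as $tag) {
            foreach ($doc->getElementsByTagName($tag) as $el) {
                if ($el->getAttribute('rel') === 'key' && $el->getAttribute('href')) {
                    return file_get_contents($el->getAttribute('href'));
                }
            }
        }

        // 4. Inline <pre class="key"> ... </pre>
        foreach ($doc->getElementsByTagName('pre') as $el) {
            if ($el->getAttribute('class') === 'key') {
                return trim($el->textContent);
            }
        }

        return null;
    }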

Now, some time later Alice creates a post, and she only wants Bob to be able to see it, so she…

  • Creates a new post, and adds Bob’s user to the ACL.
  • If Bob has also added Alice as a friend, Bob’s site is notified by PuSH that Alice’s site has been updated. (Because it’s a private post we don’t push the content itself, although conceivably we could encrypt the content with the public keys of Bob and whoever else has access – that’s a little out of scope at the moment.)
  • Bob visits Alice’s site and identifies himself by clicking on a bookmarklet. This bookmarklet passes the URL of the page on Alice’s site back to Bob’s site, which produces a signed request and sends it back.
  • Alice’s site verifies that the signature is valid, and that it was signed by the key belonging to Bob.

The signature sent by Bob’s site is formed over a message containing the following (a signing sketch follows the list):

  1. The current date and time in ISO8601 format, as produced by date('c', time()); in PHP
  2. Bob’s profile URL
  3. The URL of the resource on Alice’s site that Bob is requesting.
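A minimal signing sketch for Bob’s side, assuming the PECL gnupg extension and assuming, for illustration, that the three items are simply concatenated one per line (see the original spec posts for the exact wire format; the fingerprint and URLs are placeholders):

    <?php
    // Minimal sketch: build and clearsign the message on Bob's side.
    // Assumes the PECL gnupg extension; fingerprint and URLs are placeholders.

    $message = implode("\n", array(
        date('c', time()),            // 1. current date/time, ISO8601
        'https://example.com/bob',    // 2. Bob's profile URL
        $requested_url,               // 3. the resource on Alice's site Bob wants
    ));

    $gpg = new gnupg();
    $gpg->addsignkey('BOBS_KEY_FINGERPRINT');   // placeholder fingerprint
    $gpg->setsignmode(GNUPG_SIG_MODE_CLEAR);    // clearsign, so Alice can read the payload
    $signed = $gpg->sign($message);

    // $signed is what Bob's site sends back to Alice's site.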

Alice’s site should, on receiving this (a verification sketch follows the list):

  1. Verify the signature is valid and the contents unmodified.
  2. Verify that it was signed by Bob’s key.
  3. Verify that the resource being requested is on Alice’s site.
  4. Check the timestamp is valid and within an acceptable range from now.
  5. Store the timestamp + profile URL + resource URL together and use it as a nonce to guard against replay attacks.
  6. Check that we’ve not seen this request before by querying the nonce generated in step 5 against the nonce store.
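A matching verification sketch for Alice’s side, again assuming the PECL gnupg extension and the one-item-per-line format from the signing sketch (deny(), fingerprint_for(), nonce_seen(), store_nonce() and $my_site_url are placeholders for your own storage and error handling):

    <?php
    // Verification sketch on Alice's side, matching the signing sketch above.
    // deny(), fingerprint_for(), nonce_seen() and store_nonce() are placeholders.

    $gpg  = new gnupg();
    $info = $gpg->verify($signed, false, $plain);     // clearsigned, so signature = false
    if ($info === false) {
        deny();                                        // 1. invalid or tampered signature
    }

    list($timestamp, $profile_url, $resource) = explode("\n", trim($plain));

    if ($info[0]['fingerprint'] !== fingerprint_for($profile_url)) {
        deny();                                        // 2. not signed by Bob's key
    }
    if (parse_url($resource, PHP_URL_HOST) !== parse_url($my_site_url, PHP_URL_HOST)) {
        deny();                                        // 3. resource isn't on Alice's site
    }
    if (abs(time() - strtotime($timestamp)) > 300) {
        deny();                                        // 4. outside the acceptable window
    }

    $nonce = sha1($timestamp . $profile_url . $resource);
    if (nonce_seen($nonce)) {
        deny();                                        // 6. replay: we've seen this before
    }
    store_nonce($nonce);                               // 5. remember it for next time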

If all the above passes, Alice’s site lets Bob access the restricted resource (and optionally, logs Bob in to the site, allowing him access to any other resources he has access to).

Moving forward

So far I’ve demonstrated this working in a small distributed social network comprising Known, Elgg and WordPress users, as well as PGP signon from the same, plus shell scripts and JavaScript.

Nothing here requires anything particularly special to get up and running, and I’m hopeful that after all this has been revved a few times it’ll be pretty robust.

I’d be interested in your thoughts!

Yesterday, I wrote a post outlining a draft specification for a possible way to handle login on a distributed social network, together with a reference implementation for Known.

I got some really positive feedback, including someone pointing out a potential replay vulnerability with the protocol as it stands.

I admit I had overlooked replay as an attack vector (oops!), but since peer review is exactly why open standards are more secure than proprietary ones, I thought I’d kick off the discussion now!

The Replay problem

Alice wants to see something that Bob has written, so she logs in according to the protocol. Eve, however, is listening to the exchange and records the login. Later, she sends the same data back to Bob. Bob sees the signature, sees that it is valid, and logs Eve in as Alice.

Worse, Eve could send the same packet of data to Clare’s and David’s sites as well, all without needing access to Alice’s key.

Eve needs to be able to intercept Alice’s login session, which is largely impractical if HTTPS has been deployed; but since that can’t always be counted on, I’d like to improve the protocol.

Countermeasures

Largely, countermeasures to a replay attack take the form of creating the signature over something non-repeatable and algorithmically verifiable that Alice can generate and Bob can check.

This might be some sort of algorithmically generated hash, a timestamp, or even just a random number; alternatively, the server could record whether it has seen a specific signature before.

My specific implementation has an additional wrinkle in that it has to function over a distributed network in which nodes don’t necessarily talk to each other (so we can’t simply check whether we’ve seen a signature or random number before, since Bob might have seen it, but Clare and David won’t have).

I also want to avoid adding too much complexity, so I’d like to avoid, if I can, doing some sort of multi-stage handshaking; for example hitting an endpoint on the server to obtain a random session id, then signing that and sending it back. Basically, I’d still like to be able to talk to a server using Unix command line tools (gpg) and CURL if I can!

Proposed revision

Currently, when Alice logs in to Bob’s site, Alice signs her profile URL using her key and sends it to Bob. Bob then uses this profile URL to verify that Alice is someone with access to Bob’s site/post, and then uses the signature to verify that it is indeed Alice who’s attempting to log in.

What I propose is that, in addition to forming the signature over Alice’s profile URL, she also forms it over the URL of the page she is trying to see and the current time in GMT.

Including the requested URL in the signature allows Bob to verify that the request is for access to his site. If Eve sent this packet to Clare or David, it could easily be discarded as invalid.

Adding the timestamp allows Bob to check that this isn’t an old packet being replayed. Since any implementation should have a small tolerance (perhaps a few minutes either side) to allow for clock drift, using a timestamp leaves a small window of attack in which Eve could replay the login. To counter this, Bob’s implementation should remember, for a short while, the timestamps received for Alice, and if the same one is seen twice, invalidate all of Alice’s sessions (a sketch of this check follows the questions below).

  • Why invalidate all of Alice’s sessions when we see the same timestamp twice – can’t we just assume that the second packet is Eve’s?

    Sadly not – sophisticated attackers are able to attack from a position physically close to you, so Eve’s login may be received first. In the situation where two identical login requests are received, it is probably safer to treat both as invalid.

    Perhaps a sophisticated implementation could delay Alice’s first login for a few seconds (after verifying) to see if any duplicates are received, and only proceed if there are none. This would limit the need to permanently store timestamps against a user’s account, but may be more complex from an implementation point of view.

  • Why use a timestamp rather than a random number?

    I was going back and forth on this… a random number (nonce) would remove the vulnerability window, but it would require Bob’s site to store every number we’ve seen thus far, so I finally opted not to take this approach.
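A sketch of that duplicate-timestamp rule (seen_timestamps(), remember_timestamp(), invalidate_sessions() and reject_login() are placeholders for whatever storage and session machinery Bob’s site has):

    <?php
    // Sketch of the duplicate-timestamp rule described above. All of the named
    // functions are placeholders for Bob's own storage/session machinery.

    $window = 300;   // accept timestamps within +/- 5 minutes to allow for clock drift

    if (abs(time() - strtotime($timestamp)) > $window) {
        reject_login();                           // stale or future-dated packet
    }

    if (in_array($timestamp, seen_timestamps($profile_url, $window))) {
        // Two logins with the same timestamp: we can't tell Alice from Eve,
        // so treat both as suspect.
        invalidate_sessions($profile_url);
        reject_login();
    }

    remember_timestamp($profile_url, $timestamp); // only needs keeping for ~$window seconds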

I’d be interested in your thoughts, so please, leave a comment!