Summary: Building a distributed social network

A summary post pulling together the documentation for distributed friending, friend discovery and PGP signon protocols.

July 10, 2014 Marcus Povey

This post, requested by Ben Werdmuller, pulls together a number of earlier posts in order to better document the federated, cross platform, friend/follow and signon mechanism stuff I’ve been hacking on recently. It’ll summarise the posts together with my latest thoughts, although I do encourage you to read the originals as well, since there’s a fair amount of detail there.

Federated/distributed social networking is something I (and many other people) have been kicking around for a little while. When working on Elgg, I was involved in a bunch of conversations where we explored getting the various Elgg sites to talk to each other, but it never really got anywhere at the time.

Times move on, and now I think we have a chance to really get somewhere; kicking about the Known code has given me a nice experimental platform to play with and there are now some distributed social tools and protocols that are seeing wide adoption (PuSH, MF2 etc), which is going to be very helpful.

Post Snowden of course, it is now clear that target dispersal, combined with widespread encryption, is required to keep our private lives safe from being spied on. Getting our everyday social interactions out of a centralised data-mining facility is now a basic requirement to safeguard our essential liberties.

Initial requirements

Going into this then, I wanted to start building the parts of a distributed social network, and I wanted to set some loose guidelines of what I’d like to see.

Distributed: There should be no central server anywhere in the ecosystem. Ideally transactions should occur peer to peer between nodes, rather than be orchestrated by a central body.
Cross platform: I don’t want to mandate the use of one specific platform. You can’t call yourself a distributed/federated social network if you can only federate between nodes running the same software! That’s a monoculture, and we know those are bad.
Simple, open protocols: I don’t want to spend days building this, and if necessary I want to be able to test using the command line and curl.
URLs, from a UX standpoint, are a bad way to identify people (lessons learnt from OpenID). I may need to reference user profiles by URL, but every time you force someone to type one in, God kills a kitten.

Friending and profile discovery

Original posts here and here.

The first step towards building a social network of any kind is to have the ability to add your friends to your network, and a distributed network is no different.

Here, and in my reference implementations on Known and Elgg, I am adopting the uni-directional “follow” idea of friendship (like Twitter follow) rather than the bi-directional transactional Facebook model, since this was the minimum I needed to make this work and, in my mind at least, it better fits how “friending” works in the real world.

So then, friending works by having an endpoint on your site which is passed the URL you want to add as a friend. To make this easy, and to avoid typing URLs, both my reference Known and Elgg implementations contain a bookmarklet which you can add to your browser button bar.

Alice visits Bob’s website or profile and clicks on the button. Alice’s site then retrieves Bob’s site and parses it for whatever user details can be found on the page – looking for name, profile picture and the URL of their profile. This is made possible through the use of Microformats, especially MF2.

Microformats are simple bits of markup that are invisible to someone who just looks at a webpage, but which allow a computer to understand the meaning of things on a page, for example, to understand that a certain bit of text is a person’s name, or that one link is a link to their profile picture and another link is their profile URL. Additionally, since this is just text on a page, there is no requirement for that page to be “special” in any way, i.e. it could just be a static page; there is no requirement for special headers or for the page to be the output of a script.

Here is an example of how a user may be marked up:

<div class="h-card vcard">
    <img src="https://www.marcus-povey.co.uk/marcus.jpg" width="190" class="u-photo photo" />
    <a class="p-name fn u-url url" href="http://www.marcus-povey.co.uk/">Marcus Povey</a> aka <span class="nickname p-nickname">mapkyca</span>
    <a class="email u-email" href="mailto:marcus@marcus-povey.co.uk">marcus@marcus-povey.co.uk</a>
    <a class="u-url" href="http://www.marcus-povey.co.uk/">My profile.</a>
</div>

This markup can then be easily processed using one of the many libraries out there; if you’re using PHP I highly recommend Barnaby Walters’ php-mf2 library. In the above example I create a block that I say identifies a person (h-card), then detail their photo, full name, nickname, email address and a URL relating to them. This is probably enough information to be getting on with, but you can of course extract more profile/user information, if the markup is there.
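To make the extraction step concrete, here is a minimal Python sketch that pulls a handful of properties out of h-card markup like the example above. It is not a full MF2 parser and is not the code used in the Known or Elgg implementations; a real site should use a proper library such as the PHP one mentioned above. The names `HCardParser` and `extract_hcards`, and the shape of the returned dictionaries, are invented for this illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class HCardParser(HTMLParser):
    """Collects a few properties from each h-card on a page (sketch only)."""

    def __init__(self):
        super().__init__()
        self.cards = []        # finished h-cards
        self._card = None      # h-card currently being read, if any
        self._depth = 0        # open elements inside the current h-card
        self._collecting = []  # text properties the current element fills

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self._card is None:
            if "h-card" not in classes:
                return
            self._card = {"name": "", "nickname": "", "photo": None,
                          "email": None, "urls": []}
            self._depth = 0
        if tag not in VOID_TAGS:
            self._depth += 1
        href = attrs.get("href")
        if "u-photo" in classes and attrs.get("src"):
            self._card["photo"] = attrs["src"]
        if "u-url" in classes and href and href not in self._card["urls"]:
            self._card["urls"].append(href)
        if "u-email" in classes and href:
            prefix = "mailto:"
            self._card["email"] = href[len(prefix):] if href.startswith(prefix) else href
        # p-name / p-nickname take their value from the element's text.
        self._collecting = [p for p in ("name", "nickname") if "p-" + p in classes]

    def handle_data(self, data):
        if self._card is not None:
            for prop in self._collecting:
                self._card[prop] += data

    def handle_endtag(self, tag):
        if self._card is None or tag in VOID_TAGS:
            return
        self._collecting = []
        self._depth -= 1
        if self._depth == 0:  # the h-card's own element has closed
            self.cards.append(self._card)
            self._card = None


def extract_hcards(html):
    """Return a list of dicts, one per h-card found in the HTML string."""
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.cards
```

Fed the example markup, this should yield a single entry carrying the name, nickname, photo, email address and one profile URL. Nested h-cards and markup nested inside a name element are deliberately not handled.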

Since a given page may contain multiple marked up people (especially if Alice clicks the “add friend” button while on a news feed), my reference implementations present a list of users which may be added, after first removing any duplicates (based on the URL of their profile). You are also given the opportunity to fill in or amend any scraped information. If more than one URL is given for an entry, you should reconcile this somehow. I just render them as a dropdown to give Alice the choice of Bob’s primary profile, but I’m sure there’s a cleverer way.

Once Alice is happy, she can add Bob as a friend, and her site can do any post friending stuff – subscribing to Bob’s PuSH endpoint (if one is specified), or generating access credentials for Bob.

So, in summary, distributed friending works like this:

1. Alice sends Bob’s page URL to her magic friending endpoint (using a browser bookmarklet)
2. Alice’s site examines the URL for MF2 marked up h-card entries
3. Alice is presented with a unique list of h-card entries (where uniqueness is defined on normalised profile URLs).
4. Alice adds Bob as a friend and triggers any post friend events
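
The “unique list” step could be sketched in Python roughly as follows. The normalisation rules here (lower-casing the scheme and host, dropping default ports, fragments and trailing slashes) are an assumption for the example, since the exact rules aren’t pinned down above. Each entry is assumed to be a dictionary with a `urls` list, and the function names are made up.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalise_profile_url(url):
    """Reduce a profile URL to a comparable form (assumed rules, see above)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only when it is not the default for the scheme.
    if parts.port and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((scheme, host, path, parts.query, ""))


def unique_hcards(cards):
    """Keep the first entry seen for each normalised profile URL.

    Entries with no URL at all are kept, since there is nothing to compare.
    """
    unique, seen = [], set()
    for card in cards:
        urls = card.get("urls") or []
        key = normalise_profile_url(urls[0]) if urls else None
        if key is not None:
            if key in seen:
                continue
            seen.add(key)
        unique.append(card)
    return unique
```

Note that this treats http and https versions of the same address as different profiles; whether that is the right call depends on the site.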

Listening to Bob

After Alice adds Bob as a friend, she wants to be told when Bob updates his site. In Known this is accomplished by performing a Pubsubhubbub discovery and subscribe when the “friend” event is triggered (step 4 above).

I won’t go into too much detail as to how a PuSH subscription handshake works, since there’s more complete implementation information in the spec, but in summary, when Alice successfully adds Bob as a friend, her site does the following:

Alice’s site looks for a feed URL on Bob’s site.
Her site retrieves that URL and looks in it for a “self” link (which is the canonical permalink for the feed of Bob’s updates).
Then her site retrieves the feed from this canonical URL and looks in it for any declared PuSH hubs to which to subscribe.
If found, her site places a marker that she is subscribing to this hub in memory, then makes a subscription request.
Bob’s hub at some point later will ping Alice’s PuSH endpoint with a success or failure message.
Alice’s PuSH endpoint matches this request with the list of requests she’s made, and if the security tokens match up she can say she is subscribed.

Once subscribed, Alice’s endpoint will be pinged by Bob’s hub every time he makes an update. Alice’s site can then decide what to do with that information; perhaps Alice can use it to maintain a news feed, or send out an email update, whatever.
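
As a rough illustration of the subscriber side, the Python sketch below finds the “self” and “hub” links in an Atom feed, builds the form body for a subscription request, and checks the hub’s verification callback against the pending list. It assumes an Atom feed and the PubSubHubbub 0.3 parameter names (`hub.verify` and `hub.verify_token` were dropped in 0.4). The function names and the in-memory `pending` dictionary are inventions for the example, and no HTTP requests are actually made.

```python
import hmac
import secrets
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ATOM = "{http://www.w3.org/2005/Atom}"


def discover_push_links(feed_xml):
    """Return the feed's "self" URL and any declared hubs (Atom feeds only)."""
    root = ET.fromstring(feed_xml)
    found = {"self": None, "hubs": []}
    for link in root.findall(ATOM + "link"):  # feed-level links only
        rel, href = link.get("rel"), link.get("href")
        if not href:
            continue
        if rel == "self" and found["self"] is None:
            found["self"] = href
        elif rel == "hub":
            found["hubs"].append(href)
    return found


def build_subscribe_request(topic, callback, pending):
    """Remember that we asked to subscribe, and return the POST body for the hub."""
    token = secrets.token_hex(16)
    pending[topic] = token  # the "marker" that a subscription is in flight
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic,
        "hub.callback": callback,
        "hub.verify": "async",
        "hub.verify_token": token,
    })


def confirm_subscription(params, pending):
    """Handle the hub's verification request.

    Returns the challenge string to echo back if the request matches one we
    made, or None if it should be rejected.
    """
    topic = params.get("hub.topic")
    expected = pending.get(topic)
    if expected is None or params.get("hub.mode") != "subscribe":
        return None
    supplied = params.get("hub.verify_token", "")
    if not hmac.compare_digest(expected.encode(), supplied.encode()):
        return None
    del pending[topic]  # subscription is now confirmed
    return params.get("hub.challenge")
```

In a real site the pending list would live in a database rather than a dictionary, and the body returned by `build_subscribe_request` would be POSTed to each hub URL.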

Friend only/private posts & friend signon

Original posts here and here, here and finally here.

So far, all we’ve really done is create a fancy RSS reader. The next step in creating a truly distributed social network is to have the ability to create posts which only your friends (or a selected subset of your friends) can see, but that the wider internet cannot.

On centralised social networks this is trivial, since all users are local and can be identified in one of the many time-honoured and straightforward ways, and once identified, content that they’re not permitted to access can be easily hidden. On a distributed social network, this becomes much more difficult.

Fundamentally, it’s a problem of credential exchange.

There are many techniques you could deploy to solve this problem, and most of them are not mutually exclusive. One approach might be for Alice’s site to generate a random password and email it to Bob (since we likely have Bob’s email address from his h-card). Personally, I don’t think this is terribly clean.

So, I humbly put forward my thoughts on using OpenPGP keys as an identity mechanism…

OpenPGP signin

My spec for this can be found in these two posts, but in short it works as follows:

Bob generates or adds a PGP key pair to his profile, and publicises his public key in one or more of the following ways:
1. Via an HTTP Link header with a rel of “key”, e.g. Link: <https://example.com/bob/pubkey.asc>; rel="key"
2. Via a META tag in the HTML head, e.g. <meta href="https://example.com/bob/pubkey.asc" />
3. Via an anchor tag within the page body with a rel of “key”, e.g. <a href="https://example.com/bob/pubkey.asc" rel="key">My Key</a>
4. By pasting the key into the body of the page, and giving it a class of “key”, e.g.

  <pre class="key"> -----BEGIN PGP PUBLIC KEY BLOCK----- .... -----END PGP PUBLIC KEY BLOCK----- </pre>

(Discussion: Bob’s site needs access to a private key in order to generate signatures, therefore this key material should be kept secure. It may be that it’s best to generate a new keypair for exclusive use by Bob’s site, but I do kind of like tying together Bob’s profile and Bob’s email and identifying both cryptographically with the same key.)

When Alice successfully adds Bob as a friend, her site attempts to extract the public key from his page. If found, her site saves the public key against Bob’s newly created user.
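
Key discovery on Alice’s side might look something like the Python sketch below. It covers only the Link header, anchor tag and pasted-key methods. The order in which the methods are tried, the simplified Link header parsing, and the names `key_url_from_link_header`, `KeyFinder` and `discover_key` are all assumptions made for the example rather than part of the spec.

```python
import re
from html.parser import HTMLParser


def key_url_from_link_header(header):
    """Return the first URL in a Link header whose rel includes "key"."""
    # Simplified parsing: each link is "<url>" followed by its parameters.
    for url, params in re.findall(r'<([^>]+)>([^<]*)', header or ""):
        match = re.search(r'\brel\s*=\s*"?([^";,]+)"?', params)
        if match and "key" in match.group(1).split():
            return url
    return None


class KeyFinder(HTMLParser):
    """Finds rel="key" anchors and <pre class="key"> blocks in a page."""

    def __init__(self):
        super().__init__()
        self.key_urls = []     # hrefs of anchors with rel="key"
        self.inline_keys = []  # armoured keys pasted into the page
        self._in_key_pre = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "key" in (attrs.get("rel") or "").split():
            if attrs.get("href"):
                self.key_urls.append(attrs["href"])
        elif tag == "pre" and "key" in (attrs.get("class") or "").split():
            self._in_key_pre = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_key_pre:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "pre" and self._in_key_pre:
            self.inline_keys.append("".join(self._buffer).strip())
            self._in_key_pre = False


def discover_key(html, link_header=None):
    """Return ("url", location), ("inline", armoured_key) or None."""
    url = key_url_from_link_header(link_header)
    if url:
        return ("url", url)
    finder = KeyFinder()
    finder.feed(html)
    finder.close()
    if finder.key_urls:
        return ("url", finder.key_urls[0])
    if finder.inline_keys:
        return ("inline", finder.inline_keys[0])
    return None
```

When a URL is returned, the site would still need to fetch it and import the key it finds there before saving it against Bob’s user.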

Now, some time later Alice creates a post, and she only wants Bob to be able to see it, so she…

Creates a new post, and adds Bob’s user to the ACL.
Bob’s site is notified by PuSH that Alice’s site has been updated, if Bob has also added Alice as a friend. Because it’s a private post, we don’t push the content itself, although conceivably we could encrypt it with the public key of Bob and whoever else has access; that bit is a little out of scope at the moment.
Bob visits Alice’s site and identifies himself by clicking on a bookmarklet. This bookmarklet passes the URL of Alice’s site back to Bob’s site which produces a signed request and sends it back.
Alice’s site verifies that the signature is valid, and that it was signed by the key belonging to Bob.

The signature sent by Bob’s site is formed over a message containing:

The current date and time in ISO8601 format, as produced by date('c', time()); in PHP
Bob’s profile URL
The URL of the resource on Alice’s site that Bob is requesting.
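
For illustration, the message could be assembled in Python along the following lines before being handed to PGP for signing. Joining the three fields with newlines is purely an assumption for this sketch, as is the function name; the list above only says what the message contains, not how it is laid out.

```python
from datetime import datetime, timezone


def build_signon_message(profile_url, resource_url, now=None):
    """Build the text that Bob's site would sign (layout is an assumption).

    `now` should be a timezone-aware datetime; it defaults to the current
    time in UTC. The timestamp has the same shape as PHP's date('c'),
    e.g. 2014-07-10T12:00:00+00:00.
    """
    now = now or datetime.now(timezone.utc)
    timestamp = now.isoformat(timespec="seconds")
    return "\n".join([timestamp, profile_url, resource_url])
```

The signing itself is left out here, since it depends on which PGP tooling the site has available.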

Alice’s site should, on receiving this:

1. Verify the signature is valid and the contents unmodified.
2. Verify that it was signed by Bob’s key.
3. Verify that the resource being requested is on Alice’s site.
4. Check the timestamp is valid and within an acceptable range from now.
5. Combine the timestamp + profile URL + resource URL and use the result as a nonce to guard against replay attacks.
6. Check that we’ve not seen this request before by querying the nonce generated in step 5 against the nonce store, and only then store it.

If all the above passes, Alice’s site lets Bob access the restricted resource (and optionally, logs Bob in to the site, allowing him access to any other resources he has access to).
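
The non-cryptographic checks listed above can be sketched in Python as follows. The PGP verification itself is assumed to have been done elsewhere and is passed in as the `signature_ok` flag. The five minute window, the in-memory set used as a nonce store, and the function and parameter names are all assumptions for the example.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlsplit

MAX_SKEW = timedelta(minutes=5)  # assumed "acceptable range"


def check_signon(timestamp, profile_url, resource_url, *,
                 site_host, signature_ok, seen_nonces, now=None):
    """Return True if a signed request should be accepted.

    signature_ok: result of verifying the signature against Bob's stored key.
    seen_nonces:  a set acting as the nonce store (a database in practice).
    """
    if not signature_ok:
        return False
    # The requested resource must live on this site.
    host = (urlsplit(resource_url).hostname or "").lower()
    if host != site_host.lower():
        return False
    # The timestamp must parse, carry an offset, and be close to now.
    try:
        sent = datetime.fromisoformat(timestamp)
    except ValueError:
        return False
    if sent.tzinfo is None:
        return False
    now = now or datetime.now(timezone.utc)
    if abs(now - sent) > MAX_SKEW:
        return False
    # Replay guard: look the nonce up first, and only then record it.
    nonce = (timestamp, profile_url, resource_url)
    if nonce in seen_nonces:
        return False
    seen_nonces.add(nonce)
    return True
```

Old nonces can be discarded once their timestamp falls outside the accepted window, since such requests would be rejected on age alone.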

Moving forward

So far I’ve demonstrated this working in a small distributed social network made up of Known users, Elgg users and WordPress users, as well as PGP signon from the same plus shell scripts and JavaScript.

Nothing here requires anything particularly special to get up and running, and I’m hopeful that after all this has been revved a few times it’ll be pretty robust.

I’d be interested in your thoughts!
