So, a few months ago, Flickr decided to change their terms and conditions so that they could sell your Creative Commons photos. This got a lot of people’s goat, myself included, since I’m a paid user and have been for a while (as an aside, had Yahoo done it as a profit share, it would have been awesome for everybody, but noooo…)

Because self-hosting in this post-Snowden world is only ever going to be a good thing, because I don’t want my family photos used in corporate branding without a cut, and because I wanted to be a good #indieweb citizen, I thought I’d take the plunge and move to self-hosting.

I’ve tried this before with Trovebox Community Edition but didn’t have much success – while their Flickr data export seemed to work, the import didn’t. They’ve probably got it working by now, but I pretty much gave up.

Anyway, since I’m a contributor to Known, I thought I’d dogfood and hack together an importer.

Flickr Importer for Known

The importer works by calling the Flickr API using the credentials stored when you linked your Flickr account; it uses the Flickr syndication plugin to do the donkey work of linking your accounts.
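For the curious, the call itself looks roughly like this – a minimal Python sketch rather than Known’s actual code, with the key and token placeholders standing in for whatever the syndication plugin stored when you linked the account:

```python
# A minimal sketch of an authenticated Flickr REST call. The app key/secret
# and token/secret are placeholders, not real values from Known.
import requests
from requests_oauthlib import OAuth1

API = "https://api.flickr.com/services/rest/"

auth = OAuth1(
    client_key="APP_KEY",
    client_secret="APP_SECRET",
    resource_owner_key="LINKED_ACCOUNT_TOKEN",
    resource_owner_secret="LINKED_ACCOUNT_SECRET",
)

resp = requests.get(API, auth=auth, params={
    "method": "flickr.people.getPhotos",
    "user_id": "me",  # "me" means the authenticated (linked) user
    "extras": "description,date_taken,tags,media,url_o",
    "format": "json",
    "nojsoncallback": 1,
})
photos = resp.json()["photos"]["photo"]
```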

Once the plugin is activated and your Flickr account is linked, you are given the option to run an import.

The import job runs in the background and imports all your photos and videos into your photostream (using the Photo and Media plugins, which should also be activated), preserving timestamps, titles, body text and tags.
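In sketch form, the loop looks something like the Python below – fetch_page() and create_photo_entity() are stand-ins for the plumbing Known provides, not real function names:

```python
# Sketch of the background job: page through the photostream and map each
# item onto a local entity, preserving the original metadata.
def run_import(fetch_page, create_photo_entity):
    page = 1
    while True:
        photos, total_pages = fetch_page(page)  # wraps flickr.people.getPhotos
        for p in photos:
            create_photo_entity(
                title=p.get("title", ""),
                body=p.get("description", {}).get("_content", ""),
                tags=p.get("tags", "").split(),  # Flickr sends a space-separated string
                created=p.get("datetaken"),      # preserve the original timestamp
                media_url=p.get("url_o"),        # original-size file to download
                is_video=p.get("media") == "video",
            )
        if page >= total_pages:
            break
        page += 1
```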

At the time of writing it doesn’t import photosets and collections, since Known currently lacks a logical mapping for them, but I’m keen to at least record this information for later processing. The script will therefore import sets and collections as generic data items, which you can expose by writing support into your theme.
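Something along these lines – again a Python sketch, with save_generic_item() standing in for Known’s generic storage rather than a real API:

```python
# Sketch: stash photosets verbatim as generic data items for later use.
import json

def import_photosets(api_call, save_generic_item):
    sets = api_call("flickr.photosets.getList")["photosets"]["photoset"]
    for s in sets:
        save_generic_item(
            kind="flickr:photoset",
            key=s["id"],
            payload=json.dumps(s),  # keep everything; a theme can expose it later
        )
```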

The plugin records state, so it should recover from crashes, and you can re-sync safely at any time.
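The bookkeeping is simple enough to sketch: persist the IDs you’ve already handled after each item, and skip them on re-sync (the file name here is purely illustrative):

```python
# Sketch of the crash-safe bookkeeping: persist the set of imported photo
# IDs after every item, and skip anything already recorded on re-sync.
import json
import os

STATE_FILE = "flickr_import_state.json"  # illustrative location

def load_state():
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return set(json.load(f))
    return set()

def mark_done(done, photo_id):
    done.add(photo_id)
    with open(STATE_FILE, "w") as f:
        json.dump(sorted(done), f)  # written per item, so a crash loses little

done = load_state()
# Then, inside the import loop: skip photos whose ID is already in `done`,
# and call mark_done(done, photo_id) after each successful import.
```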

Have a play and let me know what you think! Pull requests are of course welcome.

» Visit the project on GitHub...

In this post I’m going to discuss a potential attack on a common method of implementing webmention comments, one that can allow an attacker to collect visitor information from a third-party site, and possibly to launch drive-by attacks.

This came about from a discussion about retrieving non-TLS-protected resources from a TLS-protected site, and it got me thinking that the problem went a little deeper.

The Attack

A common way of handling webmentions on an Indieweb site, such as those powered by Known, is as follows:

  1. Alice writes a comment on her site, referencing Bob’s post
  2. Alice sends a webmention to Bob’s site, giving the URL of her comment and the post she’s replying to (steps 1–2 are sketched in code after this list).
  3. Bob’s site retrieves Alice’s comment and parses it for Microformats markup
  4. If everything checks out, Bob’s site renders the comment using the text, profile URL and profile icon obtained from Alice’s site.
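For reference, steps 1–2 boil down to something like this Python sketch – it only checks the HTTP Link header when discovering the endpoint, where a full implementation would also check the HTML:

```python
# Minimal sketch of steps 1-2: discover Bob's webmention endpoint and
# notify it of Alice's comment.
import requests

def send_webmention(source, target):
    head = requests.head(target, allow_redirects=True)
    links = requests.utils.parse_header_links(head.headers.get("Link", ""))
    for link in links:
        if "webmention" in link.get("rel", "").split():
            endpoint = requests.compat.urljoin(target, link["url"])
            # Per the spec: a form-encoded POST naming source and target.
            requests.post(endpoint, data={"source": source, "target": target})
            return True
    return False
```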

It is step 4 that’s the problem here.

Typically, when the webmention is parsed and rendered by Bob’s site, the software will attempt to construct a nice-looking comment: an avatar icon and a user name rendered next to the comment text. This information is obtained by parsing MF2 data from Alice’s site, and while the Webmention spec says that content should be sanitised against XSS and the like, profile icons are often overlooked – a URL seems fairly innocuous, so it’s generally just dropped into an img tag.
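To make that concrete, here’s a deliberately naive Python sketch of step 4 using the mf2py parser – note how the comment text at least gets escaped, while the photo URL is trusted wholesale:

```python
# A deliberately naive sketch of step 4. Assumes Alice embeds a full h-card
# in her h-entry; error handling is omitted.
import html
import mf2py

def render_comment(source_url):
    parsed = mf2py.parse(url=source_url)
    entry = next(i for i in parsed["items"] if "h-entry" in i["type"])
    props = entry["properties"]
    author = props["author"][0]["properties"]
    icon = author["photo"][0]
    if isinstance(icon, dict):  # newer parsers wrap photos that carry alt text
        icon = icon["value"]
    text = html.escape(props["content"][0]["value"])  # the text *is* sanitised...
    name = html.escape(author["name"][0])
    # ...but the icon URL goes into the page exactly as Alice supplied it.
    return f'<img src="{icon}"> {name}: {text}'
```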

Now, if Alice were evil, she could, for example, configure her server to send “expired in the past” cache headers whenever it serves her avatar. Her server logs would then start collecting detailed traffic information about the visitors of any page she webmentioned, since every visitor’s browser would fetch a fresh copy of her profile icon on every view.
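Alice’s end of this is trivial to build. Here’s a sketch using nothing but the Python standard library – the headers tell every browser that the image is already stale, so each page view phones home:

```python
# Sketch of Alice's side: serve the avatar with anti-caching headers so
# every page view hits her server and shows up in her logs.
from http.server import BaseHTTPRequestHandler, HTTPServer

AVATAR = open("avatar.png", "rb").read()  # any innocuous-looking image

class TrackingAvatar(BaseHTTPRequestHandler):
    def do_GET(self):
        # Referer and User-Agent reveal who is reading which page, and when.
        print(self.client_address[0], self.headers.get("Referer"),
              self.headers.get("User-Agent"))
        self.send_response(200)
        self.send_header("Content-Type", "image/png")
        self.send_header("Cache-Control", "no-store, must-revalidate")
        self.send_header("Expires", "0")  # "in the past": never serve from cache
        self.end_headers()
        self.wfile.write(AVATAR)

HTTPServer(("", 8080), TrackingAvatar).serve_forever()
```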

She could, if she was very smart (or a well-funded government agency sitting on a whole bunch of zero-day browser exploits), serve specially crafted content designed to trigger a buffer overflow in a specific visitor’s browser at this point.

Worse, she could do this even if the entire site was protected by TLS.

Mitigation

The simplest way to prevent this kind of exploit is not to render profile icons from webmentions. This is, however, a sub-optimal solution.

My current thinking is that Bob’s site (the one receiving and rendering the webmention) should, at the point of receipt, fetch and cache the profile icon and serve it locally from his own server.

This would prevent Alice from performing much in the way of traffic analysis, since her server would only be hit for the original request. If your server re-samples the image as well (to enforce a specific size, for example), the process would likely strip most potential hidden nasties embedded in the file.
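As a sketch of what Bob’s site might do at webmention time – paths and sizes are illustrative, and this assumes Pillow for the re-sampling:

```python
# Sketch of the mitigation: fetch the icon once at webmention time,
# re-encode it so only pixel data survives, and serve the local copy.
import hashlib
import os
from io import BytesIO

import requests
from PIL import Image

def cache_profile_icon(icon_url, cache_dir="icon-cache"):
    os.makedirs(cache_dir, exist_ok=True)
    raw = requests.get(icon_url, timeout=10).content
    img = Image.open(BytesIO(raw)).convert("RGB")
    img.thumbnail((64, 64))  # enforce a fixed display size
    name = hashlib.sha256(icon_url.encode()).hexdigest() + ".png"
    path = os.path.join(cache_dir, name)
    img.save(path, "PNG")    # re-encoding drops metadata and most
    return path              # payloads hidden in the original file
```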

There is a DoS potential to this, but techniques for mitigating DoS for webmention/MF2 parsing have already been discussed in the Webmention spec.

Anyway… thoughts?

This article got me pondering how one might start building a distributed “related article” network without relying on a centralised silo.

Related articles on the same site are largely a solved problem but, at the moment, doing the same thing across multiple sites requires a centralised service. Centralisation is bad, as we’ve discussed before, so how could you build a federated network of sites, all referring people to each other in an automated but meaningful way?

My current thinking is to leverage PuSH. Alice lists the sites she’d like to receive related articles from; these could be individual sites or even a centralised aggregator. Alice’s site then subscribes to each site’s PuSH hub and starts receiving updates, and as updates arrive they can be passed through to whatever comparison algorithm you’re using – I’m thinking of adapting the WordPress one used on this blog.
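As a rough sketch of the subscription side, in Python – hub.mode, hub.topic and hub.callback are the standard PubSubHubbub parameters, while CALLBACK and similarity() are placeholders for your own endpoint and comparison algorithm:

```python
# Rough sketch of the subscription half of PuSH (PubSubHubbub): ask each
# peer's hub to push new posts to our callback, then score what arrives.
import requests

CALLBACK = "https://alice.example/push-callback"  # illustrative endpoint

def subscribe(hub_url, topic_url):
    # hub.mode, hub.topic and hub.callback are the standard PuSH parameters
    requests.post(hub_url, data={
        "hub.mode": "subscribe",
        "hub.topic": topic_url,  # the peer site's feed
        "hub.callback": CALLBACK,
    })

def on_push(entry, local_posts, similarity):
    # invoked when the hub POSTs an update to CALLBACK
    return [p for p in local_posts if similarity(entry, p) > 0.5]
```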

Should be fairly straightforward to implement, and would provide a simple way to federate content within a group of individuals.

Anyone working on something like this, or shall I drop this into my todo list?