Stop Talking About Federation. Build Something.

Last week we ran a hackathon in which we successfully demonstrated discovery and federation of catalogues between EOSC nodes.

Marcus Povey

My general approach to technical problem-solving can be summarised fairly bluntly: make something that works, and let someone tell you where it’s wrong. The alternative, dotting every I and crossing every T before breaking ground, tends to produce very polished discussions and very little actual software.

This was very much the philosophy behind the hackathon I organised last week as part of the EOSC United project.

tl;dr: By the end of one Friday afternoon, we’d moved EOSC catalogue federation from conceptual discussion to working prototype, with a concrete set of recommendations and, perhaps more usefully, a concrete list of things that still need answering.

The goal was deliberately modest

EOSC catalogue federation, the idea that distributed research infrastructure nodes should be able to share and discover each other’s service catalogues, has generated a lot of discussion. Quite a lot. The kind where the agenda is longer than the available time and the action points are mostly “discuss further.”

What I wanted from the afternoon was something much smaller: take metadata from one node catalogue, fetch it, parse it, map it into something DCAT-compatible, and demonstrate that another catalogue could consume it. Stop treating federation as an abstract future state and make a small, specific piece of it actually happen.

That turned out not to be optimistic enough.

Coming prepared

The session brought together representatives from Data-Terra and the EOSC Association, with me representing Instruct / the Life Sciences Connect Node. Beforehand, I’d put together two small tools to make the problem concrete rather than theoretical.

The first was the EOSC service catalogue library, designed to inspect, fetch, parse, and normalise service catalogue records. The second was the EOSC node discovery library, which experiments with a simple mechanism for a node to advertise what catalogue and capability endpoints it exposes.

Neither should be mistaken for finished infrastructure. They’re working probes. Their value isn’t that they solve the problem; it’s that they make the problem less theoretical. Instead of asking “could this be done?”, you can point them at an endpoint and see what breaks.

For node discovery, I suggested using the capabilities format from the EOSC Beyond project sandbox, a simple JSON file at /.well-known/eosc-federation/node. Absent any official guidance, I’d rather propose something specific and get corrected than wait for a committee to ratify the obvious.
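To make that concrete, here is a minimal sketch of what a discovery client might look like. Only the well-known path comes from the proposal above. The shape of the JSON (a "capabilities" list whose entries carry "capability_type" and "endpoint" keys) is an assumption made purely for illustration, and the function names are hypothetical rather than the API of the node discovery library.

```python
import json
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen

# The well-known path proposed in this post.
WELL_KNOWN_PATH = "/.well-known/eosc-federation/node"


def discovery_url(node_base_url: str) -> str:
    """Build the discovery URL from any URL on the node (path is discarded)."""
    parts = urlsplit(node_base_url)
    return urlunsplit((parts.scheme, parts.netloc, WELL_KNOWN_PATH, "", ""))


def parse_capabilities(document: str) -> dict[str, str]:
    """Return {capability_type: endpoint} from a capabilities document.

    ASSUMPTION: the "capabilities", "capability_type" and "endpoint" keys
    are hypothetical; adjust them to whatever the node actually serves.
    Entries missing either key are skipped rather than raising.
    """
    data = json.loads(document)
    return {
        cap["capability_type"]: cap["endpoint"]
        for cap in data.get("capabilities", [])
        if "capability_type" in cap and "endpoint" in cap
    }


def discover(node_base_url: str, timeout: float = 10.0) -> dict[str, str]:
    """Fetch the node's capabilities file and parse it."""
    with urlopen(discovery_url(node_base_url), timeout=timeout) as response:
        return parse_capabilities(response.read().decode("utf-8"))
```

Keeping the URL construction and the parsing separate from the network call means both can be tested against a saved document before being pointed at a live node.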

What actually happened

Things went remarkably smoothly. Within the afternoon I had interrogated the Data-Terra node using their discovery endpoint, extracted their catalogue from the DCAT feed (including IDs, descriptions, and other metadata) and had it in a format suitable for ingestion into ARIA. Not wired up end-to-end, but close enough to prove it was straightforward and practical.
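The extraction step can be sketched roughly as follows. This assumes the feed is compacted JSON-LD with a top-level "dcat:service" list and "dct:"-prefixed property keys; real DCAT feeds vary in serialisation and prefixing, so treat the key names as assumptions. The flat output record is a hypothetical shape for illustration, not ARIA's actual ingestion format.

```python
import json


def _text(value):
    """Flatten a JSON-LD literal: a plain string, {"@value": ...}, or a list of either."""
    if isinstance(value, list):
        value = value[0] if value else None
    if isinstance(value, dict):
        value = value.get("@value")
    return value


def extract_services(feed: str) -> list[dict]:
    """Pull ID, title and description for each service out of a DCAT catalogue.

    ASSUMPTION: compacted JSON-LD using the conventional "dcat:" and "dct:"
    prefixes; the output keys are a hypothetical ingestion shape.
    """
    catalog = json.loads(feed)
    entries = catalog.get("dcat:service", [])
    if isinstance(entries, dict):  # JSON-LD may collapse a one-item list
        entries = [entries]
    records = []
    for entry in entries:
        records.append({
            # Fall back to the node's @id when no dct:identifier is given.
            "id": _text(entry.get("dct:identifier")) or entry.get("@id"),
            "title": _text(entry.get("dct:title")),
            "description": _text(entry.get("dct:description")),
            "source": entry.get("@id"),
        })
    return records
```

A production version would more likely use a proper RDF or JSON-LD parser so that the serialisation differences between nodes stop mattering; the plain-dict approach is only here to show how little is involved once the feed is in hand.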

Having proved the concept, the other participants went further. By the end of the session they’d synced the Life Sciences catalogue directly into their portal, with working UI. I mention this not to suggest I underdelivered (we hit the stated goal; they sprinted past it), but because it’s a useful illustration of what becomes possible when there’s something working to build on rather than a diagram to argue about.

Worth noting: the EEN-to-DCAT mapping I’d produced as part of my previous node work was largely compatible with the one Data-Terra had independently developed. A lovely example of parallel evolution, and a quiet confirmation that I wasn’t completely off the mark.

What we now know we don’t know

A working prototype is also useful for surfacing the things you hadn’t thought to ask. We identified two gaps.

The first is a circular federation problem: node A shares with B, B shares with C, C shares with A. This should be solvable with clear guidance on canonical service IDs, but it does need addressing before anyone does it accidentally.
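As one possible illustration of how canonical IDs could break the loop, the sketch below has each node mint an ID for its own services and ignore any incoming record it already holds or that it originally published. The ID scheme ("node:local-id"), the class names, and the origin field are all hypothetical; this is a sketch under those assumptions, not agreed guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceRecord:
    canonical_id: str  # hypothetical: minted once by the origin node, never rewritten
    origin_node: str   # hypothetical: the node that first published the service
    title: str


class NodeCatalogue:
    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.records: dict[str, ServiceRecord] = {}

    def publish(self, local_id: str, title: str) -> None:
        """Register one of this node's own services under a canonical ID."""
        canonical_id = f"{self.node_id}:{local_id}"
        self.records[canonical_id] = ServiceRecord(canonical_id, self.node_id, title)

    def ingest(self, incoming: list[ServiceRecord]) -> int:
        """Merge federated records; return how many were new to this node."""
        added = 0
        for record in incoming:
            if record.origin_node == self.node_id:
                continue  # our own record coming back round the loop
            if record.canonical_id in self.records:
                continue  # already held, whichever route it arrived by
            self.records[record.canonical_id] = record
            added += 1
        return added

    def export(self) -> list[ServiceRecord]:
        """Everything this node holds, own and federated, for the next node."""
        return list(self.records.values())
```

With three nodes syncing in a ring (A into B, B into C, C into A), repeated rounds stop adding records once every node holds each service exactly once, because a record is keyed by where it came from originally rather than by who passed it on.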

The second is more nuanced: a service that legitimately appears in two separate catalogues as its own distinct thing. How that gets handled will require a bit more thought. I have ideas. They’ll keep for the next session.

The point

There’s a certain comfort in keeping technical problems at the conceptual stage. Nothing can be wrong if nothing has been built. But you also can’t find the real problems until you’ve got something running and have pointed it at actual infrastructure.

One Friday afternoon, a small group with working code moved EOSC catalogue federation from “theoretically possible” to “demonstrably real.” The gaps we found are tractable. The path forward is clearer than it was last Thursday.

That’s what building something gets you.
