Free the data, multiply the curators

February 17, 2019

The level of influence that a few entities have today on what billions of people read, play, watch or listen to is most likely unprecedented. But even though the volume of content and the number of creators have both been growing at breakneck speed, the number of curators seems to be shrinking. We live in a world where it has never been easier to create, publish and share content, but there are only so many ways available to distribute it.

Half a dozen for-profit corporations have become the gateway to the Internet for the vast majority of its users. If you're not alarmed, you should be. The World Wide Web was designed with an architecture believed to be resilient against these kinds of situations. Social networks found a way around that architecture, and positioned themselves at the very center of the web.

The web began as a network where you could only visit places you had an address for. When it started to grow in popularity, directories like Yahoo enabled users to find websites by theme. A few years later, with the advent of search engines, URLs became less important, but the fundamental dynamic between users and content remained the same; information did not come to them, they had to go to it. Social networks changed this dynamic. They figured out a way to push content to users. Passively. Effortlessly.

The virtually unlimited content served to users came with the perverse effect of giving them an illusion of choice and control. Scrolling down their Facebook feed, they forget that what they are seeing is not a neutral chronological feed, but an extraordinarily customized view of the Internet, engineered to serve ambitious financial goals. The reality is that the algorithms that determine what you see do not have your best interests in mind. If we could really trust corporations with such a thing and give them the benefit of the doubt, cigarettes would have disappeared three decades ago, and fat versus sugar wouldn't be a debate today.

At the heart of the success of social networks is their ability to sustain a true virtuous cycle between content creation and curation. Their power - and the genius of their business model - comes from the tight integration they have achieved between exclusive user-generated content, and a prerogative to curate and distribute it at almost no cost in order to maximize the metrics they are pursuing.

  1. Content: provide users with a way to create content in various media types, and expose it on the Internet using a proprietary feed (e.g. timeline, profile);
  2. Curation: create a user-specific curated selection of posts from a list of sources (e.g. followed accounts, friends), factoring in parameters such as time, geography, interests, and more.

How can we sustain and improve the level of content creation that exists today, but provide users with more choice in terms of curation? Simple: free the data, multiply the curators. Of course, it's easier said than done, but as you will see, many of the necessary building blocks are already here, and most of them are standards. What's really missing is a little bit of glue, and better marketing.

Remember RSS feeds? I wouldn't blame you if you didn't. To understand why they failed, compare the idea of subscribing to someone's RSS feed to that of simply following them. And just when a few people started getting familiar with the term, Atom was introduced, leaving end-users with an unnecessary choice to make between two formats designed to do the same thing. The feed readers available back then were designed for power-users, and did very little to suggest feeds that would interest neophytes. They simply required too much work and configuration compared to an Instagram.

The challenge is to provide users with a virtually unlimited number of Twitters and Facebooks and Instagrams. One curator could offer an ad-free experience in exchange for a paid subscription. Another could show some ads, but present a simple chronological feed. A third could focus on the photo posts of your friends or co-workers.

Here's an incomplete technical proposal, with an architecture built around two agents and no blockchain, coin or token involved:

  1. Publishers, to expose content via timeline servers;
  2. Curators, to provide users with content.
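
To make the split concrete, below is a minimal type sketch in Python of what each agent would be responsible for. The names are purely illustrative and not part of any specification.

from dataclasses import dataclass
from typing import Protocol

@dataclass
class Entry:
    """A single post in a publisher's timeline."""
    id: str
    author: str       # timeline identifier, e.g. an email-style address
    published: str    # ISO 8601 timestamp
    content: str

class Publisher(Protocol):
    """Exposes a user's timeline as an ordered list of entries."""
    def timeline(self, identifier: str) -> list[Entry]: ...

class Curator(Protocol):
    """Builds a per-user feed out of entries pulled from many publishers."""
    def feed_for(self, user: str) -> list[Entry]: ...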

Publishers

The primary objective is to create a protocol to publish and expose a Twitter-like timeline, using established standards and existing infrastructure as much as possible.

An unlimited number of publishers could use a Timeline server to expose Atom feeds over HTTPS, each one of them representing the content published by a distinct user.

Atom feeds can carry any media format (e.g short messages, articles, music, videos, podcasts, photos), or simply a link.

Discoverability

A timeline could be identified using an email address. This comes with many benefits:

  1. It's shorter and easier to remember than the URL of an RSS feed as it exists today;
  2. Contacts stored locally or on any CardDAV-compatible server could become a user's friends and immediate social network;
  3. Sending a private message to a feed becomes as natural as sending an email.

Examples:

mohamed@attahri.com ->  Timeline of Mohamed Attahri

uspolitics@nyt.com  ->  Timeline of the US Politics page of the New York Times

news@enigma.com     ->  Timeline of Enigma.com, the company.

The biggest problem with email addresses being used this way is spam.

Implementation

SRV records would need to be set up to declare the availability of the Timeline server:

_timeline._tcp.attahri.com. 86400   IN SRV  0   5   443 timelines.attahri.com.

For example, a client trying to find the timeline mohamed@attahri.com would:

  1. Query the _timeline._tcp SRV record of the timeline's domain, attahri.com;
  2. Find the host and port of the server timelines.attahri.com:443;
  3. Send a request to https://timelines.attahri.com/mohamed to retrieve the Atom feed.

Request:

GET /mohamed HTTP/1.1
Accept: application/atom+xml
Range: entries=0-9
If-Unmodified-Since: Fri, 15 Feb 2019 21:12:49 GMT
User-Agent: NoAdCurator/4.0 (Bot)
Accept-Encoding: gzip, deflate

Response:

HTTP/1.1 206 Partial Content
Date: Fri, 15 Feb 2019 22:12:49 GMT
Content-Range: entries 0-9/2300
Content-Length: 13822
Content-Type: application/atom+xml
Last-Modified: Fri, 15 Feb 2019 21:12:49 GMT
Cache-Control: public, must-revalidate

<?xml version="1.0" encoding="UTF-8" ?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Mohamed Attahri</title>
    <subtitle>I'm Mohamed Attahri. I live in New York, and this is my timeline.</subtitle>
    <link rel="self" href="https://mohamed.attahri.com/"/>
    <link rel="alternate" type="text/html" href="https://mohamed.attahri.com/"/>
    <id>https://mohamed.attahri.com/</id>
    <icon>https://mohamed.attahri.com/img/image.jpg</icon>
    <logo>https://mohamed.attahri.com/img/bg.jpg</logo>
    <author>
        <name>Mohamed Attahri</name>
        <email>mohamed@attahri.com</email>
        <uri>https://mohamed.attahri.com/</uri>
    </author>
    <updated>2019-02-15T21:12:49Z</updated>
    [entries...]
</feed>

This is a simple solution that would allow any blogging platform or news publication to expose its existing Atom feeds with minimal effort.
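
To make the lookup more tangible, here is a rough client-side sketch of steps 1 through 3, assuming the dnspython and requests packages are available; the identifier and hostnames are the examples used above.

import dns.resolver
import requests

def fetch_timeline(identifier: str) -> str:
    """Resolve an identifier like 'mohamed@attahri.com' and return the
    raw Atom feed exposed by its timeline server."""
    local_part, domain = identifier.split("@", 1)

    # 1. Query the SRV record declaring the timeline server for the domain.
    answer = dns.resolver.resolve(f"_timeline._tcp.{domain}", "SRV")

    # 2. Pick the record with the highest priority (lowest number).
    record = sorted(answer, key=lambda r: (r.priority, -r.weight))[0]
    host, port = str(record.target).rstrip("."), record.port

    # 3. Request the user's Atom feed over HTTPS.
    response = requests.get(
        f"https://{host}:{port}/{local_part}",
        headers={"Accept": "application/atom+xml"},
        timeout=10,
    )
    response.raise_for_status()
    return response.text

feed_xml = fetch_timeline("mohamed@attahri.com")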

Social interactivity

Below are a few ideas on how we could implement some important social interactivity features, many of which were popularized by Twitter.

Replies

A user could reply to another post by creating an entry with a <link> element carrying a rel="related" attribute.

<entry>
    [...]
    <content>Content of the reply.</content>
    <link rel="related" href="https://timelines.domain.com/johndoe/a983d96f-dd05-4fe6-a18f-7e8eb15068ff" type="application/atom+xml"/>
</entry>

Retweets

Similarly, a retweet - or a repost - would use a link with rel="via" instead:

<entry>
    [...]
    <content>Retweet with a comment.</content>
    <link rel="via" href="https://timelines.domain.com/johndoe/435ffbb6-ed8f-406b-a39b-5118035d7737" type="application/atom+xml"/>
</entry>

Mentions

Mentions would work exactly as they do in most social networks today, but email addresses would replace usernames.
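
As a small illustration, and assuming a mention is written as the timeline identifier prefixed with an @ sign (a syntax this post does not prescribe), extracting mentions from a post could look like this:

import re

# Hypothetical syntax: '@' followed by an email-style timeline identifier,
# e.g. "@mohamed@attahri.com".
MENTION_RE = re.compile(r"@([\w.+-]+@[\w-]+(?:\.[\w-]+)+)")

def extract_mentions(text: str) -> list[str]:
    """Return the timeline identifiers mentioned in a post."""
    return MENTION_RE.findall(text)

print(extract_mentions("Interesting thread by @mohamed@attahri.com on open feeds"))
# ['mohamed@attahri.com']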

Hashtags

Hashtags could be implemented as they are today, or could rely on the optional <category> element defined in the Atom specification.

<entry>
    [...]
    <category term="election2020"/>
    <category term="economy"/>
</entry>

Identity

Something akin to verified accounts could be achieved using the existing certificate infrastructure and domain-based security. For example, the timeline status@digitalocean.com could be verified the same way one would verify that digitalocean.com belongs to DigitalOcean LLC.

Posts could be signed the same way some organizations digitally sign their emails, or developers sign their commits with Git.

Of course, remaining anonymous would be perfectly possible.
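
As an illustration only - the post does not prescribe a signature scheme - signing an entry could be as simple as an Ed25519 signature over its canonical bytes, here with Python's cryptography package:

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

entry_bytes = b"<entry>...</entry>"  # canonical bytes of the Atom entry
signature = private_key.sign(entry_bytes)

# A curator holding the author's public key can verify authorship;
# verify() raises InvalidSignature if the entry was tampered with.
public_key.verify(signature, entry_bytes)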

Curators

Technically, publishing the data is the easiest thing to do. The challenge is to distribute it and make it accessible in near real-time to as many curation providers as possible, so that each one of them can provide a different experience to their users.

One possible way to achieve this would be to rely on connecting the curators via a peer-to-peer network to exchange messages on updates made to timelines, using a publish-subscribe approach. The IPFS project is currently experimenting with various ways to create similar networks efficiently. A gossip protocol would be ideal if designed to mimic the topology of the graph formed by content, curators and users, and could help disseminate information in near real-time. The publication of a new post, reply or mention would be broadcast to the network, so that curators interested in updates to those particular timelines could issue a request to refresh their content.

This is a critical piece of infrastructure that the World Wide Web desperately needs to defend itself against the threat of closed platforms, which can implement real-time features far more easily. Put simply, the web needs to handle real-time in a decentralized way.

At a minimum, a curator on the network would subscribe to all the updates for all the timelines and categories that its users are following.
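
As a rough sketch of the curator side - the wire format of update messages is not specified here, so a minimal JSON payload is assumed - a subscription handler might look like this:

import json

FOLLOWED = {"mohamed@attahri.com", "uspolitics@nyt.com"}

def on_update_message(raw: bytes) -> None:
    """Called whenever a peer announces that a timeline has changed."""
    message = json.loads(raw)        # e.g. {"timeline": "...", "updated": "..."}
    timeline = message["timeline"]

    # Only refresh timelines that this curator's users actually follow.
    if timeline in FOLLOWED:
        refresh(timeline)

def refresh(timeline: str) -> None:
    # Issue a conditional GET against the publisher's timeline server,
    # as shown in the Implementation section above.
    ...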

How the content is actually curated and rendered to the user is entirely up to the curator. What matters is choice.

Privacy

Not all timelines should be public. To allow users to choose who can see their private timeline, a simple protocol could be created to exchange a key between the timeline server and the curator:

  1. Alice's Curator requests access to Bob's timeline on her behalf;
  2. Bob's Timeline server presents the request from Alice to Bob;
  3. Bob accepts Alice's request;
  4. Bob's Timeline server generates a key specific to Alice and her curator;
  5. Alice's curator can access Bob's timeline by presenting the key obtained with the process above.

It would be perfectly possible to imagine some posts in a timeline being public, while others remain private. The timeline server would only serve the private posts if a key is present in the request.
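
Here is a hypothetical sketch of step 5, assuming the key is presented as a bearer token; the header name and key format are assumptions, since the post does not pin them down:

import requests

def fetch_private_timeline(url: str, access_key: str) -> str:
    """Fetch a private timeline using a key obtained through the
    request/approval flow described above."""
    response = requests.get(
        url,
        headers={
            "Accept": "application/atom+xml",
            "Authorization": f"Bearer {access_key}",
        },
        timeout=10,
    )
    response.raise_for_status()  # a missing or invalid key would yield 401/403
    return response.text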

Unsolved Problems

This is not the first attempt to separate the data from the applications. The father of the World Wide Web himself is one of the contributors to Solid, an architecture and a set of protocols with a similar goal. The ideas in this post are not incompatible with that vision. Where I believe we differ is in the need to define an open infrastructure to handle real-time information. I hope this leads to an attempt to create one.