/ blog. chaff. occasional witterings.


Aggregation and the Edge

computing 11:42:28

A couple of years ago, I decided that there was something worth building that would be a combination of aggregation and social networking. People had done the first part (I used Suprglu, for example, and you could (and still can) turn Tumblr into an aggregator) and the second bit (at that point, Facebook was rapidly climbing) but the two hadn't really been put together.

Of course, within a few months, not only had Facebook sewn up the market for the casual user, while FriendFeed had emerged and become popular amongst the alpha geeks. I went off and built my own shallow-aggregation front page, while it seemed more than a few people decided the right approach was to cross post from every service to every other one (which annoyed me no end, and made me think that filters will eventually become very important- but that might be another post).

Now I'm beginning to see a new shape of software emerge from people's discussions. Its genesis probably came about after last year's XTech, when Jeremy Keith posted about Steve Pemberton's talk, Why You Should Have A Web Site:

With machine-readable pages, we don't need those separate sites. We can reclaim our data and still get the value. Web 3.0 sites will aggregate your data (Oh God, he is using the term unironically).

The idea lay germinating for a while, but it's emerged back into the spotlight because of the wave of site closures and database losses over the New Year. Users of Pownce found themselves without a site, and Magnolia's bookmarks were lost in a database crash, for example. If you have a deeply-aggregated site - one where you host a local store of data that's also on a remote service, like Phil Gyford and Tom Insam have built - then you, by definition, have a local backup.

I think doing things this way around - using the remote services as primaries, with your own site being fed from, but (if needs be) independent of them - makes the most sense. You can use the social networks of sites like Flickr, which are their strong points.

Now, I'm not there yet. My current site uses only shallow aggregation - I pull in links and posts, but I show them to the user and then forget them. The first step to making a proper site is to build a local database and start backing up to it. This is probably worth doing no matter what - in fact, it's the whole point of Jeremy Keith's most recent journal post - and it turns out I have the seeds of the code I need in the scripts I wrote to generate my 2008 web posting statistics.

I'd already been considering using a key-value or document store rather than an RDBMS for this when I saw Jens Alfke's post on using CouchDB for a "Web 3.0" site (look, it's that term again!) He notes that, while unfinishes, CouchDB looks like it may implement a usable system for replicating data at web scale, so that the social activity could finally move from specific sites to the edge (or at least, our own colos).

Now I'm wondering: is there a space for a piece of user-installable software, like Movable Type or Wordpress, that aggregates their data from sites across the web, and then presents it as a site? If there is, is it even possible to write it in a way that anyone who couldn't have written it themselves can even use it? Can I write it just for myself in the first place? I don't know, but in the next year, I think we'll find out.