husk.org / blog. chaff. occasional witterings.

2009-02-17

Aggregation and the Edge

computing 11:42:28

A couple of years ago, I decided that there was something worth building that would be a combination of aggregation and social networking. People had done the first part (I used Suprglu, for example, and you could (and still can) turn Tumblr into an aggregator) and the second bit (at that point, Facebook was rapidly climbing) but the two hadn't really been put together.

Of course, within a few months, not only had Facebook sewn up the market for the casual user, but FriendFeed had emerged and become popular amongst the alpha geeks. I went off and built my own shallow-aggregation front page, while it seemed more than a few people decided the right approach was to cross-post from every service to every other one (which annoyed me no end, and made me think that filters will eventually become very important, but that might be another post).

Now I'm beginning to see a new shape of software emerge from people's discussions. Its genesis probably came about after last year's XTech, when Jeremy Keith posted about Steve Pemberton's talk, Why You Should Have A Web Site:

With machine-readable pages, we don't need those separate sites. We can reclaim our data and still get the value. Web 3.0 sites will aggregate your data (Oh God, he is using the term unironically).

The idea lay germinating for a while, but it's emerged back into the spotlight because of the wave of site closures and database losses over the New Year. Users of Pownce found themselves without a site, and Magnolia's bookmarks were lost in a database crash, for example. If you have a deeply-aggregated site - one where you host a local store of data that's also on a remote service, like Phil Gyford and Tom Insam have built - then you, by definition, have a local backup.

I think doing things this way around - using the remote services as primaries, with your own site being fed from, but (if needs be) independent of them - makes the most sense. You can use the social networks of sites like Flickr, which are their strong points.

Now, I'm not there yet. My current site uses only shallow aggregation - I pull in links and posts, but I show them to the user and then forget them. The first step to making a proper site is to build a local database and start backing up to it. This is probably worth doing no matter what - in fact, it's the whole point of Jeremy Keith's most recent journal post - and it turns out I have the seeds of the code I need in the scripts I wrote to generate my 2008 web posting statistics.
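As a rough illustration of that first step, here's a minimal sketch of a local store, in Python with SQLite. It isn't the code behind the statistics scripts mentioned above; the table layout and the item fields (source, remote_id, posted, title, body) are assumptions made up for the example. The one design point it tries to show is keying each item on its source and remote ID, so that backing up the same item twice overwrites it rather than duplicating it.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the local archive. Table and column names are hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS items (
               source    TEXT NOT NULL,  -- e.g. "flickr" or "delicious"
               remote_id TEXT NOT NULL,  -- the ID the remote service gave the item
               posted    TEXT NOT NULL,  -- ISO 8601 timestamp
               title     TEXT,
               body      TEXT,
               PRIMARY KEY (source, remote_id)
           )"""
    )
    return db


def back_up(db, items):
    """Store fetched items; re-fetching an item replaces the old copy."""
    with db:  # commit on success, roll back on error
        db.executemany(
            "INSERT OR REPLACE INTO items (source, remote_id, posted, title, body) "
            "VALUES (:source, :remote_id, :posted, :title, :body)",
            items,
        )


def count_items(db):
    """Return how many items the local archive holds."""
    return db.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

Run from cron after each fetch, something like this would leave a local copy that survives the remote service vanishing, which is the whole point.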

I'd already been considering using a key-value or document store rather than an RDBMS for this when I saw Jens Alfke's post on using CouchDB for a "Web 3.0" site (look, it's that term again!). He notes that, while unfinished, CouchDB looks like it may implement a usable system for replicating data at web scale, so that the social activity could finally move from specific sites to the edge (or at least, our own colos).

Now I'm wondering: is there a space for a piece of user-installable software, like Movable Type or Wordpress, that aggregates a user's data from sites across the web, and then presents it as a site? If there is, is it even possible to write it in a way that people who couldn't have written it themselves can use it? Can I write it just for myself in the first place? I don't know, but in the next year, I think we'll find out.

2008-03-04

Feeding The Daemons

For over a year, the home page of husk.org has been a collection of content elsewhere. The reason's pretty simple; I don't write long form posts often enough for a standard blog to work as a front page, but I didn't like the static nature of a plain index either. Keeping things updated with links and photos as well seemed like a plan. (Nowadays, you can do this sort of thing within Movable Type itself, but a year ago, you couldn't. Also, I'm rubbish at getting around to upgrades anyway.)

The code that generates the front page is pretty simple, really; it fetches four RSS feeds, does a bit of data munging to group them by day then source, with a bit of caching around the edges. However, despite relying on RSS for the content, there wasn't a syndication feed of the aggregated material. Now there is, and, perhaps sensibly, it uses the same Perl library (XML::Feed) to generate the feed as it uses to consume the feeds from elsewhere. (There's actually another post in the technical problems there, if I find time.)
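For the curious, the "group by day then source" munging could be sketched roughly like this. It's Python rather than the Perl the site actually runs, it skips the fetching and caching entirely, and the item fields (posted, source, title) are assumptions for the sake of the example.

```python
from collections import defaultdict
from datetime import datetime


def group_by_day_then_source(items):
    """Return {date: {source: [items]}}, newest day first.

    Within a day, sources appear in order of their most recent item,
    and each source's items run newest first.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    # Sorting newest-first up front means insertion order does the rest,
    # because dicts keep their keys in insertion order.
    for item in sorted(items, key=lambda i: i["posted"], reverse=True):
        grouped[item["posted"].date()][item["source"]].append(item)
    return {day: dict(sources) for day, sources in grouped.items()}
```

The result is a nested mapping that a template can walk directly: one heading per day, one block per source under it.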

So, if you're following this blog via a feed, why not switch to one of the aggregated ones instead? There'll be more content (as they pull in my more-regularly-updated Vox blog, as well as other stuff), and you might even find it interesting.

There's actually a choice of four feeds; one that shows everything, two that show text (one for long form writing, like this post, only; the other, posts and links combined) and finally one for images, which, right now, is just my Flickr photos, but that might change. If I do add feeds, there'll be a post about how to exclude things. (It's already there in the code.)
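One way to think about the four feeds is as four filters over the same pool of items. The sketch below is Python and purely illustrative: the feed names, the "kind" labels and the exclude_sources parameter are all invented for the example, not taken from the real code.

```python
# Which kinds of item each feed carries (names are hypothetical).
FEEDS = {
    "everything": {"post", "link", "image"},
    "longform": {"post"},
    "text": {"post", "link"},
    "images": {"image"},
}


def select_items(items, feed, exclude_sources=()):
    """Return the items belonging in the named feed, minus any excluded sources."""
    kinds = FEEDS[feed]
    excluded = set(exclude_sources)
    return [
        item
        for item in items
        if item["kind"] in kinds and item["source"] not in excluded
    ]
```

So select_items(items, "text", exclude_sources=["delicious"]) would give posts and links, but leave out anything that came from del.icio.us.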

Speaking of excluding things, I've started using a feed reader again, a little. This time, I'm not using it for news (I know to check that), or sites that have aggregation (like Twitter, del.icio.us, Flickr, or LiveJournal), but for those one-off blogs you don't check otherwise. Unfortunately, there are a few people I'd subscribe to, but they insist on posting either their del.icio.us links, twitters, or both, to their feed. I can understand why - hell, I bring in links - but I also offer a feed without, and that's what I'd really want from them, too.

It might just be me (after all, I already see things via the sites mentioned above), but I've also started seeing grumblings about this from other people. With the launch of friendfeed and other lifestreaming/aggregation services, for example, people are flagging "duplication, or the infinite echo problem" as an emerging problem. Meanwhile, others are finding that infrequent blogging suits them fine. The comments note that an RSS feed lets you subscribe to someone who posts interesting stuff, but rarely; I'd argue that bringing in links to pad your content might actually drive people to leave again.

Just to make this a constructive criticism, here's how I'd work around this if I was using a blogging engine alone. All my long-form posts would get a category or tag (say "longform") and, as well as an overall RSS feed, I'd advertise a feed for original posts only. The bottom line? Aggregation's good, but let your consumers choose what they want to subscribe to.

Read the 3 comments.

2005-03-16

A Quick Meta-Post

The box that I host husk.org on was rebuilt over Christmas. You'd not have noticed, but it completely broke MT, because it couldn't read the DB files. Only two and a bit months later, I moved them to MySQL and I could actually edit things again, resulting in the post about Flame a week or so ago.

However, annoyingly, there were still a few supporting CGI scripts that didn't work, so comments, trackbacks and the like were broken until, ooh, ten minutes ago. Apologies to the two real users who probably tried to comment on Flame and the dozen or so spammers who hit a 500 Internal Server Error some time in January, when I wasn't posting anyway.

Now it's working, you'll have noticed I seem to be trying to get some thoughts about digital music (which, due to my Mac weenie nature, really means iTunes, but not, oddly, the store, which I hardly use) out into the open, possibly to be useful, but more likely to stop them marching around my brain so some other things can take hold there.

Consider yourselves warned.

Read the 2 comments.

2003-11-12

The Life of Cranes

Almost two weeks ago now, thingsmagazine found craneporn.org, my other domain. (I only have two. More seems a bit wasteful, really.) Subsequently, metafilter and a whole host of others have seen fit to look at the site. So my previous daily average of six page views has gone up. Just a bit.

Continue reading this extended entry.

2003-10-30

Don't break the Mac with Cmd+Tab

computing 14:11:13

(Being an elucidation for the benefit of Mr Coates.)

Jason Kottke proposed a retooling of Cmd-Tab and Exposé. I don't like it.

Continue reading this extended entry.

2003-03-13

Trackbacks, spam and conversations

Today, we (well, I say 'we'; I mean, as often happens, that I kept banging on about an idea and Tom took an idea we had, worked on it, came back to us with a couple of queries, then finalised the code) added trackbacks to the 2lmc spool.

I must say that I'm impressed with this, partly since it is (as far as I'm aware) the first IRC-backed weblog tool that actually sends and receives trackbacks. However, what's more interesting is what happened after we actually had it running live. Firstly, we tried running it against an earlier entry pointing at Ben Hammersley's weblog. It turns out that he doesn't have the TrackBack autodiscovery code in his pages. Later on, we found an article that was kind of interesting on Tom Coates' plasticbag.org, and discussed it past the magic 5-post limit on the spool, and I think we covered some at least vaguely interesting ideas. However, I'll admit that we did take the discussion in a somewhat tangential direction.

Now, call me paranoid, but within three hours Tom had posted about autodiscovery and trackback etiquette. He wants people to turn on autodiscovery in Movable Type, which is good. (It's very simple, too: just put <$MTEntryTrackbackData$> in the <head> section of the main index and individual entry archives, at least. Incidentally, I found out that while I was yakshaving I missed a bit, namely the trackback RDF in individual entry archives.) The part I take exception to is some of his assumptions in his third section. I agree completely about receiving pings; in fact, the spool received before it could send. However, I'm not in agreement over his second point. Maybe it's because I'm lightly trafficked and/or more curious about my readers, but I'd be happy if a pure linklog sent me a ping. He also wants trackbacks to be seen as a way the 'thread of a conversation can be maintained' (his emphasis).

Funnily enough, Leon and I were discussing this at lunch, and he wondered what the point of trackbacks was. Is it to allow you to comment on stories on your own site? Is it that it allows a way of tying things together? He asked if I ever visited trackbacks, and I do; I find they often lead to more interesting things, admittedly sometimes unrelated, unlike comments. So I don't feel that they do have to maintain a thread of conversation. (On the other hand, I'm on a fair few mailing lists that have suffered massive thread drift. I may be conditioned to this more than other folks.)

After Tom's post, I mentioned that he almost seemed to see trackbacks without comments as spam, and Leon noted that it would be trivial to do *real* spam with trackbacks; minimal Googling reveals that this isn't an original idea. So, what's the solution, to both celebrity-hunting trackbacks, for want of a better word, and to real spam, should it emerge? Perhaps it is to be liberal in what you receive, and selective about what you display, whether that's by automatic metadata parsing ('this ping says it's from a linkblog; I won't display it'; 'I know this site already, displaying it is fine') or by manual filtering. Personally, though, as I said, I don't yet find it a problem.
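To make "liberal in what you receive, selective about what you display" a bit more concrete, here's a small Python sketch. It's an illustration under assumptions, not anything the spool does: in particular, a ping saying what kind of site it comes from is imagined here (the "kind" field is hypothetical), and known_hosts and approved_urls are invented names for a list of sites you already trust and a list of pings you've approved by hand.

```python
from urllib.parse import urlparse


def receive_ping(store, ping):
    """Liberal in what you receive: every ping is kept, whatever it says."""
    store.append(ping)


def display_decision(ping, known_hosts, approved_urls=frozenset()):
    """Selective about what you display: return "show", "hide" or "review"."""
    if ping["url"] in approved_urls:
        return "show"  # manual filtering: a human already said yes
    host = urlparse(ping["url"]).hostname or ""
    if host in known_hosts:
        return "show"  # 'I know this site already, displaying it is fine'
    if ping.get("kind") == "linkblog":  # hypothetical self-declared metadata
        return "hide"  # 'this ping says it's from a linkblog'
    return "review"  # unknown sender: hold it for a manual look
```

Nothing gets thrown away at receive time, so changing your mind about what to show later is just a matter of changing the display rules.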

2003-02-11

Yakshaving

I've got about three brewing posts for this place, but instead of actually writing them up, or finishing the book review I promised London.pm, I ended up spending Sunday yak-shaving.

In particular, I went through all the MT templates I needed to, adding comment pages inline (as opposed to the default templates which use horrible nasty popups), fixing their usability by moving the form elements around and writing some mildly hairy logic for a nice layout on the front page. I did look at Simplecomments, but decided against it. (Of course, today I've found a rant by Tom Coates about it. Now, I'm not that bothered about building a community or whatever here, but it's interesting to see what people who do care think about it.) It also means I've enabled TrackBack; time will tell if it's useful.

I've also tidied up the optional RSS feed that contains full entries. For a long time I've been fighting a losing battle to insist that one of the initials in RSS stands for 'Summary', but the post-NNW world seems to think that having to visit a website is a grievous affront to their human rights, and an invitation to write scrapers. (Of course, I've done a bit of that in my time, too. Ho hum.) So I caved in.

Of course, I don't update chaff anything like frequently. In the unlikely case you really want to read more by me, I tend to linkdump on 2lmc's blog these days, and there's occasionally a new photo set on stem, my photo archive, which I'm considering adding RSS to. (Yay, more removed hairy mammal covering.)

Anyway, now that's done, I can get on and actually write something.

Read the 2 comments.

2002-05-07

Still the default template

Sigh. I was going to fix the templates on this so it looks like the old blog, and even, maybe, dive into the code so that the one thing Blogger does that MT doesn't (and that I want to use) is supported. But no.

On the other hand, I did get a lot of work done on yapi, the photo indexer andym wrote, that nicks the ideas behind ade's site, and I'm (slowly) moving my old photos into it. So not a wasted weekend.

2001-12-04

On my blogging

You know how some people who have blogs manage it every day, and others go for ages, adding only a little bit? I'm a third type; every now and again I dump about a week's worth of crusty outgoing branejuice to this thing. Here's a bit more.

2001-09-26

CFT

Well, left the shores of employment for the seas of uncertainty on Friday. You'd have thought that what with that, two gigs, London Open House, some really stimulating pub conversations and a couple of videos, I'd have plenty of blog material. You'd be right.

Sadly, I only have internet access from 6 pm to 8 am, and that's exactly the time I'm either a) out or b) sleeping. Sigh.

I may, at some point, recap my weekend, or pull out random thoughts from it. Of course, I may just forget them all...
