husk.org. chaff. posts from the January 2009 archives.

2009-01-21

Ruby, Flickr, APIs, and Flickraw

Last week, I gave a short talk at the London Ruby User Group that I titled "Avoiding API Library Antipatterns". It's really about what not to do when you're writing a library to use Flickr's API (although I think a lot of the lessons apply to libraries for any RESTian service). You can see the slides at SlideShare, and there's even a video of the talk at Skills Matter, who hosted the event.

So, what was the gist? Rather than writing lots of boilerplate code to handle every Flickr method and wrap the returned XML in objects, use a generic caller and JSON to eliminate much of the work. If the API allows it, use reflection to provide convenience methods. That's why I use flickraw: it's the only Ruby library I've seen that does things this way.
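To make the generic-caller idea concrete, here's a minimal Ruby sketch. It isn't flickraw's code: the class and method names are hypothetical, and the network transport is injected as a callable so it runs offline. The endpoint and the method, api_key, format=json and nojsoncallback=1 parameters follow Flickr's REST documentation, but are worth checking before relying on them. The define_shortcuts method shows the reflection idea, assuming you already have a list of method names such as the one flickr.reflection.getMethods returns.

```ruby
require 'json'
require 'uri'

# A minimal sketch of a generic caller: one method handles every Flickr API
# method by name and returns the parsed JSON untouched. The transport is a
# hypothetical injected callable (URI in, response body out), so the sketch
# runs without network access.
class GenericFlickr
  # Flickr's REST endpoint; check the current API docs before relying on it.
  ENDPOINT = 'https://api.flickr.com/services/rest/'

  def initialize(api_key, transport)
    @api_key = api_key
    @transport = transport
  end

  # Build the request URI for any method name plus its arguments.
  def request_uri(method_name, args = {})
    params = args.merge(
      'method' => method_name,
      'api_key' => @api_key,
      'format' => 'json',
      'nojsoncallback' => '1' # plain JSON rather than a JSONP wrapper
    )
    uri = URI(ENDPOINT)
    uri.query = URI.encode_www_form(params)
    uri
  end

  # Call any method; no per-method wrapper classes are needed.
  def call(method_name, args = {})
    JSON.parse(@transport.call(request_uri(method_name, args)))
  end

  # Reflection: given full method names (for instance, the list returned by
  # flickr.reflection.getMethods), define one convenience method per name.
  # The naming scheme, flickr.photos.search -> photos_search, is hypothetical.
  def define_shortcuts(method_names)
    method_names.each do |full_name|
      short = full_name.sub(/\Aflickr\./, '').tr('.', '_')
      define_singleton_method(short) { |args = {}| call(full_name, args) }
    end
  end
end
```

With a stub transport, calling define_shortcuts(['flickr.photos.search']) lets you write photos_search('tags' => 'london') instead of call('flickr.photos.search', 'tags' => 'london'). New API methods then need a fresh method list rather than new library code.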

One of the weakest points of the talk was that I didn't cover how Flickr's responses change even when the API itself doesn't. Most methods take the 'extras' parameter, which lets you flesh out the returned data with additional fields, such as the dates the photo was taken and uploaded. Naive libraries need patching to deal with this; smart ones, which hand back the response with minimal changes, don't.
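Here's a small sketch of why that matters, using a hypothetical search response that includes two extras (the datetaken and dateupload field names are illustrative assumptions). The naive wrapper silently drops the extra fields; the generic one passes them straight through.

```ruby
require 'json'

# Hypothetical flickr.photos.search response when called with
# extras=date_taken,date_upload (field names here are for illustration).
RESPONSE = <<~JSON
  {"stat": "ok", "photos": {"photo": [
    {"id": "123", "title": "Thames", "datetaken": "2008-12-31 23:59:00", "dateupload": "1230768000"}
  ]}}
JSON

# Naive: a fixed class with only the fields known when it was written,
# so anything requested through extras is silently dropped.
NaivePhoto = Struct.new(:id, :title)

def naive_photos(body)
  JSON.parse(body)['photos']['photo'].map { |p| NaivePhoto.new(p['id'], p['title']) }
end

# Generic: hand back each photo as parsed, so new fields come through
# without patching the library.
def generic_photos(body)
  JSON.parse(body)['photos']['photo']
end
```

Supporting a new extra in the naive version means editing NaivePhoto; the generic version needs no change at all.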

I also shied away from explicitly putting many of my recent projects in the slides, including "where? what? when?" and the machine tag browser, which rely on newer API methods. If you're stuck behind a poor library, there are cool toys you won't be able to write without fighting it, and software's meant to help, not be hateful.

The talk itself was surprisingly quick to deliver; I blew through the slides in ten minutes, leaving lots of time for questions, which were (thankfully) interesting and insightful. One suggestion sticks in my mind as worth repeating: if an API doesn't provide reflection (or the library author wants to avoid fetching method information), use method_missing to provide syntactic sugar, so you can call flickr.photos.search rather than flickr.call("flickr.photos.search").
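Here's a rough sketch of that method_missing suggestion. It's one possible convention, not flickraw's implementation. The proxy accumulates name segments until it's called with parameters, then passes the dotted name to a backend, which is any hypothetical object with call(name, params). Without reflection it can't tell a namespace from a method, so methods taking no parameters need an explicit empty hash.

```ruby
# A minimal sketch of method_missing sugar. Each unknown method name extends
# a dotted path; a call that carries parameters triggers the request, so
# flickr.photos.search(tags: 'london') becomes
# backend.call('flickr.photos.search', { tags: 'london' }).
# The backend is an assumed stand-in: any object with call(name, params).
class FlickrProxy
  def initialize(backend, path = ['flickr'])
    @backend = backend
    @path = path
  end

  def method_missing(name, *args)
    # Leave implicit conversions such as to_ary and to_str to Ruby.
    return super if name.to_s.start_with?('to_')

    path = @path + [name.to_s]
    if args.empty?
      FlickrProxy.new(@backend, path)           # still building the name
    else
      @backend.call(path.join('.'), args.first) # perform the API call
    end
  end

  def respond_to_missing?(name, include_private = false)
    !name.to_s.start_with?('to_') || super
  end
end
```

So flickr.photos.comments.getList(photo_id: '123') resolves to 'flickr.photos.comments.getList', while flickr.photos.getRecent({}) is needed for a call with no parameters. Reflection removes that wrinkle, which is one reason to prefer it when the API offers it.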

This was my first talk to a language-specific group outside Perl, and I'm fairly happy with how it went, so thanks to LRUG for listening and being so welcoming, and I hope readers here also get something out of the talk.

2009-01-13

A Resolution For 2009

computing 23:18:07

I know that it's not quite the fresh new year any more, but I still have the metaphysical hangover from the new digit in the number. I never did finish visualising my raw web output statistics for last year, and I've not properly written up my new year's resolutions either.

Anna Mondo thinks that it's a bad time for resolutions, and for health issues she's probably right: you need comfort. For programmers, though, new year feels like a good time to do things. It's dark and cold out, so the natural "hide at home" inclination is looked on with less scorn by the rest of the world, and what better arbitrary turnover point is there?

So, what do I want to do? Over the last year I've definitely branched out; I've released code in languages that were new to me, letting JavaScript, Python and Ruby compete with Perl for my attention. I had a fairly productive end of year, complete with a lot of Flickr toys and an interview on their code site (which was a very pleasant surprise). So, what do I need to change this year?

There are a couple of obvious directions to take. I agree with Giles Bowkett that new languages aren't the way to go, but I disagree with him that starting lots of new projects is. Instead, I'm taking some advice from Jonathan Rasmusson on how to become a better programmer: keep working on things.

His post is evidently written from the point of view of a commercial freelancer, but some of the same lessons apply to personal projects. It's tempting to launch them, watch as the world notices, and move on, leaving others to take them further. I'm not really happy with that, though. groupr, snaptrip and the two most recent Flickr API explorations all ended up with items left on their to-do lists, and it's poor form to leave them unfinished forever (even if they did reach a point where there was enough to publish).

Beyond that, returning to your code after a while shows you whether it's still readable, and builds the related skill of refactoring as you go back and do things better. As Rasmusson says: "You basically miss out on all the great feedback that would tell you where you kicked butt, and where you screwed up. All of which would of course help you on your next project."

Of course, that doesn't entirely mean "no new projects". I'm only human, and besides, there are a couple of things that I've left in half-finished pieces for so long that, to my mind, they almost count as old ones. Nonetheless, I hope to go back and extend existing stuff too. Perhaps that's just something I should be doing anyway, but then, aren't most resolutions? Happy new year.

2009-01-08

Getting My Vox Off

computing 18:12:11

For the last two and a half years or so, I've been posting far more to my blog on Six Apart's Vox site than I have here. However, I've decided to stop that and move back here.

One of the reasons I'm doing so is that I found it far too difficult to access my content programmatically. Initially, I wanted better details of what I'd posted and when, so I could flesh out my 2008 statistics, but it has since turned into a (possibly quixotic) journey to get my data back. (That feels especially important in this era of closing web services.)

Why is it so hard? For a start, it's tricky to figure out what API Vox offers. A bit of searching led me to believe that the only way I'd get anywhere was using Atom.

If you thought that Atom was like RSS, well, it is. However, the name is also used for the Atom Publishing Protocol, which is completely different, except for when it's not. Now an IETF standard, APP (as I shall henceforth refer to it) allows you to retrieve, post and edit your entries on a service that supports it. Well, in theory it does.
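In protocol terms, APP (RFC 5023) maps those operations onto plain HTTP verbs: GET a collection feed to list entries, POST an entry to the collection to create one, and GET, PUT or DELETE an individual member at its edit URI. This sketch only builds the Net::HTTP request objects rather than sending them, and both URIs are hypothetical placeholders, not Vox's real endpoints.

```ruby
require 'net/http'
require 'uri'

# A minimal sketch of the APP verbs, building requests without sending them.
# Both URIs are hypothetical; a real service advertises the collection in
# its service document and each member's URI in its rel="edit" link.
COLLECTION = URI('https://example.com/app/collection')
MEMBER = URI('https://example.com/app/collection/entry-1')
ENTRY_TYPE = 'application/atom+xml;type=entry'

def list_entries
  Net::HTTP::Get.new(COLLECTION)        # the collection feed lists entries
end

def fetch_entry
  Net::HTTP::Get.new(MEMBER)            # a single entry, as Atom XML
end

def create_entry(entry_xml)
  req = Net::HTTP::Post.new(COLLECTION) # POST to the collection creates a member
  req['Content-Type'] = ENTRY_TYPE
  req.body = entry_xml
  req
end

def update_entry(entry_xml)
  req = Net::HTTP::Put.new(MEMBER)      # PUT to the edit URI replaces the entry
  req['Content-Type'] = ENTRY_TYPE
  req.body = entry_xml
  req
end

def delete_entry
  Net::HTTP::Delete.new(MEMBER)         # DELETE removes the member
end
```

Sending any of these is then a matter of Net::HTTP.start with the host and port and calling request on the connection, plus whatever authentication the service demands.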

In practice, the Atom API responses I was getting contained only 20 entries and no paging information. That's obviously a killer when you're trying to retrieve a year's worth of data (or everything, for that matter). Annoyingly, the underlying system clearly knows how many posts there are: the total is in an OpenSearch element at the top of the XML. I've found nothing to tell me how to query for the other entries, but at least this response did contain pointers to authoritative feeds for each entry.

In contrast, the public Atom feeds do have straightforward paging, so I fell back to these to determine Vox's internal ID for each entry. Naturally, this feed didn't contain links to the APP single-post feeds, and its entries carried extra cruft, like "Email this" links at the end of each one.
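Walking a paged feed is a simple loop. This sketch assumes the feed advertises each following page with an Atom rel="next" link (the RFC 5005 convention); I don't know that Vox used exactly that mechanism. The page fetcher is hypothetical: it takes a URL and returns that page already parsed into its entry IDs and next link, so the Atom parsing itself is left out.

```ruby
# Collect entry IDs by following next-page links until there are none.
# fetch_page is a hypothetical callable: URL in, Hash out, with 'ids'
# (the entry IDs on that page) and 'next' (the next page URL, or nil).
def collect_ids(first_url, fetch_page, max_pages: 1000)
  ids = []
  seen = {}
  url = first_url
  while url && !seen[url] && seen.size < max_pages
    seen[url] = true # guard against feeds whose next links loop back
    page = fetch_page.call(url)
    ids.concat(page['ids'])
    url = page['next']
  end
  ids
end
```

The loop guard and page cap are defensive choices: a scraper that follows someone else's links shouldn't trust them to terminate.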

The upshot was that I was able to get all my posts out of the site by paging through the Atom syndication feeds to get the IDs, which I then fed to the APP API to fetch each post as a lump of XML. This turns out to be entirely useless for blog import purposes, but it'll do as an archive (and I can always work on convertors later).
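The second stage, fetching each post through the APP API and keeping the raw XML, might look like this sketch. fetch_entry is a hypothetical callable (an ID in, the entry's XML out) standing in for the per-entry request, and the one-file-per-ID layout is an assumption.

```ruby
require 'fileutils'

# Fetch each entry by ID and write its raw XML to DIR/<id>.xml.
# fetch_entry is a hypothetical stand-in for the per-entry APP request.
# Files that already exist are skipped, so an interrupted run can resume.
def archive_entries(ids, fetch_entry, dir)
  FileUtils.mkdir_p(dir)
  ids.map do |id|
    safe = id.to_s.gsub(/[^\w.-]/, '_') # keep IDs filesystem-safe
    path = File.join(dir, "#{safe}.xml")
    File.write(path, fetch_entry.call(id)) unless File.exist?(path)
    path
  end
end
```

Keeping the untouched XML means any later converter can work from the archive instead of hitting the service again.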

However, posts are a fairly small part of my content. The APP content list also contained pointers to embedded resources and comments, but so far I haven't worked hard at fetching these. I'm even further from copying down my entire library, which contains books, videos and other items that aren't used directly in posts.

So, what's the conclusion? Vox does have a way of getting your core data out, but it's almost entirely undocumented and seems inconsistent. In the end I had to rely on two different chunks of Atom XML to get my data, and even then I only got a subset of my content. All in all, I found it far too frustrating.
