husk.org. a website by Paul Mison.

2010-09-02

A map for every day | Phil Gyford’s website

delicious 12:09:33
"Eighteen months ago I wrote about redesigning my site’s front page and mentioned in passing that I’d also created a page for every day which aggregated many things. I’ve now taken this a step further and added a map for every day which aggregates various pieces of location-based information about me." I've been thinking about doing this, but Phil actually has. Interesting musings about privacy in there, too.

2010-08-29

Oxford English Dictionary 'will not be printed again' | Telegraph

delicious 20:25:34
Simon Winchester, quoted in the article: "Until six months ago I was clinging to the idea that printed books would likely last for ever. Since the arrival of the iPad I am now wholly convinced otherwise. The printed book is about to vanish at extraordinary speed. I have two complete OEDs, but never consult them – I use the online OED five or six times daily. The same with many of my reference books – and soon with most. Books are about to vanish; reading is about to expand as a pastime; these are inescapable realities."

2010-08-27

Train fares: From him that hath shall be taken | The Economist

delicious 11:55:19
On the likelihood of fare rises for British rail. "Railways have enjoyed a renaissance over the past decade. Passenger numbers have surged and are now at their highest since the second world war." "Yet rail travel is a niche interest. It accounts for just 7% of all journeys."

Transport funding: Collision course | The Economist

delicious 11:53:20
Speaking of TfL: "With only a few months until publication of the government’s spending review, which will decide which parts of government live and which parts die, the lobbying is in full swing. Particularly fierce arguments are raging around the Department for Transport (DfT), which must make cuts of 25% or more in its budget. That is provoking rows, both nationally and locally." Complete with that old chestnut, Crossrail vs the Tube.

Public transport: End of the lines | The Economist

delicious 11:51:04
"The Metropolitan Atlanta Rapid Transport Authority (MARTA), which runs the city’s buses and trains, is facing a $70m deficit next year, and will eliminate 40 of its 131 bus lines. It is also raising fares for weekly and monthly passes, cutting rail services by 14.2% and laying off around 300 people." This seems incredibly short-sighted, but that seems to be America for you. (The UK is threatened with 25%-40% funding cuts, but the idea of cutting 20% of bus lines seems unimaginable. But then, TfL has what would be in American terms an almost unbelievable amount of power over a region's transport.)

High-speed rail in Europe: Trouble ahead | The Economist

delicious 11:43:31
"SNCF and Deutsche Bahn are for the first time competing directly for mastery of European high-speed rail." "ICE trains are more spacious. French trains à grande vitesse (TGVs) are faster. The TGV is usually cheaper, but ICE trains are used to competing with German luxury cars."

2010-08-26

Unread Count Tracking | movieos

delicious 22:28:52
Tom on the effects of his unread count code. "Having these graphs has had 2 interesting effects. Firstly, I’ve found it lots easier to keep my unread counts down. ... The other effect it has had is that I’m slightly more inclined to read things just before I know the hourly monitoring script triggers. I know I’m being observed now."

Unread Count Tracking | Tom Insam

delicious 21:50:18
"This is a collection of scripts for tracking unread count (I'm musing other things, though) across various online services that I use, so I can easily see how swamped I am with data this week." "The Instapaper unread counter has to screen-scrape the web site, so has to keep track of cookies and things, but this is pretty much a detail"

Get Unread Count for Instapaper | Jenn the Geek

delicious 21:49:24
"After looking at my queue tonight I realized that it had gotten a bit out of control. I wanted to know how many articles I had ahead of me. So I wrote a little JS snippet and ran it in firebug. Bam! I had my answer (427 if you’re curious)."

2010-08-25

It's not me, it's you | Phil Gyford’s website

delicious 13:04:17
"I want Today to be good. I want it to be in-depth and have an interesting agenda. I want it to be the radio equivalent of, say, the very best of broadsheet newspapers (not the highest bar in the world, but you’d think it might be possible). Instead, it all too often ends up as the radio equivalent of mid-market tabloids."

2010-08-24

The truth about London cycle hire | Telegraph Blogs

delicious 14:48:47
"Londoners who use Boris Johnson’s cycle hire scheme are 9 to 6 commuters, midday errand runners and fair-weather cyclists who take the tube or bus when it’s a bit cold and wet. How do I know? I’ve seen the data." Although: "I dont think it's a huge stretch of the imagination to say that rain and cold days means less cycling. That's an anecdote and not statistically significant, but this is a blog post based on five day's data, not a scientific paper."

2010-08-23

inadvertent information sharing | I Can Stalk U

delicious 16:01:23
Fetching EXIF data from Twitpic to find the locations of Twitter users, even if they have geolocation turned off. This raises a few questions for me, such as "why don't Twitpic strip (or hide) EXIF?" (Personally, I do use Twitter geolocation, although this seems to be rare: most people seem wary of it, for some reason. So if I posted via Twitpic I'd rather they offered to set the metadata from the EXIF location.) I note the site's been there since at least May, so perhaps nobody cares that much.

2010-08-20

Archipelago | URBAGRAM

delicious 21:42:06
"In these maps, activity on the Foursquare network is aggregated onto a grid of ‘walkable’ cells (each one 400×400 meters in size) represented by dots. The size of each dot corresponds to the level of activity in that cell. By this process we can see social centers emerge in each city." "we can show how Paris contains a much more contiguously walkable structure than both New York and London." Interesting (and pretty) stuff.

2010-08-09

Fixing the Bus System | Artsy Techie

delicious 16:27:40
"As a puzzled, stressed and curious newcomer, whether I quickly and fully embrace a system, or whether I avoid it for a long time is an interesting measure of how “usable” the system is." An interesting look at some issues that put people off buses. (I'd say London does reasonably well on these, assuming you speak English, but it's far from perfect even so; nobody knows about the five bus maps you can get, for example.)

London Wall | The Sweeney Forum

delicious 16:21:44
A series of posts containing photographs comparing the London Wall skyscrapers and highwalks as featured in various 1960s movies and TV shows, along with their appearance in 2009 (although it's still largely unchanged). I now need to see Crossplot.

2010-05-08

A First Look At Annotations

notes 21:46:00

A couple of hours after I gave my talk about Flickr machine tags and their possible lessons for Twitter’s new annotations, Raffi Krikorian gave a talk at Warblecamp on that very subject. He’s now posted slides of the talk, which are well worth a look.

In them, he expands on the format for annotations (they consist of types, attributes and values; types can be repeated, but attributes can’t), and mentions an annotations “explorer”, which will contain both “statistics of most used, adopted and trending attributions” and a “wiki page so developers can document their attributes”.
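To make that shape concrete, here is a minimal Python sketch of the "types can repeat, attributes can't" rule. The tuple layout, the `validate` helper and the sample type and attribute names are all assumptions for illustration, not Twitter's actual format.

```python
def validate(annotations):
    """Check a list of (type, [(attribute, value), ...]) annotations.

    Types may repeat across the list; attributes must be unique
    within a single annotation.
    """
    for type_name, pairs in annotations:
        attributes = [attribute for attribute, _ in pairs]
        if len(attributes) != len(set(attributes)):
            raise ValueError("repeated attribute in %r" % type_name)
    return True


# Hypothetical example: the "movie" type appears twice, which is allowed.
sample = [
    ("movie", [("title", "Crossplot"), ("year", "1969")]),
    ("movie", [("title", "Another Film")]),
]
validate(sample)
```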

This dual approach pretty much fixes the main points I was worried about, combining a “pave the cowpath” method (looking at actual usage data) with a more editorial take on the wiki.

Anyway, the talk touched on even more (including the beta rollout plan, which will be based on OAuth-enabled apps, rather than feature flags or user lists), and mentioned release dates (which are reassuringly close). All in all, it’s pretty exciting, and I’m looking forward to seeing how they get used in the wild.

Edit: there’s now a video of the talk, thanks to Farhan Rehman.

Annotations and Machine Tags

notes 14:43:31

I’m at Warblecamp (unsurprisingly, they also have a Twitter account), where I gave a short talk about Flickr’s machine tags and possible lessons for Twitter’s upcoming annotations feature. You can download the slides (6MB PDF), but they’re very much from the “big word / big picture” school, so feel free not to bother.

The idea was to breeze through Flickr’s implementation of tags, machine tags, machine tag extras, and exploring hierarchies via both URLs and the API, and point out the features I liked and how, perhaps, Twitter might learn from them.

The discussion afterwards was interesting. One point, which was well worth making, was that Twitter’s stream of text is very different from Flickr’s archive of photographs. (One more difference is that tags (and machine tags) are editable later; annotations are set in stone at post creation time.) Aral Balkan suggested a registry of Twitter annotation namespaces, along the lines of his Twitter Formats proposal. Personally, I prefer the “pave the cowpaths” approach of discovering what’s actually in use in the wild (which is also why I built the machine tag browser). I didn’t mention this at the time, but there was an attempt at a Flickr machine tags wiki, which failed, perhaps colouring my view.

There was also a question about size limits for annotations (turns out it’s 512 bytes) and a discussion on the more RDF-ish aspects of triple tags (and how you say what a thing is, which also touched on establishing concordances). Generally I don’t get hung up on the semantics of machine tags, but I’m sure there are people who do, and they might be reassured by the points (mentioned in the Twitter preview post) about the use of schemas:

People could add some agreed upon “meta-annotation” that points to something which *describes* the annotation or annotations that person is using. Think something sort of like XML DTD, though not necessarily machine readable.

For a few slides knocked up the evening before, I’m vaguely happy with the talk, but very happy with the response and the way it’s made me think more about the idea.

2010-01-28

Introducing docent

chaff 18:20:00

Flickr and galleries

It's now a little over four months since Flickr launched their galleries feature. I liked it as soon as I saw it: it's taken a frequent request ("how can I have sets of favourites?") and delivered something that does the same job, but in a different way. I know some people quibble about some of the constraints, but I like the limited number of photos you're allowed, and generally I've enjoyed creating and browsing them.

Unfortunately, there's a problem: discovering other people's galleries. Aaron Straup Cope is good at bookmarking them on delicious, and there's an Explore page, but neither of those necessarily find things I'd like to see.

The gist of it

Just over three weeks ago, Kellan announced the first API support for galleries, and I quickly created a Python script that would go through all my contacts and fetch their galleries. It was useful, and it turned up a lot of galleries I hadn't seen, but it had two big flaws: nobody else would use it, and it wasn't pretty.

App Engines and data models

I've used App Engine in the past, but that was before the advent of their experimental Task Queue API, and I didn't use the datastore. Using Aaron's gae-flickrapp as a core, I spent about a week's worth of evenings on and off learning how to use both, ending up with the core of docent¹, a small web app.

There are only four kinds of object: dbFlickrUser, from gae-flickrapp, which handles logged in users; Contacts, which have a one-to-one relationship with dbFlickrUsers; FlickrUser, which is an object for a user docent knows about but who isn't necessarily logged in; and Gallery, which stores information about the gallery itself.

What it does

When you first log in, a task is added to a high-priority queue to fetch your contacts from Flickr. The NSIDs² from this call are stored in a single ListProperty in the Contacts object, and then a new task is added to a lower-priority queue. This goes through the IDs one by one, fetching the galleries Atom feed³ and creating the relevant objects (if necessary). This, and the various tasks to update galleries for older users, make up the bulk of the CPU load of the app, and almost all of the Datastore writes.
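The two-queue flow can be sketched in plain Python. This is only a simulation under stated assumptions: it uses `collections.deque` in place of the Task Queue API, dictionaries in place of Flickr and the datastore, and every name and NSID in it is hypothetical.

```python
from collections import deque

# Stand-ins for Flickr's responses; all names and ids here are made up.
FAKE_CONTACTS = {"me@N00": ["alice@N00", "bob@N00"]}
FAKE_GALLERIES = {"alice@N00": ["g1", "g2"], "bob@N00": []}

high_priority = deque()
low_priority = deque()
contacts_store = {}   # user NSID -> list of contact NSIDs
gallery_store = {}    # gallery id -> owner NSID


def fetch_contacts(nsid):
    """High-priority task: store the contact NSIDs, then queue the slow work."""
    contacts_store[nsid] = list(FAKE_CONTACTS.get(nsid, []))
    low_priority.append((fetch_galleries, nsid))


def fetch_galleries(nsid):
    """Low-priority task: walk the contacts one by one, storing new galleries."""
    for contact in contacts_store[nsid]:
        for gallery_id in FAKE_GALLERIES.get(contact, []):
            gallery_store.setdefault(gallery_id, contact)


def log_in(nsid):
    """Logging in only enqueues work; nothing is fetched yet."""
    high_priority.append((fetch_contacts, nsid))


def run_all():
    """Drain the high-priority queue before touching the low-priority one."""
    while high_priority or low_priority:
        queue = high_priority if high_priority else low_priority
        task, arg = queue.popleft()
        task(arg)
```

Calling `log_in("me@N00")` followed by `run_all()` fills `contacts_store` first and `gallery_store` second, which mirrors the ordering described above.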

The big difference between traditional ORMs and the way I'm using the App Engine datastore comes into play here. In an ORM such as Django, a dbFlickrUser would have a many-to-many relationship with FlickrUsers, which would then have a one-to-many relationship with Galleries. The former would require a join table between them. The query to fetch all galleries from a single user would look something like:

galleries = Gallery.objects.filter(owner__contact_of__nsid=nsid)
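In SQL terms, that relationship might look like the sketch below. It uses the standard library's `sqlite3` module rather than Django, and the table names, column names and sample rows are assumptions for illustration, not docent's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flickr_user (nsid TEXT PRIMARY KEY);
    CREATE TABLE gallery (
        id TEXT PRIMARY KEY,
        owner TEXT REFERENCES flickr_user(nsid)
    );
    -- The join table behind the many-to-many relationship.
    CREATE TABLE contact (
        user_nsid TEXT,
        contact_nsid TEXT REFERENCES flickr_user(nsid)
    );
""")
conn.executemany("INSERT INTO flickr_user VALUES (?)",
                 [("alice",), ("bob",)])
conn.executemany("INSERT INTO gallery VALUES (?, ?)",
                 [("g1", "alice"), ("g2", "bob"), ("g3", "alice")])
# "me" has alice as a contact, but not bob.
conn.execute("INSERT INTO contact VALUES (?, ?)", ("me", "alice"))


def galleries_for(nsid):
    """Return ids of galleries owned by any contact of the given user."""
    rows = conn.execute(
        "SELECT gallery.id FROM gallery "
        "JOIN contact ON contact.contact_nsid = gallery.owner "
        "WHERE contact.user_nsid = ? "
        "ORDER BY gallery.id",
        (nsid,))
    return [row[0] for row in rows]
```

The point is the `JOIN`: finding a user's contacts' galleries means going through the `contact` table on every query.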

By contrast, in the datastore, both FlickrUser and Gallery objects have a contact_of ListProperty. As a new user's contact list is examined, their NSID is added to the contact_of list. This is how the pages showing galleries for a contact are built: it's a simple equality test, which is translated behind the scenes to a list-membership test:

galleries = Gallery.all().filter('contact_of =', nsid).fetch(256)
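Outside App Engine, the same idea can be mimicked with ordinary Python objects. This is a rough sketch, not the datastore API: the `Gallery` class, the `filter_contact_of` helper and the sample NSIDs are hypothetical.

```python
class Gallery(object):
    """A gallery that records which logged-in users count its owner as a contact."""

    def __init__(self, gallery_id, contact_of=()):
        self.gallery_id = gallery_id
        self.contact_of = list(contact_of)

    def add_contact_of(self, nsid):
        # Done while a new user's contact list is being examined.
        if nsid not in self.contact_of:
            self.contact_of.append(nsid)


def filter_contact_of(galleries, nsid, limit=256):
    """The 'equality' filter is really a membership test on each list."""
    return [g for g in galleries if nsid in g.contact_of][:limit]


galleries = [Gallery("g1", ["me@N00"]), Gallery("g2"), Gallery("g3")]
galleries[2].add_contact_of("me@N00")
mine = filter_contact_of(galleries, "me@N00")
```

No join table is involved, but every gallery now carries its own copy of the relationship.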

It took a lot of fiddling to break out of the ORM/SQL mindset, based on joins, but I think I'm happier now I have. On the other hand, keeping the contact_of lists on all the objects in sync is something of an overhead, and the query code isn't significantly easier. There's also a rather severe limitation I only ran across later.

Onto the Flickr blog

This was all well and good as I let a few other people at the site; initially close friends, then via a couple of screenshots on Flickr, before inviting a bit more of a burst of users via Twitter. The site seemed to be scaling fine; there was a lot of CPU used fetching contacts, which eventually I managed to optimise by being more selective about updating from the gallery feeds. In fact, the FlickrUser object is currently pretty much a stub, although I'm thinking of changing that.

However, when docent made the Flickr blog, it hit a serious issue: exploding indexes. The version of the app that was live was doing this query:

galleries = (Gallery.all()
             .filter('contact_of =', nsid)
             .order('-published')
             .fetch(per_page, offset=offset))

That extra "order" criterion required an additional index, and because it's combined with a ListProperty (namely contact_of), it hit the problem documented in the Queries and Indexes page:

Custom indexes that refer to multiple properties with multiple values can get very large with only a few values. To completely record such properties, the index table must include a row for every permutation of the values of every property for the index.

When I last looked, docent knew about 14,000 or so galleries. While most had small contact_of lists, some no doubt expanded to dozens of people, and so the index was too large to store. As a workaround, I eventually realised I had to abandon sorting in the query and instead use Python, at which point the app started being responsive again. Lesson learnt, the hard way.
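The workaround amounts to something like the following sketch: fetch with the membership filter only, then sort and paginate in Python. The dictionary layout and the `page_of_galleries` name are assumptions for illustration, not docent's actual code.

```python
from datetime import datetime


def page_of_galleries(galleries, offset, per_page):
    """Sort newest-first in Python, then slice out one page."""
    ordered = sorted(galleries, key=lambda g: g["published"], reverse=True)
    return ordered[offset:offset + per_page]


# Hypothetical results of an unsorted fetch.
fetched = [
    {"id": "g1", "published": datetime(2010, 1, 5)},
    {"id": "g2", "published": datetime(2010, 1, 20)},
    {"id": "g3", "published": datetime(2009, 12, 1)},
]
first_page = page_of_galleries(fetched, 0, 2)
```

The trade-off is that all matching galleries are fetched before sorting, so the cost moves from the index to the request handler.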

Moving On

So, what now? The app is up, and although there are a few errors still happening, they're mainly in background tasks that can be optimised and retried without any impact on users. Personally, it's been a fairly good, if occasionally intense, introduction to App Engine's unique features.

Would I do things this way again in future? I'm not sure. Turning the relationship model on its head hasn't led to an obvious improvement over the ORM+SQL methods I'd use in, say, Django, and while the Task Queue API is very easy to use, it's hard to develop with (since it has to be fired manually locally) and there are other job queue solutions (such as Delayed Job, for Rails, as used on Heroku). On the other hand, even with the heavy load, and not the best of optimisations, docent almost stayed within the App Engine free quota CPU limits, and didn't approach any of the others.

In any case, I'm happy to have produced something so useful, and hope that anyone who tried using it yesterday only to run into errors feels willing to try again. In the meantime, I'm sure there'll be more scaling roadbumps as the site gains users and more galleries are added, but I'm looking forward to fixing them too. Please, try docent out.

(I know comments aren't enabled on this site at the moment. Feel free to add them on docent's page on Flickr's App Garden.)

¹ Why "docent"? Originally it was the unwieldy gallery-attendant, but Chris suggested the name, based on a term more common in the US than here for the guide to a museum or gallery.
² NSIDs feel like the primary key for Flickr users: in methods like flickr.people.getInfo, it's one of the key pieces of returned information, and it can be used in feeds to fetch information as well as URLs to photos and profile pages.
³ Using feeds rather than API calls can be handy. For one thing, they don't count against your API queries-per-second limit; for another, hopefully they're cached more aggressively, both via the feedparser library and on Flickr's side, so they take fewer resources.
⁴ One nice thing about getting more users is that the likelihood of finding a contact's galleries in the data store already goes up. When I was developing, I had to fetch everything; for the second user, there was some overlap, saving calls. As the site gets bigger, the number of fresh gallery fetches should keep fairly low.
⁵ Since I last wrote about App Engine, it's grown the ability for users to pay for resources beyond the free quota levels. I decided to do this when I hit about 55% of my CPU quota, and the app did indeed reach about 120% yesterday. I don't have a bill yet but I expect it to be under $0.50, which is fine.

2010-01-03

On New Year

notes 21:24:00

I’d forgotten - until yesterday - that the epic post on calendars and blue moons, on the Panic blog, had made me think about doing a post about the changes in New Years. So, before 2010 properly gets going (with most people going back to work tomorrow), I thought I’d try and get this out while it’s still topical.

You’d think the concept of a new year was straightforward. After all, it’s right there: the date is 1/1 (whether you’re European or American), and given we don’t use 0 for dates¹, that’s the first day of the year, right? Well, yes, it is now, for a good chunk of the world’s population. It wasn’t always.

Readers of Pepys’ Diary will know that; indeed, the entry for 1st January 1666/1667 bears two dates. Until the UK changed to the Gregorian calendar in 1752, the first day of the year was on the Feast of the Annunciation, Lady Day, marking the occasion of Mary’s meeting with the Angel Gabriel. Before then, dates for the first third of the year carry both the date of the Julian and Gregorian year. The British tax year still starts on this date, with (complicated) adjustments for the days lost when the calendar changed.

That’s not the only “new year”, though. Parliamentary years start with the State Opening, in November (or, occasionally, December); the Catholic and Anglican liturgical year also starts in December. Meanwhile the academic year starts after harvest in September. (Australia’s also starts in late summer.) Admittedly, none of those has as much legal force as the calendar or tax year, but still, I thought them worth mentioning.

That’s just in the UK, of course. There are two other obvious major world calendars, both lunar. The Chinese new year (also celebrated in Korea and Vietnam, but not Japan, which swapped to the Western calendar in 1873) is based on a lunar-solar calendar, so it moves around, but not much: it’s defined as the second new moon after the winter solstice, fixing it to a date between 21 January and 21 February (with thanks to this PDF, which did all the sums for me).

Meanwhile, the Hijri calendar, used by Muslims, is a pure lunar calendar, with nothing fixing it to the solar year. As a result, the Islamic new year shifts by either 11 or 12 days a year, moving through the Western calendar every 30 years or so. Even more alarmingly for those used to the rigid certainty of solar reckoning, the first day doesn’t happen until the new moon is officially sighted: this can shift the start of the year back a day, in theory at least. In 2009, the first day of Muharram, the first month, was on 18 December.
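As a back-of-the-envelope check on that drift, here is a short Python calculation. The mean month and year lengths are standard astronomical averages supplied for this sketch, not figures from the sources above, and a real Hijri year is a whole number of days rather than this average.

```python
SYNODIC_MONTH_DAYS = 29.530588   # mean time from one new moon to the next
SOLAR_YEAR_DAYS = 365.2425       # mean length of a Gregorian year

# Twelve lunar months make one Hijri year, on average.
lunar_year_days = 12 * SYNODIC_MONTH_DAYS

# How far the Islamic new year slips against the Western calendar each year.
drift_per_year = SOLAR_YEAR_DAYS - lunar_year_days

# How many solar years it takes to slip all the way round.
years_per_cycle = SOLAR_YEAR_DAYS / drift_per_year

print(round(lunar_year_days, 2), round(drift_per_year, 2),
      round(years_per_cycle, 1))
```

On these averages the slip is a little under 11 days a year, and a full circuit of the Western calendar takes between 33 and 34 years, which is the same ballpark as "30 years or so".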

I’m not even going to try and explain the various Indian new year’s days, except to note that most of them seem to be around the northern hemisphere spring equinox.

So, happy new year, unless you’re Muslim, in which case, belated happy new year, or Asian, in which case, it’ll soon be new year, unless you’re Japanese, in which case: happy new year.

¹ There’s an exception: astronomers have a year 0, and their convention has been adopted by ISO 8601.
