Managing Digital Music Metadata with Perl

ID3: How and Why

Digital Music Basics

What is an MP3?

Why metadata?

What format?

ID3: Really useful?

Plenty of applications and devices use ID3 tags, as shown in the slide.

ID3v1 has only seven fields: title, artist, album, year, comment, track number (added in 1.1) and genre. The list of genres is fixed, and the text fields are limited to 30 characters each, because the entire tag is a fixed-length block of 128 bytes.
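To make the fixed layout concrete, here is a minimal sketch that decodes a raw 128-byte ID3v1 block with Perl's built-in unpack. The sub name parse_id3v1 and the hash keys are made up for illustration (the keys just mirror MP3::Info's); in practice the modules described later do this for you.

```perl
use strict;
use warnings;

# Hypothetical helper: decode a raw 128-byte ID3v1 / ID3v1.1 block.
# Layout: "TAG", title(30), artist(30), album(30), year(4), comment(30), genre(1)
sub parse_id3v1 {
    my ($block) = @_;
    return unless defined $block && length($block) == 128;
    return unless substr($block, 0, 3) eq 'TAG';

    my ($title, $artist, $album, $year, $comment, $genre)
        = unpack 'x3 a30 a30 a30 a4 a30 C', $block;

    my %tag = (GENRE => $genre);    # an index into the fixed genre list

    # ID3v1.1: a zero byte at comment offset 28 followed by a non-zero byte
    # means the last comment byte holds the track number.
    if (substr($comment, 28, 1) eq "\0" && substr($comment, 29, 1) ne "\0") {
        $tag{TRACKNUM} = ord substr($comment, 29, 1);
        $comment = substr($comment, 0, 28);
    }

    # Fields are padded with NULs (or sometimes spaces); trim them.
    for ($title, $artist, $album, $year, $comment) {
        s/\0.*//s;
        s/\s+\z//;
    }
    @tag{qw(TITLE ARTIST ALBUM YEAR COMMENT)}
        = ($title, $artist, $album, $year, $comment);
    return \%tag;
}

# Usage: the tag lives in the last 128 bytes of the file.
# open my $fh, '<:raw', 'example.mp3' or die $!;
# seek $fh, -128, 2;
# read $fh, my $block, 128;
# my $tag = parse_id3v1($block);
```

The single genre byte is why the genre list has to be fixed: there is simply nowhere to store a free-text genre.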

Second system syndrome

ID3v2 - rewriting it

ID3v2 extends the ID3 standard massively, at the cost of added complexity. The full list of frame headers in ID3v2.4 is as follows:

Audio encryption, Attached picture, Audio seek point index, Comments, Commercial frame, Encryption method registration, Equalisation (2), Event timing codes, General encapsulated object, Group identification registration, Linked information, Music CD identifier, MPEG location lookup table, Ownership frame, Private frame, Play counter, Popularimeter, Position synchronisation frame, Recommended buffer size, Relative volume adjustment (2), Reverb, Seek frame, Signature frame, Synchronised lyric/text, Synchronised tempo codes, Album/Movie/Show title, BPM (beats per minute), Composer, Content type, Copyright message, Encoding time, Playlist delay, Original release time, Recording time, Release time, Tagging time, Encoded by, Lyricist/Text writer, File type, Involved people list, Content group description, Title/songname/content description, Subtitle/Description refinement, Initial key, Language(s), Length, Musician credits list, Media type, Mood, Original album/movie/show title, Original filename, Original lyricist(s)/text writer(s), Original artist(s)/performer(s), File owner/licensee, Lead performer(s)/Soloist(s), Band/orchestra/accompaniment, Conductor/performer refinement, Interpreted, remixed, or otherwise modified by, Part of a set, Produced notice, Publisher, Track number/Position in set, Internet radio station name, Internet radio station owner, Album sort order, Performer sort order, Title sort order, ISRC (international standard recording code), Software/Hardware and settings used for encoding, Set subtitle, User defined text information frame, Unique file identifier, Terms of use, Unsynchronised lyric/text transcription, Commercial information, Copyright/Legal information, Official audio file webpage, Official artist/performer webpage, Official audio source webpage, Official Internet radio station homepage, Payment, Publishers official webpage, User defined URL link frame

Phew. That's a lot of frames, and although most of them are straightforward, there's a fair few that are quite hard to interpret, let alone write.

All of the frames are variable length. The tag sits at the start of the file in v2.2 (which has three-letter frame headers) and v2.3 (which moves to four-letter headers); v2.4 also allows it to be placed at the end, like v1.
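As a rough illustration of what the variable-length design costs, here is a sketch that decodes the ten-byte header at the front of an ID3v2 tag. The sub name parse_id3v2_header and the keys it returns are my own invention; it ignores the extended header, footers and the frames themselves.

```perl
use strict;
use warnings;

# Hypothetical helper: decode the 10-byte header that starts an ID3v2 tag.
sub parse_id3v2_header {
    my ($header) = @_;
    return unless defined $header && length($header) >= 10;

    my ($magic, $major, $revision, $flags, @size_bytes)
        = unpack 'a3 C C C C4', $header;
    return unless $magic eq 'ID3';

    # The size is "syncsafe": four bytes with the top bit clear,
    # so each byte contributes only 7 bits.
    my $size = 0;
    $size = ($size << 7) | ($_ & 0x7f) for @size_bytes;

    return {
        version         => "2.$major.$revision",
        flags           => $flags,
        size            => $size,                   # tag body, excluding this header
        frame_id_length => $major == 2 ? 3 : 4,     # v2.2 uses three-letter frame IDs
    };
}

# Usage:
# open my $fh, '<:raw', 'example.mp3' or die $!;
# read $fh, my $header, 10;
# my $info = parse_id3v2_header($header);
```

A reader has to decode this size before it can even find where the audio starts, which is a good part of why the v2 modules are more involved than the v1 ones.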

Which to use?

Despite the complexity of the v2 spec, you really need to use it, if only because of the 30 character limit.

Perl and ID3

MP3::Info

Reading with MP3::Info:

    #!/usr/local/bin/perl
    use strict; use warnings;
    
    use MP3::Info;
    my $file = "example.mp3";
    my $tag  = get_mp3tag($file);
    foreach my $field (qw(ARTIST ALBUM TITLE TRACKNUM)) {
      print ucfirst( lc $field ). " is " . $tag->{$field} . "\n";
    }

Writing with MP3::Info:

    #!/usr/local/bin/perl
    use strict; use warnings;
    
    use MP3::Info;
    my $file = "example.mp3";
    my $tag  = get_mp3tag($file);
    
    $tag->{GENRE}  = "Rock";
    $tag->{ARTIST} = "Coldplay";
    set_mp3tag($file, $tag);

MP3::Tag and MP3::ID3Lib

Reading with MP3::Tag:

    #!/usr/local/bin/perl
    use strict; use warnings;
    
    use MP3::Tag;
    my $file = "example.mp3";
    
    my $mp3 = MP3::Tag->new($file);
    my ($title, $tracknum, $artist, $album) = $mp3->autoinfo();
    
    print "Artist is $artist\n";
    print "Album is $album\n";
    print "Title is $title\n";
    print "Tracknum is $tracknum\n";
  
Writing with MP3::Tag:

    #!/usr/local/bin/perl
    use strict; use warnings;
    
    use MP3::Tag;
    my $file = "example.mp3";
    
    my $mp3 = MP3::Tag->new($file);
    my $tag = $mp3->get_tags();
    if (exists $mp3->{ID3v2}) {
      my $id3 = $mp3->{ID3v2};
      $id3->change_frame("TCON","Rock");
      $id3->write_tag();
    } else {
      warn "No existing ID3v2 tag found\n";
    
      # create tag with:
      # my $id3 = $mp3->new_tag("ID3v2");
    }

You'll note it's a little more involved.

Merge the two: MP3::Info::set_mp3v2tag

    #!/usr/local/bin/perl
    use strict; use warnings;
    
    use MP3::Info;
    use MP3::Info::set_mp3v2tag;
    my $file = "example.mp3";
    my $tag  = get_mp3tag($file);
    
    $tag->{GENRE}  = "Rock";
    $tag->{ARTIST} = "Coldplay";
    set_mp3v2tag($file, $tag);

All the v2 goodness of MP3::Tag with all the simplicity of MP3::Info!

Correcting Bad Metadata

Using m3u to populate ID3

Fairly simple script: m3u_to_id3.pl

Notable excerpts:

    my @m3us = File::Find::Rule->name( '*.m3u' )->in( @ARGV );
    my @mp3s = File::Find::Rule->name( '*.mp3' )->in( @ARGV );

I use File::Find::Rule to find the m3u and mp3 files under the directories specified on the command line. I find the code is much cleaner than using File::Find's callbacks.
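For comparison, this is roughly what the same search looks like with the core File::Find module and its callback interface. It's a sketch rather than code from the script, and find_by_pattern is a hypothetical name.

```perl
use strict;
use warnings;
use File::Find ();

# Hypothetical helper: collect plain files under @dirs whose names match
# $pattern, using File::Find's callback interface.
sub find_by_pattern {
    my ($pattern, @dirs) = @_;
    my @found;
    File::Find::find(
        sub {
            # $_ is the bare filename; $File::Find::name is the full path.
            push @found, $File::Find::name if -f $_ && $_ =~ $pattern;
        },
        @dirs,
    );
    return sort @found;
}

# Usage:
# my @m3us = find_by_pattern(qr/\.m3u\z/, @ARGV);
# my @mp3s = find_by_pattern(qr/\.mp3\z/, @ARGV);
```

The results have to be smuggled out of the callback through a closure, which is the bit File::Find::Rule's chained methods let you avoid.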

    my $parser = MP3::M3U::Parser->new(-type => 'file',
                                       -path => $dir,
                                       -file => $file);
    my %results = $parser->parse;

MP3::M3U::Parser has a fairly odd constructor, but it does work. However, it returns a data structure that I didn't find particularly useful:

    $VAR1 = {
              'Coldplay - Parachutes' => [
                                           [
                                             'Coldplay - Shiver',
                                             302,
                                             'Coldplay - Shiver.mp3'
                                           ],
                                      (...)
                                         ],
             }
            
so I do a little bit of manipulation to change it, in the subroutine 'rejig':
    
    $VAR1 = {
              'Coldplay - Shiver.mp3' => {
                                           'album' => 'Parachutes',
                                           'artist' => 'Coldplay',
                                           'length' => 302,
                                           'number' => 2,
                                           'track' => 'Shiver',
                                           'name' => 'Coldplay - Shiver',
                                           'file' => 'Coldplay - Shiver.mp3'
                                         },
              (...)
            }

This has also done some manipulations based on my knowledge of the format that the source m3u/mp3 files have, splitting the artist out from the album and track names. The fact it's now a hash keyed on the filename means it's easy to look the filename up and tag based upon it:

    foreach my $mp3 (@mp3s) {
      my ($vol, $path, $file) = File::Spec->splitpath( $mp3 );
      if (exists $results->{$file}) {
        tag_file($mp3, $results->{$file});
      }
    }
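The 'rejig' subroutine itself isn't shown in these excerpts. A minimal sketch of that kind of reshaping might look like the following; it's a reconstruction from the two data dumps above, not the code from the script, and it assumes both the playlist name and each entry are in 'Artist - Name' form and that playlist order is track order.

```perl
use strict;
use warnings;

# Hypothetical reconstruction of 'rejig': turn
#   ( 'Artist - Album' => [ [ 'Artist - Track', $seconds, $filename ], ... ] )
# into a hash reference keyed on filename.
sub rejig {
    my (%parsed) = @_;
    my %by_file;
    for my $list_name (keys %parsed) {
        my ($artist, $album) = split / - /, $list_name, 2;
        my $number = 0;
        for my $entry (@{ $parsed{$list_name} }) {
            my ($name, $length, $file) = @$entry;
            $number++;    # assumption: playlist order is track order
            my (undef, $track) = split / - /, $name, 2;
            $by_file{$file} = {
                album  => $album,
                artist => $artist,
                length => $length,
                number => $number,
                track  => defined $track ? $track : $name,
                name   => $name,
                file   => $file,
            };
        }
    }
    return \%by_file;
}

# Usage:
# my $results = rejig(%results);
```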

The tag_file subroutine is really quite simple:

    sub tag_file {
      my ($file, $results) = @_;
      
      set_mp3v2tag($file, $results->{track}, $results->{artist},
                  $results->{album}, '', '', '', $results->{number}); 
      warn "Set tag for $file\n";
    }

That's all there is to it. This script does have a few flaws: it's possible to confuse it if the m3u file is incorrect (for example, listing 'Coldplay - Don't Panic' when the file is 'Coldplay - Dont Panic'), but then, any app that reads m3u files will also fail there.

It's a good way of tagging a complete album if you have it.

Rename from tags

Script: number_from_tags.pl

This is basically going the other way: taking ID3 information and applying it back to the filesystem name. This can actually be a sensible second step; in the demo that goes with the talk, I first take information from m3u and apply it to ID3, then use the ID3 information to manipulate the filename. (This does break the m3u, but generating a new one from the ID3 tags - or even the filenames, with ls - is a fairly trivial task.)
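To back up the 'fairly trivial' claim, here's a sketch of regenerating a plain m3u (one filename per line, no extended info) from the renamed files. The sub name write_m3u is hypothetical, and it relies on the 'NN- title.mp3' naming making alphabetical order the same as track order.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: write a plain m3u listing the mp3 files in $dir,
# sorted by name. Returns the list of filenames written.
sub write_m3u {
    my ($dir, $m3u_name) = @_;

    opendir my $dh, $dir or die "Can't read $dir: $!";
    my @mp3s = sort grep { /\.mp3\z/i } readdir $dh;
    closedir $dh;

    my $m3u_path = File::Spec->catfile($dir, $m3u_name);
    open my $out, '>', $m3u_path or die "Can't write $m3u_path: $!";
    print {$out} "$_\n" for @mp3s;
    close $out or die "Can't close $m3u_path: $!";

    return @mp3s;
}

# Usage:
# write_m3u('Parachutes', 'parachutes.m3u');
```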

Some chunks of code from the renaming script:

    foreach my $mp3 (@mp3s) {
      my $tag  = get_mp3tag($mp3);
      warn "No tag in '$mp3'" and next unless $tag;

Again, I've used File::Find::Rule to compile a list of mp3s. I skip out of the loop if there's no tag (since I don't want to write useless data).

      my $new = get_new_from_tag($tag, $mp3);
    
      unless (-e $new) {
        rename ($mp3, $new) or warn "Couldn't rename $mp3:\n $!\n" and next;
        print "renamed $mp3\n     to $new\n";
      }
    }

I then get a new filename from the get_new_from_tag subroutine, and apply it, unless the file already exists. (Clobbering existing files is bad, kids.) I also spit out some information about what I'm doing, if the rename is successful.

Let's take a look at the get_new_from_tag sub:

    my ($volume, $dir, $file) = File::Spec->splitpath($mp3);

I use File::Spec to split the filename apart.

    my ($tracknum) = split(/\//, $tag->{TRACKNUM});

ID3v2 can use the track number field to store both the number of the track and the total number of tracks on the album, and this is delimited with a slash (for example, 8/11 is the eighth track of eleven). I don't want that total in the filename, so I split it out.


    my $target = 23; # length limit of target filesystem, minus eight 
                     # (##- $name.mp3)


    my $new = lc($tag->{TITLE});
    $new = shorten_filename($new, $target);

I use an operating system with a 31-character filename limit, so I like to keep the filenames short. I'm also a fan of lower-case filenames. These two lines lower-case the title and then trim it to fit. (The curious can inspect the shorten_filename routine in the full script.)

    $new = sprintf("%02d%s", $tracknum, "- $new.mp3");
    return File::Spec->catfile($dir, $new);

This puts the filename back together, and returns the reassembled path.
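Put together, the excerpts above amount to something like the following sketch. The shorten_filename here is only a stand-in that truncates (the real routine isn't shown in these notes), and I've used catpath rather than catfile so that the volume survives the round trip; both are my assumptions, not the script's code.

```perl
use strict;
use warnings;
use File::Spec;

# Stand-in for the real shorten routine, which is not shown in these notes:
# this version simply truncates and trims trailing whitespace.
sub shorten_filename {
    my ($name, $target) = @_;
    $name = substr($name, 0, $target) if length($name) > $target;
    $name =~ s/\s+\z//;
    return $name;
}

# Build the new path for $mp3 from its tag (assumes TRACKNUM and TITLE are set).
sub get_new_from_tag {
    my ($tag, $mp3) = @_;
    my ($volume, $dir, $file) = File::Spec->splitpath($mp3);

    # "8/11" means track 8 of 11; keep only the 8.
    my ($tracknum) = split m{/}, $tag->{TRACKNUM};

    my $target = 23;    # 31-character limit minus eight for "##- " and ".mp3"
    my $new = shorten_filename(lc $tag->{TITLE}, $target);

    $new = sprintf('%02d- %s.mp3', $tracknum, $new);
    return File::Spec->catpath($volume, $dir, $new);
}

# Usage:
# my $new = get_new_from_tag({ TRACKNUM => '2/10', TITLE => 'Shiver' },
#                            'Parachutes/Coldplay - Shiver.mp3');
```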

Dreaded 'Various Artists'

Script: split_artist_track.pl

This is the first ID3-only script, and it's really simple because of it. As the slides note, FreeDB and other tagging services (and the applications that use them) often leave various-artists compilations with 'Various Artists' as the artist, and put 'Title / Artist' or 'Artist / Title' in the title field instead. This is less than useful.

It's pretty straightforward to fix, though.

    my @mp3s = File::Find::Rule::MP3Info->file()
                                        ->mp3info( TITLE => qr! ?- ! )
                                        ->in( $path );

Instead of plain File::Find::Rule, this uses the MP3Info extension, so that I only get mp3s whose title matches the regular expression specified; that should mean I only see those with the broken ID3 tags. It is, however, a little slow, as it has to look at each mp3.

    foreach my $mp3 (@mp3s) {
      my $tag   = get_mp3tag($mp3);
      my ($artist, $title) = split(/ ?- /, $tag->{TITLE}, 2);

I load the tag, then split the existing title tag into the artist and title.


      $tag->{TITLE}  = $title;
      $tag->{ARTIST} = $artist;

I clobber the existing title and artist tags in the hash of tag information...


      print "$mp3\n title  '$title'\n artist '$artist'\n";
      set_mp3v2tag($mp3, $tag);
    }

... and then write it out after printing some informative text.

Of course, if the title and artist are the other way around, or delimited with / not -, then you'll have to edit this script.
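One way to avoid editing the script each time is to pull the split into a small helper that accepts either delimiter and takes the order as an argument. This is a sketch, not part of split_artist_track.pl; the name split_combined_title and its second argument are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split a combined title field into (artist, title).
# The delimiter is '-' or '/' followed by a space, with an optional space
# before it. $artist_first says which side holds the artist.
sub split_combined_title {
    my ($combined, $artist_first) = @_;
    my ($first, $second) = split m{ ?[-/] }, $combined, 2;
    return unless defined $second;    # no delimiter found: leave the tag alone
    return $artist_first ? ($first, $second) : ($second, $first);
}

# Usage, inside the loop over @mp3s:
# my ($artist, $title) = split_combined_title($tag->{TITLE}, 1)
#     or next;
```

Requiring a space after the delimiter means a name like 'AC/DC' isn't split in the middle.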

Using web services

CD Metadata History

In the beginning was the CDDB: The CDDB was a wonderful thing. You put in a CD, and most of the time CDDB recognised it and sent you back the titles. When you did have to type in the CD track listing, you could at least know that someone else would find it useful later.
Gracenote 'closed' the DB: In what's a remarkably underdocumented closing of a common resource, Gracenote made access to the downloadable CDDB nearly impossible, before rewriting the (admittedly flawed) CDDBv1 protocol and making licences for the v2 protocol prohibitively expensive.
GPL data to that point still free
Used by two open source projects: There's still a glimmer of hope for open CD metadata, though.

The two OS projects

FreeDB: FreeDB took the existing, open CDDB database, protocol and code and set up a parallel, free operation. They've not really done much other than coast since then (although there is a minimal web search), but it's still a large and useful resource.
MusicBrainz: MusicBrainz, on the other hand, was much more ambitious; as the description says, "MusicBrainz is a community music metadatabase that attempts to create a comprehensive music information site." MusicBrainz uses RDF, PostgreSQL and Perl to run their server, and offers a C client with Perl (amongst other) wrappers to access a RESTful web service, in addition to a rich, linked web search. There's data gardening (to make sure people don't put too much rubbish into the database) and a track fingerprinting technology - TRM - to identify tracks with no metadata whatsoever.

TODO

Further topics

Left as a list for the reader.

Controlling iTunes with Perl
Similarly for XMMS, mpg123
MusicBrainz without the API: As mentioned beforehand, it looks possible to handle the RDF directly, rather than using the C library.
Submitting via FreeDB module: FreeDB isn't a very good protocol, but there is Perl support for reading a CD TOC and submitting the data. There's also a POE wrapper for encoding mp3s.
Apache::MP3: Serves a collection of mp3s, using Apache and a modified directory browser.
MP3 as extension
... and more: There are modules still out there to be written. Maybe I'll get round to it now I've finished these notes...