Baroque Works

In the beginning was Ti Kan, who wrote a little CD player app for the X Window System called xmcd. Like some other players at the time, it had support for entering disc and track names, and remembering them later. Ti went a step further, though; he provided support in the application to submit track names to a central server, the CD Database, or CDDB. Users could download and install the entire CDDB on their hard drive, which would then allow them to magically get track and disc names for discs that anyone else had entered data for. Later, Ti added support to look up track names on the internet. I was an early xmcd user. (In fact, I even distributed a binary version for BSD/OS, a fact that I’d completely forgotten until I googled for it.)

Eventually, Ti looked for a way to monetize the CDDB. I don’t blame him. The database eventually ended up in the hands of Gracenote, where it remains to this day. Gracenote licenses an SDK and access to their CDDB to companies that want to include it in their product.

Many computer MP3 players that have support for ripping CDs also support looking up CD track names in the CD Database.

There are two interesting things about the CDDB (and really I mean “the CDDBs”, since there are databases other than Gracenote’s): first, the data in them is provided by the users, rather than by the music publishers, and second, the database is full of errors. In the early days of the CDDB, I definitely got a thrill of pleasure in buying a new disc, putting it in the drive, and discovering that I was the first CDDB user to do so. Cool! I’d get to “contribute to the community” by entering disc and track information. As time went on, this of course became a rarer and rarer experience. One aspect of the database that I don’t know much about is: what happens if two people send in different track data for the same disc? Which one wins? Many clients allow you to “submit corrections to the CDDB,” but it’s not clear that anything actually ever happens with these. So of course, there are errors. The information is being provided by end users. We make mistakes.

Now, it’s not surprising that there are some errors in the CDDB. There would be errors, albeit perhaps to a lesser degree, even if the publishers were providing the data. But if you listen to a lot of classical music, the experience becomes frustrating on a whole new level. The errors in disc and track data are legion compared to what you find in more popular genres, like rock. And the types of errors you see are infinitely more egregious.

There are a few reasons for this.

For example, I just ripped the Choeur des Musiciens du Louvre version of Offenbach’s _La Belle Helene_ to iTunes. The first disc, amazingly, only had one problem (listing the “year” as 1864, the year of the composition, rather than 2001, the year of recording). To only have one error is actually a rare enough occurrence that I’m considering declaring today a National Holiday and commemorating it each year. The second disc had the following problems: The title of the opera was wrong. Apart from not having the accents, the title of the opera included “Disc 2” in it, even though there’s a “disc number” field in the ID3 tags. There was no Composer tag. The “this is part of a compilation” field was checked. And finally, the pièce de résistance: every song on the disc was called simply “Act II” or “Act III” and the “artist” field was used for the actual title of the track. That’s going to make “browse by artist” on my iPod super useful, since it will let me find every song by that superb band, “Ciel! Mon Mari!”

Truly, the mind reels.

Now, all of these issues are correctable in my music player, of course. I can just edit the hideously wrong CDDB data after the fact. But since the whole idea of keeping a centralized database of disc-to-metadata mappings is to _free us from having to do that,_ I find it somewhat frustrating. (“Use a different player that uses a non-Gracenote CDDB” isn’t a solution, both because I like everything else about my music player and because I haven’t seen any evidence that the non-Gracenote CDDBs are free from this issue.)
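For the curious, here is roughly what that after-the-fact cleanup might look like if you scripted it instead of clicking through an edit dialog. This is only a sketch: the record layout (a plain dict with `album`, `title`, `artist`, and `compilation` keys), the function names, and the exact way “Disc 2” appears in the album title are all my own assumptions for illustration, not how CDDB entries or ID3 tags are actually stored. It handles just the problems I hit on the second disc.

```python
import re

# Hypothetical patterns: a trailing "Disc N" (optionally bracketed) in an
# album title, and a track title that is nothing but "Act II", "Act III", etc.
DISC_SUFFIX = re.compile(r"\s*[\(\[]?\s*dis[ck]\s*(\d+)\s*[\)\]]?\s*$", re.IGNORECASE)
ACT_ONLY = re.compile(r"^act\s+[ivx]+$", re.IGNORECASE)


def clean_album(album):
    """Split a trailing 'Disc N' out of the album title.

    Returns (cleaned_title, disc_number), with disc_number None if absent.
    """
    match = DISC_SUFFIX.search(album)
    if not match:
        return album, None
    return album[:match.start()].rstrip(" -,:"), int(match.group(1))


def clean_track(track, composer, performer):
    """Return a corrected copy of one track record (assumed dict layout)."""
    fixed = dict(track)

    # Move "Disc 2" out of the title and into its own field.
    album, disc = clean_album(fixed["album"])
    fixed["album"] = album
    if disc is not None:
        fixed["disc_number"] = disc

    # If the title is only an act label, the real title is hiding in "artist".
    title = fixed["title"].strip()
    if ACT_ONLY.match(title):
        fixed["title"] = f"{title}: {fixed['artist']}"
        fixed["artist"] = performer

    # Fill in the missing composer and clear the bogus compilation flag.
    fixed["composer"] = composer
    fixed["compilation"] = False
    return fixed


if __name__ == "__main__":
    bad = {
        "album": "La Belle Helene (Disc 2)",
        "title": "Act II",
        "artist": "Ciel! Mon Mari!",
        "compilation": True,
    }
    print(clean_track(bad, "Jacques Offenbach", "Choeur des Musiciens du Louvre"))
```

Note that the composer and performer still have to be typed in by a human, which is rather the point: the script can shuffle misplaced fields around, but it can’t conjure data that was never entered.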

I’m not sure what the right solution is, but the marginal cost of paying an intern to send the right data to one or more of the various CDDBs can’t be more than a few bucks per project.

What do you think the right answer is?