Removing duplicate tracks from iTunes with Ruby and RBOSA
When I put a new hard drive in my computer, I decided to reinstall the operating system and install applications and data from scratch. Unfortunately, I had a small mishap and accidentally imported two copies of my iTunes library. Removing the duplicates by hand would have been possible, but tedious. Mercifully, I stumbled onto RBOSA (RubyOSA), so I was able to let the computer do it.
RBOSA is basically AppleScript for people who never got around to learning AppleScript. The interface to things like iTunes is very simple, so it didn’t take much work to get a duplicate finder up and running.
The strategy I used was to look at the songs in the main library and put all duplicates into a new playlist. (The way I find the “main library”, by taking the first playlist of the first source, looks kind of suspect, but it worked. Use caution if you try this at home.) Once they were there, I was able to check them over to make sure that they really were duplicates and then delete them.
Now, if you’re playing the home game and you know the secret trick for finding and deleting large groups of duplicates (around 8,500 tracks in this case) without busting out the programming: please tell me. I’m pretty sure that I’ll need to do this again at some point, and I’m all about doing things the easy way.
The script follows. I used Ruby 1.8.6 and RubyOSA 0.3.0.1 (installed via gem).
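require 'rubygems'
require 'rbosa'

itunes = OSA.app 'iTunes'

# New playlist that collects the suspected duplicates for review.
dups = itunes.make OSA::ITunes::Playlist
dups.name = 'Duplicate Tracks'

# Teach tracks to act as Hash keys: two tracks are "the same" when
# artist, album, track number, name, and time all match.
class OSA::ITunes::Track
  def eql?(o)
    artist == o.artist &&
      album == o.album &&
      track_number == o.track_number &&
      name == o.name &&
      time == o.time
  end

  # Must agree with eql?, so it hashes the same five fields.
  def hash
    to_s.hash
  end

  def to_s
    "#{artist}/#{album}/#{track_number}/#{name}/#{time}"
  end
end

# Group every track in the first playlist of the first source
# (assumed to be the main library) under its matching key.
seen = Hash.new
itunes.sources[0].playlists[0].tracks.each do |track|
  seen[track] ||= Array.new
  seen[track] << track
end

seen.values.each do |tracks|
  if 1 < tracks.length
    # Keep the file with the largest bitrate.
    tracks = tracks.sort { |a,b| b.bit_rate <=> a.bit_rate }
    keep, rest = tracks[0], tracks[1..-1]
    # Copy the rest into the review playlist; nothing is deleted here.
    rest.each { |t| t.duplicate dups }
  end
end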
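The grouping logic can be tried out without touching iTunes at all. What follows is only a minimal sketch, not the script above: `FakeTrack`, `duplicate_key`, and `find_duplicates` are hypothetical names made up for illustration, and the sample data is invented. It groups tracks by the same five fields and sets aside everything except the highest-bitrate copy in each group. It relies on `group_by` and `flat_map`, so it needs a newer Ruby than the 1.8.6 used above.

```ruby
# FakeTrack is a hypothetical stand-in for OSA::ITunes::Track, so this
# runs without iTunes or RubyOSA installed.
FakeTrack = Struct.new(:artist, :album, :track_number, :name, :time, :bit_rate)

# The same five fields the script compares in eql?.
def duplicate_key(track)
  [track.artist, track.album, track.track_number, track.name, track.time]
end

# Returns the tracks that would go into the "Duplicate Tracks" playlist:
# every member of a group except the one with the highest bitrate.
# When bitrates tie, which copy is kept is not defined.
def find_duplicates(tracks)
  tracks.group_by { |t| duplicate_key(t) }.values.flat_map do |group|
    group.sort_by { |t| -t.bit_rate }.drop(1)
  end
end

# Invented sample data: two copies of one track, plus a variant
# recording with a different running time.
library = [
  FakeTrack.new('Some Artist', 'Some Album', 1, 'Some Song', '2:45', 128),
  FakeTrack.new('Some Artist', 'Some Album', 1, 'Some Song', '2:45', 256),
  FakeTrack.new('Some Artist', 'Some Album', 1, 'Some Song', '3:10', 192)
]

find_duplicates(library).each do |t|
  puts "#{t.artist} / #{t.name} (#{t.time}, #{t.bit_rate} kbps)"
end
```

Because the running time is part of the key, the variant recording is left alone and only the lower-bitrate copy of the exact match is flagged.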
March 24th, 2007 at 3:49 pm
I would have sworn that I had seen a “remove duplicates” function in iTunes. … aha, OK, I think what I saw was the View option “View Duplicates.”
March 24th, 2007 at 3:50 pm
Yeah, “View Duplicates” is definitely there. If there were only a few dozen, that would definitely get the job done.
March 27th, 2007 at 9:21 am
Oh, and “View Duplicates” is sort of useless, as it appears to be going off only title and artist. I have two versions of, for example, George Clinton’s “Atomic Dog.” One clocks in at 2:45, the other at 3:xx. Clearly they’re variant recordings, but iTunes considers them both duplicates.
March 27th, 2007 at 9:40 am
Yeah, that always bugged me. Worse is that neither iTunes’ “View Duplicates” nor the one that I wrote does anything smart like checking for track or album name similarities with short edit distances or matching soundex codes or whatever. I’ve had a few cases where both Stephanie and I imported the same album and they ended up with slightly different names — different punctuation or capitalization, etc — so they don’t show up as “duplicates.”
Hmm. None of this would be hard to do. Perhaps I should improve upon this script tonight.
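A rough sketch of what that might look like, in plain Ruby with no iTunes calls. The names `normalize`, `edit_distance`, and `similar_names?` are made up for illustration, the threshold of 2 is an arbitrary guess, and the normalization only understands ASCII letters and digits. It covers the punctuation, capitalization, and edit-distance ideas; soundex is not attempted here.

```ruby
# Lowercase, drop punctuation, and collapse runs of whitespace, so that
# names differing only in punctuation or capitalization compare equal.
# Assumption: ASCII only; accented letters would be stripped too.
def normalize(name)
  name.downcase.gsub(/[^a-z0-9\s]/, '').split.join(' ')
end

# Levenshtein distance by dynamic programming, keeping one row at a time.
def edit_distance(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,        # deletion
        current[j - 1] + 1,     # insertion
        previous[j - 1] + cost  # substitution
      ].min
    end
    previous = current
  end
  previous.last
end

# max_distance is an arbitrary illustrative threshold, not a tuned value.
def similar_names?(a, b, max_distance = 2)
  edit_distance(normalize(a), normalize(b)) <= max_distance
end

puts similar_names?("Sgt. Pepper's", 'sgt peppers')
puts similar_names?('Atomic Dog', 'Flash Light')
```

Something like `similar_names?` could stand in for the exact `==` comparisons on album and track name, although a fuzzy match no longer works as a Hash key, so the grouping step would have to compare tracks pairwise instead.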
March 27th, 2007 at 1:55 pm
I’ve had a few cases where both Stephanie and I imported the same album and they ended up with slightly different names — different punctuation or capitalization, etc — so they don’t show up as “duplicates.”
Totally lame.