
FDupes

We are all pack rats. “Safety copies” is how we justify duplicate files. Keeping duplicates used to mean keeping stacks of floppy disks. Remember those?

Sony used to sell shrink-wrapped bricks of 100 that I would go through every few weeks. They were much nicer than the bales of 5 1/4-inch disks. How many of you had one of those big wood 3-drawer floppy storage cabinets for 5 1/4-inch floppies? I did. Each drawer held one week’s backup. An 80 MB hard drive took about 30 floppies to back up. The stuff you really needed you kept duplicate copies of on different floppies “in a safe place.”

wood floppy storage for 3.5 inch floppies

At least when it came to the 3.5 inch floppies I got down to two single storage units. I still use these for LS-120 disks when I’m writing a new book. You can’t put a label on a thumb drive.

Today

Today we have NAS storage and 16TB is nothing. I have a drawer full of thumb drives with no idea what is on most of them. Once they get used for a project or client, “they get put in a safe place” because you can’t really get a sticky note to stay on them. When I get home from a long-term project I immediately upload directories from my laptop and the thumb drives I took with me. Much of it is document files that I already have on at least three machines, but something might have changed. I wouldn’t want to lose any of the new files I created!

Eventually the day comes when you are trapped in the office without anything pressing to do. You notice that your NAS is showing it is more than 80% full and you scream in your head “Enough!”

Oh, the dread of having to manually compare this stuff. Isn’t there a tool?

Enter fdupes

Be careful! Don’t just let this thing nuke whatever it considers a “dupe.”

Why?

Do you make point-in-time snapshots of working code or documents? Do you need to keep them for FDA or other regulatory reasons? Does every pimp send you the signed contract as “contract.pdf”? There are a host of reasons to still have some duplicate files. After you install this wonderful tool you need to generate a list so you can judiciously purge.

fdupes --recurse --size --time --order='name' directory-path

Substitute the root of the directory tree you want searched for directory-path. This will get you a nice list of files. Yes, it will take a bit, because this performs more than a file name comparison: it looks at what is actually in the files. You still shouldn’t blindly trust it and let it delete. The list can be very long, so you might want to redirect the output to a file, for instance by adding > dupes.txt to the end of the command.
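If you are wondering what “more than a file name comparison” buys you, here is a minimal sketch that groups files by a checksum of their contents. It is an illustration only and not how fdupes does its job internally; fdupes has its own, more thorough comparison. The function name list_dupes_by_hash is made up for this example, and it assumes the GNU versions of find, sha256sum, sort, and uniq are installed.

```shell
#!/bin/sh
# Illustration only: group files under a directory by the SHA-256 of
# their contents. This is NOT fdupes's own algorithm, just a stand-in
# that shows duplicates being found by content rather than by name.
# list_dupes_by_hash is a hypothetical helper name.
list_dupes_by_hash() {
    dir=$1
    # One "hash  path" line per file, sorted so equal hashes sit together.
    # uniq compares only the first 64 characters (the hex digest) and
    # prints every line whose digest repeats, with a blank line between groups.
    find "$dir" -type f -exec sha256sum {} + |
        sort |
        uniq -w 64 --all-repeated=separate
}
```

Something like list_dupes_by_hash ~/Documents > dupes-by-hash.txt would save the groups to a file (both names are placeholders). Two files with different names but identical contents land in the same group, and a file nobody else matches never shows up at all.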

Duplicate files snippet

Yes, I deliberately have duplicates. They are backups of my tax database. I take various snapshots at various points in time. There are three things in this life that I fear having a problem with. Those would be “the I,” “the R,” and “the S.” Just remember that Al Capone went to prison for taxes, not for being a gangster who had people killed.

You will notice each block is grouped by file size, and you get to see the timestamp and full path of each file. Run fdupes in this manner and the decision about what to delete stays with you. Sometimes you want to save the stuff. Other times you want to blast an entire directory instead of individual files.
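When the question is which directories to blast rather than which files, a per-directory tally of the saved list can help. The sketch below assumes the list was saved to a file and that each file line is a date, a time, and then the full path, which is what the --time option appears to produce; check your own output before trusting it. The function name dupes_per_dir is hypothetical.

```shell
#!/bin/sh
# Count how many listed duplicates live in each directory.
# ASSUMPTION: file lines in the saved list look like
#   YYYY-MM-DD HH:MM /full/path/to/file
# Size headers and blank lines do not match the pattern and are skipped.
# dupes_per_dir is a hypothetical helper name.
dupes_per_dir() {
    list=$1
    grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2} /' "$list" |
        cut -d ' ' -f 3- |      # drop the date and time, keep the path
        sed 's|/[^/]*$||' |     # strip the file name, keep the directory
        sort |
        uniq -c |               # count entries per directory
        sort -rn                # biggest offenders first
}
```

Running dupes_per_dir dupes.txt (with dupes.txt standing in for wherever you saved the list) puts the directories holding the most duplicates at the top. It only counts; it deletes nothing, so the judgment call is still yours.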

Looks like Satan was messing with me here!

Yikes!

Roland Hughes started his IT career in the early 1980s. He quickly became a consultant and president of Logikal Solutions, a software consulting firm specializing in OpenVMS application and C++/Qt touchscreen/embedded Linux development. Early in his career he became involved in what is now called cross-platform development. Given the dearth of useful books on the subject, he ventured into professional authorship in 1995, writing the first of the "Zinc It!" book series for John Gordon Burke Publisher, Inc.

A decade later he released a massive (nearly 800 pages) tome, "The Minimum You Need to Know to Be an OpenVMS Application Developer," which tried to encapsulate the essential skills gained over what was, at that point, a nearly 20-year career. From there "The Minimum You Need to Know" book series was born.

Three years later he wrote his first novel "Infinite Exposure" which got much notice from people involved in the banking and financial security worlds. Some of the attacks predicted in that book have since come to pass. While it was not originally intended to be a trilogy, it became the first book of "The Earth That Was" trilogy:
Infinite Exposure
Lesedi - The Greatest Lie Ever Told
John Smith - Last Known Survivor of the Microsoft Wars

When he is not consulting Roland Hughes posts about technology and sometimes politics on his blog. He also has regularly scheduled Sunday posts appearing on the Interesting Authors blog.