Find similar (but not identical) files on Mac

Some tools look for 'similar' files. Here's why exact-match dedupe is usually safer, and how to handle near-duplicates.

There’s a category of dedupe tools that promise to find “similar” files: photos that look alike, documents with similar content, music files with the same audio but different bitrates. These can be useful, but they come with a real trade-off: false positives. What the tool calls “similar” might be two photos you want to keep.

Exact match vs similarity matching

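These are two different problems. Exact matching asks whether two files contain the same bytes. Similarity matching asks whether two files with different bytes look, read, or sound alike.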

Tools that find similar files use heuristics: perceptual hashes for images, fuzzy text matching for documents, audio fingerprinting for music. These work, but they make judgment calls that you may not agree with.
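
To make “judgment call” concrete, here is a minimal sketch of one common perceptual-hash idea, the average hash. It is not how any particular tool works. The function names and the tiny 2x2 pixel grids are made up for illustration; real tools usually downscale the image first (often to 8x8) and then pick a distance threshold, and that threshold is the judgment call.

```python
def average_hash(pixels):
    """Return one bit per pixel: 1 if brighter than the image's mean, else 0.

    `pixels` is a list of rows of grayscale values (0-255), assumed to be
    already downscaled to a small grid.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming_distance(hash_a, hash_b):
    """Count the positions where two equal-length hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


# Hypothetical images: an original, a brightened copy, and an inverted one.
original = [[10, 200], [220, 30]]
brighter = [[40, 230], [250, 60]]
inverted = [[245, 55], [35, 225]]

distance_to_brighter = hamming_distance(average_hash(original), average_hash(brighter))
distance_to_inverted = hamming_distance(average_hash(original), average_hash(inverted))
```

The brightened copy has different pixel values but the same hash, so a similarity tool would flag it. Whether you want to keep both versions is something the hash cannot know.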

The case for exact-match dedupe

  1. Install Dupe and open it.
  2. Click “Add Folder” and add the folders you want scanned.
  3. Click “Scan.” Dupe computes a SHA-256 hash of each file; an identical hash means identical content, which means a duplicate.
  4. Review and trash. There are no false positives — if Dupe says two files are duplicates, they’re literally the same bytes.

This gets you 80% of the way for most people. It’s the safer first pass, especially with photos. Dupe doesn’t try to guess.
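
The hashing idea in step 3 can be sketched in a few lines of Python. This is an illustration of exact-match grouping in general, not Dupe’s actual implementation; the function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(folders):
    """Return groups of paths whose contents hash to the same SHA-256 value."""
    by_hash = defaultdict(list)
    seen = set()  # guards against counting a file twice if folders overlap
    for folder in folders:
        for path in sorted(Path(folder).rglob("*")):
            if not path.is_file() or path.is_symlink():
                continue
            resolved = path.resolve()
            if resolved in seen:
                continue
            seen.add(resolved)
            by_hash[sha256_of(path)].append(path)
    # Only hashes shared by two or more files are duplicates.
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Calling `find_exact_duplicates([Path.home() / "Pictures"])` would return one list per set of byte-identical files. Nothing is deleted; the sketch only reports groups for review.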

When you actually need similarity matching

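Sometimes you do want to find “similar” files. For example: burst photos of the same scene, the same song saved at different bitrates, or successive drafts of one document.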

For these cases, exact-match dedupe won’t help — those files have different bytes. You’d need a perceptual or fuzzy-matching tool, and you’d want to be ready to review every match by hand, because the false-positive rate is much higher.
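
A small sketch shows why. The two draft sentences below are invented for illustration and differ by a single character. Their SHA-256 hashes do not match, while a fuzzy comparison (here Python’s standard `difflib`, chosen only as an example of fuzzy text matching) scores them as nearly identical.

```python
import hashlib
from difflib import SequenceMatcher

# Hypothetical drafts that differ in exactly one character.
draft_v1 = "Quarterly report: revenue grew 4% and costs were flat."
draft_v2 = "Quarterly report: revenue grew 5% and costs were flat."


def sha256_text(text):
    """Hash the UTF-8 bytes of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def similarity(a, b):
    """Return a ratio between 0.0 (nothing in common) and 1.0 (identical)."""
    return SequenceMatcher(None, a, b).ratio()


hashes_match = sha256_text(draft_v1) == sha256_text(draft_v2)  # False: bytes differ
score = similarity(draft_v1, draft_v2)  # close to, but below, 1.0
```

A high score only says the texts are alike. It cannot say which draft is the one you meant to keep, which is why every match needs a manual look.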

A reasonable workflow:

  1. Run Dupe first to clear out byte-identical duplicates safely.
  2. For burst photos, use Photos.app’s built-in Bursts album to pick favorites and clean up the rest.
  3. For music, sort by song name and manually compare bitrates in Music.app.
  4. For document drafts, sort by modification date and judge by eye.
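
If you prefer a script to Finder for step 4, the following sketch lists drafts newest first. The function name and the default `*.docx` pattern are assumptions for illustration; adjust the pattern to your own file types.

```python
from pathlib import Path


def drafts_newest_first(folder, pattern="*.docx"):
    """Return files in `folder` matching `pattern`, most recently modified first."""
    files = [path for path in Path(folder).glob(pattern) if path.is_file()]
    files.sort(key=lambda path: path.stat().st_mtime, reverse=True)
    return files
```

Calling `drafts_newest_first(Path.home() / "Documents")` would return the matching files in that folder, not its subfolders. The sketch only orders the files; deciding which draft to keep is still up to you.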

Safety with exact-match

The honest answer is that exact-match dedupe is the safest cleanup. Similarity matching is sometimes useful, but it needs careful manual review every time.
