Find similar (but not identical) files on Mac
Some tools look for 'similar' files. Here's why exact-match dedupe is usually safer, and how to handle near-duplicates.
There’s a category of dedupe tools that promise to find “similar” files: photos that look alike, documents with similar content, music files with the same audio but different bitrates. These can be useful, but they come with a real trade-off: false positives. What the tool calls “similar” may be two photos you want to keep.
Exact match vs similarity matching
Two different problems:
- Exact duplicates: the same file in two places. Byte-for-byte identical. Always safe to dedupe if you keep one copy.
- Similar files: photos taken seconds apart, documents with minor edits, audio in different formats. Visually or aurally similar but not the same file.
Tools that find similar files use heuristics: perceptual hashes for images, fuzzy text matching for documents, audio fingerprinting for music. These work, but they make judgment calls that you may not agree with.
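To make the “judgment call” concrete, here is a minimal sketch of one common perceptual hash, the average hash. It is not the method any particular tool uses. It assumes the image has already been shrunk to a tiny grayscale grid (real tools typically downscale to something like 8×8 first), and the `max_distance` threshold is an arbitrary illustrative value; where you set it is exactly the guess the tool is making on your behalf.

```python
def average_hash(pixels):
    """Return one bit per pixel: 1 if the pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming_distance(bits_a, bits_b):
    """Count the positions where two equal-length bit lists differ."""
    return sum(a != b for a, b in zip(bits_a, bits_b))


def looks_similar(pixels_a, pixels_b, max_distance=2):
    """Heuristic: call two images 'similar' if their hashes differ in few bits.

    max_distance is a hypothetical threshold chosen for illustration.
    """
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance


# A tiny 4x4 grayscale "photo": dark on the left, bright on the right.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [11, 21, 201, 211],
]
# The same scene, slightly brighter: every byte differs, the hash does not.
brighter = [[value + 10 for value in row] for row in original]
# A different picture: bright on the left, dark on the right.
inverted = [[255 - value for value in row] for row in original]

print(looks_similar(original, brighter))  # True
print(looks_similar(original, inverted))  # False
```

The brighter copy would never match in an exact-byte comparison, yet the heuristic treats it as the same picture. Whether that is a duplicate or a second exposure you meant to keep is something only you can decide.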
The case for exact-match dedupe
- Install Dupe and open it.
- Click “Add Folder” and add the folders you want scanned.
- Click “Scan.” Dupe computes a SHA-256 hash of each file; identical hash = identical content = duplicate.
- Review and trash. There are no false positives — if Dupe says two files are duplicates, they’re literally the same bytes.
For most people, this covers the bulk of the cleanup. It’s the safer first pass, especially with photos, because Dupe doesn’t try to guess.
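The hashing idea behind the scan step can be sketched in a few lines. This is an illustration of exact-match grouping in general, not Dupe’s actual implementation; the function names and the 1 MB chunk size are assumptions made for the example.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(folders):
    """Return groups of paths whose contents have the same SHA-256 hash."""
    by_hash = defaultdict(list)
    seen = set()  # guards against counting one file twice via overlapping folders
    for folder in folders:
        for root, _dirs, files in os.walk(folder):
            for name in files:
                path = os.path.join(root, name)
                if os.path.islink(path):
                    continue  # a symlink points at a file, it is not a second copy
                real = os.path.realpath(path)
                if real in seen:
                    continue
                seen.add(real)
                by_hash[sha256_of_file(path)].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]


if __name__ == "__main__":
    # Hypothetical folder; replace with the folders you want to check.
    for group in find_exact_duplicates([os.path.expanduser("~/Downloads")]):
        print(group)
```

The sketch only reports groups and deletes nothing. Reviewing each group and keeping one copy remains a manual step.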
When you actually need similarity matching
Sometimes you do want to find “similar” files. For example:
- Burst photos where you want to keep just the best one.
- 10 versions of a draft, each saved under a slightly different filename, where you only want the latest.
- The same song as an MP3 at 192 kbps and at 320 kbps, where you want to keep just the higher-bitrate copy.
For these cases, exact-match dedupe won’t help — those files have different bytes. You’d need a perceptual or fuzzy-matching tool, and you’d want to be ready to review every match by hand, because the false-positive rate is much higher.
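A small sketch shows why the two approaches disagree on drafts. It uses Python’s standard `difflib.SequenceMatcher` as a stand-in for fuzzy text matching; real similarity tools may use other techniques, and the two draft strings are made up for the example.

```python
import hashlib
from difflib import SequenceMatcher


def same_bytes(a: bytes, b: bytes) -> bool:
    """Exact match: True only if both inputs hash to the same SHA-256 digest."""
    return hashlib.sha256(a).digest() == hashlib.sha256(b).digest()


def text_similarity(a: str, b: str) -> float:
    """Fuzzy match: a ratio between 0.0 (nothing shared) and 1.0 (identical)."""
    return SequenceMatcher(None, a, b).ratio()


# Two hypothetical drafts that differ by a single word.
draft_v1 = "The quick brown fox jumps over the lazy dog."
draft_v2 = "The quick brown fox leaps over the lazy dog."

print(same_bytes(draft_v1.encode(), draft_v2.encode()))  # False
print(round(text_similarity(draft_v1, draft_v2), 2))
```

The hash comparison says “different files” and stops there. The similarity score says “very close”, but it cannot tell you which draft you want, which is why every match still needs a human look.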
A reasonable workflow:
- Run Dupe first to clear out byte-identical duplicates safely.
- For burst photos, use Photos.app’s built-in Bursts album to pick favorites and clean up the rest.
- For music, sort by song name and manually compare bitrates in Music.app.
- For document drafts, sort by modification date and judge by eye.
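If you prefer a terminal to Finder for the last step, sorting drafts by modification date can be sketched like this. The function name, the `.docx` default, and the example folder are all assumptions for illustration; the script only lists files and leaves the judging to you.

```python
import os
import time


def drafts_newest_first(folder, extension=".docx"):
    """List files in one folder with the given extension, most recently modified first."""
    paths = [
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(extension)
        and os.path.isfile(os.path.join(folder, name))
    ]
    return sorted(paths, key=os.path.getmtime, reverse=True)


if __name__ == "__main__":
    # Hypothetical folder; point this at wherever your drafts live.
    folder = os.path.expanduser("~/Documents")
    for path in drafts_newest_first(folder):
        modified = time.strftime("%Y-%m-%d %H:%M", time.localtime(os.path.getmtime(path)))
        print(modified, os.path.basename(path))
```

Modification date is only a hint: copying or re-saving a file can change it, so the newest file is not always the draft you want to keep.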
Safety with exact-match
- Trash-only deletion, 30-day recovery.
- No guessing, no false positives.
- System paths, hidden folders, and app data are excluded.
The honest answer is that exact-match dedupe is the safest cleanup. Similarity matching is sometimes useful, but it needs careful manual review every time.