You’re colour-grading a single photo. You move twenty sliders, nudge the white balance, crop, bounce to the next frame, come back, keep tweaking. At some point Lightroom saves the catalog. The catalog is 40 GB. You’ve changed the colour of, let’s say, one photo.
A naive backup tool uploads 40 GB. Tomorrow, when you tweak a second photo, it uploads 40 GB again. By Friday you’ve uploaded 200 GB to back up what is, in reality, a few hundred kilobytes of changed pixels and a handful of modified database rows.
This post is about why that doesn’t have to happen, and why it doesn’t happen with macup.
The problem with whole-file backup
A lot of backup tools still operate at the granularity of the file. They look at a file’s modification time, decide whether it changed since the last run, and if it did, they send the whole thing.
That’s fine for a 200 KB Word document. It’s a disaster for the files a Mac creative actually lives in.
- A Lightroom catalog: 40 GB, rewritten on most saves.
- A Final Cut Pro library: a 180 GB package, touched on every editing session.
- A Logic Pro session with a handful of sample libraries: 12 GB, a mix of project data and render cache, all of it in one bundle the Finder reports as a single “file.”
- A Photos library: 600 GB, with a SQLite-backed index that gets rewritten constantly behind the scenes.
A whole-file backup tool sees each of these as one object and happily re-uploads them in full on every change. Bandwidth dies. Your cloud quota fills in a month. Snapshots that were supposed to be cheap get expensive fast.
What deduplication actually does
Deduplication, done properly, breaks that file-level thinking.
Instead of treating a catalog as one 40 GB blob, the engine splits it into smaller pieces, typically a few megabytes each, and hashes every piece. When it goes to upload, it first asks the repository: “do you already have a piece with this hash?” If yes, it points to the existing one and skips the upload. If no, it uploads the piece.
When 98% of your catalog hasn’t changed bit-for-bit since yesterday, 98% of its pieces match ones the repository already has. Only the pieces that genuinely contain new bytes get sent.
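Here’s the shape of that exchange as a minimal Python sketch. The dict-as-repository and the function name are mine, purely for illustration; a real engine talks to remote object storage and batches its “do you have this?” queries, but the logic is the same.

```python
import hashlib

def backup_chunks(chunks, repository):
    """Upload only chunks the repository hasn't seen; return a recipe.

    `repository` is a dict from SHA-256 digest to chunk bytes --
    a stand-in for real remote storage.
    """
    recipe = []        # the file, expressed as an ordered list of hashes
    uploaded = 0
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in repository:    # "do you already have this piece?"
            repository[digest] = chunk  # no: send the bytes
            uploaded += len(chunk)
        recipe.append(digest)           # either way: reference it by hash
    return recipe, uploaded
```

Run it twice on a file where one chunk changed and the second call uploads one chunk’s worth of bytes. The recipe, just an ordered list of hashes, is all a snapshot needs to keep.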
The practical math: a 40 GB catalog with a hundred slider adjustments across ten photos changes somewhere between 4 and 8 megabytes of actual on-disk bytes. A dedup-aware engine uploads 4 to 8 megabytes. Not 40 gigabytes.
Why content-defined matters
There’s a subtler problem, though, and it’s the one that separates backup tools that sort of work on Lightroom from backup tools that actually do.
The simple version of chunking is to cut the file every N megabytes, fixed size. That works until a byte gets inserted near the start of the file. Now every chunk after the insertion point is shifted by one byte, which means every chunk has a new hash, which means every chunk needs to be re-uploaded. You’ve saved nothing.
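You can watch this happen in a few lines of Python (toy 4-byte chunks so the shift is visible):

```python
def fixed_chunks(data, size=4):
    """Cut `data` into fixed-size pieces, toy-sized for the demo."""
    return [data[i:i + size] for i in range(0, len(data), size)]

v1 = b"abcdefghijklmnop"
v2 = b"Xabcdefghijklmnop"          # one byte inserted at the front

print(fixed_chunks(v1))  # [b'abcd', b'efgh', b'ijkl', b'mnop']
print(fixed_chunks(v2))  # [b'Xabc', b'defg', b'hijk', b'lmno', b'p']
# Zero chunks in common: one inserted byte invalidated every piece.
```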
Lightroom rewrites metadata constantly. Photos does it. Logic does it. Final Cut does it. A fixed-size chunker falls apart on these apps almost immediately.
Content-defined chunking fixes this. Instead of cutting at fixed offsets, the engine slides a small rolling hash across the file and cuts wherever the hash matches a fixed bit pattern. These positions are properties of the content itself. If a byte gets inserted, the cut points on either side of the insertion stay in the same content-relative place. Only one or two chunks around the change get invalidated. The rest of the file continues to dedup cleanly against yesterday’s backup.
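A toy content-defined chunker, to make that concrete. The rolling hash here is plain Rabin–Karp, and the window, mask, and minimum chunk size are demo values chosen for readability; production engines use tuned parameters and faster hashes, and I’m not claiming these are macup’s.

```python
import os

WINDOW = 16                       # rolling-hash window, demo-sized
MASK = 0x3F                       # cut when low 6 bits set -> ~64 B chunks
BASE, MOD = 257, (1 << 31) - 1
POP = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window

def cdc_chunks(data):
    """Cut wherever the rolling hash over the last WINDOW bytes hits MASK."""
    chunks, start, h = [], 0, 0
    for i in range(len(data)):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * POP) % MOD  # drop outgoing byte
        h = (h * BASE + data[i]) % MOD              # pull in the new byte
        if i - start + 1 >= WINDOW and (h & MASK) == MASK:
            chunks.append(data[start:i + 1])        # content-defined cut
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])                 # trailing remainder
    return chunks

original = os.urandom(4000)
edited = original[:100] + b"X" + original[100:]     # insert one byte early
a, b = cdc_chunks(original), cdc_chunks(edited)
print(len(set(b) - set(a)), "of", len(b), "chunks are new")
# Typically one or two new chunks; everything downstream resynchronises,
# because each cut depends only on the bytes inside its window.
```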
This is why dedup works on Mac creative apps at all. Without content-defined boundaries, Lightroom saves would look like full rewrites to the backup engine, and you’d be no better off than if you’d had no dedup in the first place.
What it does to your storage bill
Consider a working photographer. 4 TB of active RAW, the 40 GB catalog from earlier, shooting two or three times a week, culling and editing steadily. They run a backup every hour.
A naive tool, checking the catalog hourly and finding it changed, would send 40 GB hourly. Over a year of hourly snapshots, that’s a theoretical 350 TB of uploads for the catalog alone. In practice you’d hit the quota wall in week two.
A dedup-aware tool sees the same shoot, the same edits, and ends the year with roughly 4.4 TB stored at the destination. Not 350. Not 40. 4.4.
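The back-of-envelope version, taking the per-hour delta at the high end of the 4 to 8 megabyte range from earlier. The gap between this estimate and the 4.4 TB figure is new shoots accumulating over the year, which dedup can’t shrink because they’re genuinely new bytes.

```python
catalog_gb = 40
hours_per_year = 24 * 365                  # hourly snapshots

naive_tb = catalog_gb * hours_per_year / 1000
print(naive_tb)                            # 350.4 TB for the catalog alone

delta_mb = 8                               # new bytes per hour, high end
dedup_tb = 4 + delta_mb * hours_per_year / 1_000_000
print(round(dedup_tb, 2))                  # ~4.07 TB: the RAWs plus a year of deltas
```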
That’s the order-of-magnitude difference. That’s why backup of creative work at Mac-scale is possible at all. And it’s why we let you run the numbers for your own archive in the cost calculator rather than promising you vague “savings.”
What dedup doesn’t do
A few honest caveats.
- It doesn’t help if your application encrypts its own files with a different key per save. (Lightroom doesn’t do this. Neither does Logic, Final Cut, Photos, or any app a typical Mac creative uses day-to-day.)
- It doesn’t help if your application re-encodes every photo on save. (Lightroom edits non-destructively. Your RAW files aren’t being rewritten.)
- It doesn’t magically dedup across completely separate backup accounts. macup dedups within a single repository. Two Macs backing up the same photo folder to the same repository share blocks. Two people on different accounts don’t share anything, which is what you want.
The practical advice
If you’ve been excluding your Lightroom catalog from backup because you were told it was “too big to back up” or “changes too often,” stop. Both things are true and neither of them matters. Catalogs are tractable. Back them up.
What you should exclude is the preview cache and Smart Preview store. Those regenerate on demand, churn fast, and aren’t worth the dedup work. You lose nothing by excluding them and you save a meaningful amount of churn.
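If you want to sanity-check an exclusion list, the idea is just glob matching against the sidecar package names. The patterns below are illustrative, not macup’s actual config syntax; the `.lrdata` names are how Lightroom stores previews on disk, as sibling packages named after the catalog.

```python
from fnmatch import fnmatch

# Illustrative patterns only -- not macup's exclusion syntax.
EXCLUDE = [
    "* Previews.lrdata/*",        # e.g. "My Catalog Previews.lrdata"
    "* Smart Previews.lrdata/*",  # the Smart Preview store
]

def should_back_up(path):
    """True unless the path falls inside an excluded package."""
    return not any(fnmatch(path, pat) for pat in EXCLUDE)

print(should_back_up("/Photos/My Catalog.lrcat"))                   # True
print(should_back_up("/Photos/My Catalog Previews.lrdata/a/b.db"))  # False
```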
And back up your catalog to the same destination as your source RAWs. Dedup works on both. Putting them together lets the engine share blocks between originals and catalog-referenced exports, which is where some of the largest savings quietly show up.
One more thing
The reason your backup works at all, quietly, every hour, without melting your upload, is chunking. It’s one of a half-dozen boring algorithms doing quiet work under the surface. You shouldn’t have to think about it. Now you’ve thought about it. Feel free to stop.
If you want the textbook entries, there’s chunking and deduplication in the glossary. If you want to see the numbers against your own archive, the cost calculator is linked above. Otherwise, go back to your catalog.