I Wrote a Tool to Tame 15 Years of Photos — Here's How It Works
Table of Contents
- The Stack
- A Word on Pillow
- Pass 1: Hash and Deduplicate
- The EXIF Orientation Problem
- Why pHash?
- How the Discrete Cosine Transform Makes Hashing Work
- Choosing the Right Threshold
- Grouping Strategy
- Scene Classification: On-Device Neural Networks via Apple Vision
- The Swift Bridge
- Why Not Use a Python ML Model Instead?
- Pass 2: Downscale, Convert, and Save
- Step 1: Full Load and EXIF Transpose
- Step 2: Downscale with Lanczos Resampling
- Step 3: Convert to HEIC
- Step 4: Filename Generation
- Deep Dive: HEIC — The Format That Replaced JPEG (On Apple, At Least)
- JPEG: Thirty Years of Dominance
- The Long Search for a Successor
- Apple’s Gambit
- Why HEIC is Technically Superior
- The Patent Problem
- The Compatibility Landscape
- HEIC in Python
- Deep Dive: uv — How a Rust Rewrite Changed Python Packaging
- The Problem uv Solved
- Enter Astral
- uv in This Project
- More Code: The Pieces That Tie It Together
- EXIF Date Extraction with Fallback
- Collision-Safe Output Naming
- Results
I recently stared down a folder containing 5,700 photos spanning over fifteen years — iPhone snapshots, DSLR exports, random screenshots, duplicates from cloud sync conflicts, and images forwarded through half a dozen messaging apps. Some were 24-megapixel RAW conversions. Some were 640x480 relics from 2009. Many were duplicates, or near-duplicates, taken in burst mode or re-saved at slightly different quality levels. Some existed three times over: once from the original camera roll, once from a WhatsApp forward, and once from an iCloud sync that silently created a copy with a different filename.
I didn’t want to spend a weekend manually sorting through them. I wanted a pipeline that could:
- Deduplicate images — not by filename or byte-for-byte comparison, but perceptually. A JPEG and a slightly-cropped HEIC of the same sunset should collapse into one.
- Normalize everything to a single format — modern, efficient, and lossless-capable.
- Downscale to a sane max resolution — I don’t need 6000x4000 pixels for photos I’m archiving, not printing.
- Classify scenes — automatically tag what’s in each photo (landscape, food, document, etc.) without uploading anything to a cloud API.
- Rename consistently — machine-sortable filenames derived from EXIF timestamps, resolution, and scene label.
So I built batch-image-processor, a single-file Python CLI backed by a 20-line Swift helper. No database, no config file, no daemon. Just a two-pass pipeline that reads a folder and writes a clean, deduplicated, tagged archive.
Here’s how it works under the hood.
The Stack
| Layer | Library / Tool | Role |
|---|---|---|
| Image I/O | Pillow 12.2+ | Open, resize, transpose, save images |
| HEIC support | pillow-heif 1.3+ | Register HEIF/HEIC codec with Pillow via register_heif_opener() |
| Perceptual hashing | imagehash 4.3+ | Generate pHash fingerprints for deduplication |
| Scene classification | Apple Vision framework via VNClassifyImageRequest | On-device neural image classification (macOS only) |
| Runtime | Python 3.14+, managed by uv | Fast dependency resolution and execution |
No GPU required. No cloud calls. Everything runs locally on a Mac.
A Word on Pillow
If you’ve done any image work in Python, you’ve used Pillow — but you might not know its origin story. The original library was called PIL (Python Imaging Library), created by Fredrik Lundh in 1995. PIL was the image library for Python for over a decade, but development stalled after 2009 with the 1.1.7 release. It never gained Python 3 support, and it was never uploaded to PyPI — you had to install it from a tarball.
In 2010, Alex Clark forked PIL as Pillow, initially just to get it onto PyPI and add Python 3 compatibility. Over time, Pillow evolved far beyond the original: it added support for new formats, improved performance, and became the officially recommended replacement. Today, Pillow is one of the most-downloaded packages on PyPI, with over 100 million downloads per month. When you import PIL in modern Python, you’re actually importing Pillow, which installs itself under the PIL namespace for backwards compatibility.
For this project, Pillow handles the core image pipeline: opening files in any supported format, reading EXIF metadata, applying orientation transforms, resizing with high-quality resampling filters, and saving to HEIF (via the pillow-heif plugin). It’s the Swiss Army knife that makes everything else possible.
Pass 1: Hash and Deduplicate
The first pass never fully loads any image into memory at display resolution. Instead, for each file, it:
- Opens the image and applies EXIF transpose via ImageOps.exif_transpose().
- Generates a thumbnail that fits within 512x512 (not a resize: Image.thumbnail() preserves aspect ratio and is much faster than a full decode followed by resize()).
- Computes a perceptual hash (pHash) using imagehash.phash().
    from pathlib import Path
    import imagehash
    from PIL import Image, ImageOps

    HASH_THUMBNAIL_SIZE = (512, 512)

    def hash_image(path: Path) -> imagehash.ImageHash:
        with Image.open(path) as img:
            img = ImageOps.exif_transpose(img)
            img.thumbnail(HASH_THUMBNAIL_SIZE)
            return imagehash.phash(img)
The with block ensures the image file handle is closed immediately after hashing — important when you’re iterating over thousands of files and don’t want to leak file descriptors.
The EXIF Orientation Problem
Step 1 deserves more explanation, because it addresses one of the most persistent headaches in image processing. When you hold your phone sideways and take a photo, the camera sensor doesn’t rotate — it always captures in the same physical orientation. Instead, the phone writes an EXIF orientation tag (tag 0x0112) into the file’s metadata, telling viewers “rotate this 90 degrees clockwise before displaying.” The image data on disk is still landscape; the rotation is just a hint.
This creates a problem for perceptual hashing. If the same photo exists as two files — one with the raw sensor orientation plus an EXIF tag, and one where a tool has already physically rotated the pixels and stripped the tag — their pixel data is completely different. One is a landscape buffer, the other is portrait. Without EXIF transpose, they’d produce unrelated hashes and never be recognized as duplicates.
ImageOps.exif_transpose() solves this by reading the orientation tag and physically rotating/flipping the pixel data to match, then removing the tag. After this call, the pixel buffer is always in “display” orientation regardless of how the file was originally saved. This is essential for correct hashing, and it’s a step that a surprising number of image tools skip.
There are eight possible EXIF orientation values (identity, plus rotations and mirror flips), and exif_transpose handles all of them. The most common in practice are 1 (normal), 3 (rotated 180), 6 (rotated 90 CW), and 8 (rotated 90 CCW) — corresponding to how you hold the phone.
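To make the eight orientation values concrete, here is a minimal pure-Python sketch that applies each one to a pixel grid stored as a list of rows. The helper name apply_orientation is hypothetical, and this is only a conceptual model of what ImageOps.exif_transpose() does, not Pillow’s implementation.

```python
# Hypothetical helper: bring a pixel grid (list of rows) into display
# orientation for a given EXIF orientation value (1-8).

def apply_orientation(grid, orientation):
    if orientation == 2:   # mirrored horizontally
        return [row[::-1] for row in grid]
    if orientation == 3:   # rotated 180 degrees
        return [row[::-1] for row in grid[::-1]]
    if orientation == 4:   # mirrored vertically
        return [list(row) for row in grid[::-1]]
    if orientation == 5:   # transposed (flipped across the main diagonal)
        return [list(col) for col in zip(*grid)]
    if orientation == 6:   # needs a 90-degree clockwise rotation
        return [list(col) for col in zip(*grid[::-1])]
    if orientation == 7:   # transverse (flipped across the anti-diagonal)
        return [list(col)[::-1] for col in zip(*grid)][::-1]
    if orientation == 8:   # needs a 90-degree counter-clockwise rotation
        return [list(col) for col in zip(*grid)][::-1]
    return [list(row) for row in grid]  # 1 or unknown: leave unchanged


# A 3-wide, 2-tall "landscape" buffer tagged with orientation 6
stored = [[1, 2, 3],
          [4, 5, 6]]
display = apply_orientation(stored, 6)
print(display)  # [[4, 1], [5, 2], [6, 3]]
```

The stored buffer is 3x2 but the display version is 2x3, which is why hashing the raw buffer of one copy against the already-rotated pixels of another would compare two differently shaped images.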
Why pHash?
imagehash supports several algorithms, each with different tradeoffs:
- ahash (average hash): Resizes the image to 8x8, converts to grayscale, and sets each bit based on whether the pixel is above or below the mean. Simple and fast, but sensitive to brightness changes and gamma corrections. A photo edited with slightly different exposure will produce a noticeably different hash.
- dhash (difference hash): Similar to ahash, but instead of comparing pixels to the mean, it compares each pixel to its neighbor. This captures gradients rather than absolute brightness, making it more robust to exposure changes. But it struggles with crops, because shifting the image shifts all the gradient relationships.
- phash (perceptual hash): Applies a Discrete Cosine Transform (DCT) to the image, then keeps only the low-frequency components. This is the one I chose.
- whash (wavelet hash): Uses a Discrete Wavelet Transform (DWT) instead of DCT. Theoretically captures both frequency and spatial information, but in practice performs similarly to pHash for photographic content while being more sensitive to certain types of noise.
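For intuition, the two simplest algorithms fit in a few lines of pure Python. This is an illustrative sketch that assumes the image has already been shrunk to a tiny grayscale grid; the function names are mine, and imagehash’s real implementations work on NumPy arrays.

```python
def average_hash(gray):
    """gray: 8 rows of 8 brightness values. Returns a 64-bit integer."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (p > mean)   # 1 if brighter than the mean
    return bits


def difference_hash(gray):
    """gray: 8 rows of 9 brightness values. Returns a 64-bit integer."""
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (right > left)   # 1 if brightness rises
    return bits


# A left-to-right brightness ramp, and the same ramp shifted brighter
ramp = [[x * 10 + y for x in range(9)] for y in range(8)]
brighter = [[p + 40 for p in row] for row in ramp]
print(difference_hash(ramp) == difference_hash(brighter))  # True
```

Because dhash only records whether brightness rises or falls between neighbors, a uniform brightness shift leaves every bit unchanged.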
I went with pHash because it’s the most robust for photographic deduplication. Here’s why.
How the Discrete Cosine Transform Makes Hashing Work
The DCT is the same mathematical transform that JPEG uses internally. It converts a spatial-domain signal (pixel values) into a frequency-domain representation (how much of each “frequency” is present in the image). Low-frequency components correspond to smooth gradients and broad shapes — the overall structure of the image. High-frequency components correspond to fine detail, sharp edges, and noise.
The key insight behind pHash is that two versions of the same photo share the same low-frequency structure, even if the high-frequency details differ. A JPEG saved at quality 50 has lost fine detail compared to the same image at quality 95, but the broad shapes — the arrangement of sky, trees, and faces — are identical. Re-encoding, resizing, slight cropping, color space conversion — all of these primarily affect high frequencies while leaving the low-frequency structure intact.
Here’s what pHash does step by step:
- Resize the image to 32x32 pixels (this is done internally by imagehash, separate from the 512x512 thumbnail I create with Pillow)
- Convert to grayscale
- Apply a 2D DCT to the 32x32 pixel matrix
- Keep only the top-left 8x8 block of DCT coefficients — these are the lowest frequencies
- Compute the median of these 64 values
- Set each bit to 1 if the coefficient is above the median, 0 if below
The result is a 64-bit integer. Each bit encodes whether a particular low-frequency pattern is present in the image above or below average. Two images that look the same to a human will produce hashes that differ by only a few bits.
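The steps above can be sketched in pure Python. This is a simplified illustration, not imagehash’s code: it assumes the image has already been resized to a 32x32 grayscale matrix, uses a naive unnormalized DCT-II, and the names dct_1d, dct_2d, and phash_sketch are hypothetical.

```python
import math
from statistics import median


def dct_1d(values):
    """Unnormalized DCT-II of a sequence of numbers."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def dct_2d(matrix):
    """Separable 2D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in matrix]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def phash_sketch(gray, hash_size=8):
    """gray: 32x32 grayscale matrix. Returns a 64-bit integer."""
    coeffs = dct_2d(gray)
    # Top-left block holds the lowest horizontal and vertical frequencies
    low = [coeffs[y][x] for y in range(hash_size) for x in range(hash_size)]
    mid = median(low)
    bits = 0
    for c in low:
        bits = (bits << 1) | (c > mid)
    return bits


# Synthetic 32x32 "photo" and a copy at half the brightness
image = [[float((x * 3 + y * 5 + x * y) % 256) for x in range(32)] for y in range(32)]
darker = [[p * 0.5 for p in row] for row in image]
print(phash_sketch(image) == phash_sketch(darker))  # True
```

Scaling every pixel scales every DCT coefficient and the median by the same factor, so the comparisons (and therefore the hash) do not change. Real-world edits are messier, which is why matching uses a distance threshold rather than equality.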
The Hamming distance between two hashes is simply the number of bits that differ — computed efficiently with an XOR followed by a popcount. A distance of 0 means the hashes are identical. A distance of 64 would mean every single bit is different (essentially, the inverse image).
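As a small sketch of the XOR-plus-popcount idea, treating each hash as a plain 64-bit integer (the helper names here are hypothetical, and imagehash exposes the same distance through the subtraction operator on its ImageHash objects):

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions where a and b differ."""
    # XOR leaves a 1 wherever the bits differ; counting the 1s is the popcount.
    # On Python 3.10+, (a ^ b).bit_count() does the same thing.
    return bin(a ^ b).count("1")


def is_near_duplicate(a: int, b: int, threshold: int = 10) -> bool:
    """True if two 64-bit hashes are within the given Hamming distance."""
    return hamming_distance(a, b) <= threshold


print(hamming_distance(0b1011, 0b0010))   # 2
print(hamming_distance(0, 2**64 - 1))     # 64
```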
Choosing the Right Threshold
I use a distance threshold of 10 (out of 64 bits), which I arrived at through trial and error on my own photo collection. The tradeoff is straightforward:
- Too low (e.g., 3-5): Misses duplicates that have been re-compressed or slightly cropped. You’ll still have near-dupes in the output.
- Too high (e.g., 15-20): Starts merging photos that are merely similar — two different sunset shots, or two group photos where one person moved. You lose distinct photos.
- The sweet spot (8-12): Catches re-saves, format conversions, burst-mode shots, and messaging-app copies, while keeping genuinely different photos separate.
In practice, a threshold of 10 catches:
- The same photo saved as both .jpg and .heic
- Burst-mode shots of the same scene (as long as nothing moved much between frames)
- Re-saves at different JPEG quality levels
- Screenshots of the same screen at different times (if the content hasn’t changed much)
- Photos forwarded through messaging apps (which often re-encode and downscale)
When duplicates are found, the pipeline keeps the highest-resolution version. This is a simple max() over width * height — if you took the same photo and one copy is 4032x3024 and another is 1920x1080 from a messaging app, the original survives.
Grouping Strategy
The deduplication uses a straightforward first-match-wins grouping approach:
    HASH_DISTANCE_THRESHOLD = 10

    def deduplicate(entries: list[tuple[Path, imagehash.ImageHash, int]]) -> list[Path]:
        groups: list[tuple[imagehash.ImageHash, list[tuple[Path, int]]]] = []
        for path, h, pixels in entries:
            placed = False
            for ref_h, group in groups:
                if h - ref_h <= HASH_DISTANCE_THRESHOLD:
                    group.append((path, pixels))
                    placed = True
                    break
            if not placed:
                groups.append((h, [(path, pixels)]))
        # Keep the highest-resolution image from each group
        return [max(group, key=lambda item: item[1])[0] for _, group in groups]
For each image, it iterates over existing groups and checks the Hamming distance against the group’s reference hash. The subtraction operator h - ref_h on ImageHash objects returns the Hamming distance — the number of bits that differ between two hashes. If it’s within the threshold of 10, the image joins that group. If no group matches, it becomes the seed of a new one.
This is O(n * g) where g is the number of groups, which is perfectly adequate for personal photo archives. For my 5,700 images, the grouping phase completed in under a minute. For collections in the hundreds of thousands, you’d want a VP-tree or BK-tree for nearest-neighbor lookup — but at this scale, the brute-force approach is simpler and fast enough.
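For reference, a BK-tree over Hamming distance is not much code. The sketch below is a minimal illustration with hashes represented as plain integers; it is not part of the tool, and the class and method names are my own.

```python
def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


class BKTree:
    """Metric tree: each child hangs off its parent at a fixed distance."""

    def __init__(self):
        self.root = None  # node layout: [value, {distance: child_node}]

    def add(self, value: int) -> None:
        if self.root is None:
            self.root = [value, {}]
            return
        node = self.root
        while True:
            d = hamming(value, node[0])
            child = node[1].get(d)
            if child is None:
                node[1][d] = [value, {}]
                return
            node = child

    def query(self, value: int, radius: int) -> list[tuple[int, int]]:
        """Return (distance, stored_value) pairs within the radius."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node_value, children = stack.pop()
            d = hamming(value, node_value)
            if d <= radius:
                results.append((d, node_value))
            # Triangle inequality: only subtrees whose edge distance lies in
            # [d - radius, d + radius] can contain matches.
            for edge, child in children.items():
                if d - radius <= edge <= d + radius:
                    stack.append(child)
        return results


tree = BKTree()
for h in (0b0000, 0b0001, 0b0111, 0b1111, 0xFF):
    tree.add(h)
print(sorted(tree.query(0b0011, radius=1)))  # [(1, 1), (1, 7)]
```

The pruning step is what avoids comparing a query against every stored hash; how much it saves in practice depends on the radius and on how the hashes are distributed.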
Scene Classification: On-Device Neural Networks via Apple Vision
This is the part that made me unreasonably happy.
macOS ships with the Vision framework (import Vision in Swift), Apple’s high-level computer vision API. Introduced at WWDC 2017 alongside iOS 11, Vision was initially focused on face detection, barcode reading, and text recognition. Over subsequent releases, Apple steadily expanded it: image alignment in 2018, animal detection in 2019, hand and body pose estimation in 2020, and — crucially for this project — image classification via VNClassifyImageRequest.
VNClassifyImageRequest taps into a Core ML model that ships with macOS itself. You don’t download a model file, you don’t specify a model path, and you don’t need an internet connection. The model is embedded in the operating system and updated with macOS releases. It’s the same classifier that powers the “Categories” view in Apple Photos, the ability for Spotlight to search “photos of food” or “photos of mountains,” and the Memories feature that groups your beach vacation shots together.
The taxonomy is surprisingly granular. Rather than broad labels like “nature” or “people,” it returns identifiers like "landscape_mountain", "landscape_beach", "food_bread", "animal_cat", "animal_dog", "document", "screenshot", "selfie", and hundreds more. Each result comes with a confidence score between 0 and 1, and the framework returns all matching labels — not just the top one — so you could implement multi-label tagging if needed.
The Swift Bridge
Since Vision is a native Apple framework, there’s no direct Python API. The solution is a tiny compiled Swift binary that accepts image paths as command-line arguments and writes results to stdout:
    import Vision
    import Foundation

    for path in CommandLine.arguments.dropFirst() {
        let url = URL(fileURLWithPath: path)
        let handler = VNImageRequestHandler(url: url, options: [:])
        let request = VNClassifyImageRequest()
        do {
            try handler.perform([request])
            let top = request.results?
                .sorted { $0.confidence > $1.confidence }
                .first
            print("\(path)\t\(top?.identifier ?? "unknown")")
        } catch {
            print("\(path)\tunknown")
        }
    }
Twenty lines. The entire classifier. The Python side calls this binary with all image paths at once (batch mode), parses the tab-separated stdout, and maps each file to its top label:
    import subprocess

    def classify_scenes(paths: list[Path]) -> dict[str, str]:
        result = {}
        if not CLASSIFIER_PATH.exists():
            for p in paths:
                result[str(p)] = "unknown"
            return result
        proc = subprocess.run(
            [str(CLASSIFIER_PATH)] + [str(p) for p in paths],
            capture_output=True, text=True, timeout=120,
        )
        for line in proc.stdout.strip().splitlines():
            path_str, label = line.split("\t", 1)
            result[path_str] = label
        return result
If the classify binary isn’t compiled or you’re running on Linux, it degrades gracefully — every image gets tagged "unknown" and the pipeline continues without scene labels.
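One detail worth hardening if you adapt this: the parsing loop assumes every stdout line contains a tab. Below is a more defensive parsing sketch. The function name parse_classifier_output is hypothetical, and it assumes labels never contain a tab, so splitting on the last tab stays correct even if a path contains one.

```python
def parse_classifier_output(stdout: str, paths: list[str]) -> dict[str, str]:
    """Map each requested path to its label, defaulting to "unknown"."""
    labels = {p: "unknown" for p in paths}
    for line in stdout.splitlines():
        # Split on the LAST tab: everything before it is the path.
        path_str, sep, label = line.rpartition("\t")
        if sep and label and path_str in labels:
            labels[path_str] = label
        # Lines without a tab, or for paths we did not ask about, are ignored.
    return labels


output = "a.jpg\tfood\nb.jpg\tdocument\nsome stray warning\n"
print(parse_classifier_output(output, ["a.jpg", "b.jpg", "c.jpg"]))
# {'a.jpg': 'food', 'b.jpg': 'document', 'c.jpg': 'unknown'}
```

Pre-filling the dictionary also means a path the binary skipped still ends up with an "unknown" label rather than a missing key.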
Why Not Use a Python ML Model Instead?
You might wonder why I didn’t just use a Python-native classifier like a pre-trained ResNet or EfficientNet via PyTorch or TensorFlow. A few reasons:
- Zero dependency weight. PyTorch alone is ~2 GB. TensorFlow is similar. Adding a deep learning framework to classify photos in a lightweight CLI tool felt absurd when macOS ships with a classifier built in.
- No model management. Pre-trained models need to be downloaded, versioned, and loaded into memory. The Vision framework’s model is managed by the OS, so it’s always there and always up to date.
- Apple-optimized inference. The Vision framework runs on the Apple Neural Engine (ANE) on Apple Silicon Macs, which is purpose-built for ML inference. A Python-based model would run on CPU (or require additional Metal/MPS setup for GPU), and would likely be slower for single-image classification, even after accounting for the overhead of spawning a subprocess for the Swift helper.
- The taxonomy matches the use case. Vision’s labels are designed for consumer photos, the exact domain I’m working in. A generic ImageNet classifier would give me “golden retriever” when I want “animal_dog”, or “volcano” when I want “landscape_mountain”. Apple’s labels are practical, not academic.
The tradeoff is that this feature is macOS-only. If you need cross-platform scene classification, you’d swap in a Python model. But on a Mac, this is free performance with zero setup.
Pass 2: Downscale, Convert, and Save
The second pass loads each surviving (non-duplicate) image one at a time, processes it, and writes it out. This is the most I/O-intensive phase — each image is fully decoded, transformed, re-encoded as HEIC, and written to disk.
Step 1: Full Load and EXIF Transpose
The image is loaded at full resolution, then ImageOps.exif_transpose() physically rotates the pixel data to match the EXIF orientation tag. This is applied again here (it was also applied during hashing) because the hash pass only worked with thumbnails — now we need the full-resolution pixels in the correct orientation before saving.
Step 2: Downscale with Lanczos Resampling
If either dimension exceeds 1920 pixels, the image is scaled down proportionally:
    MAX_DIMENSION = 1920

    def downscale_if_needed(img: Image.Image) -> Image.Image:
        w, h = img.size
        if w <= MAX_DIMENSION and h <= MAX_DIMENSION:
            return img
        scale = MAX_DIMENSION / max(w, h)
        return img.resize((int(w * scale), int(h * scale)), Image.LANCZOS)
The choice of Image.LANCZOS is deliberate. Pillow offers several resampling filters, each implementing a different interpolation kernel — a mathematical function that determines how source pixels contribute to each output pixel when the image is resized:
- NEAREST: Each output pixel takes the value of the single nearest source pixel. No blending, no smoothing. Fast but produces blocky, pixelated results. Only useful for pixel art or masks where you need hard edges.
- BILINEAR: Linearly interpolates between the 4 nearest source pixels (2x2 grid). Smooth but slightly blurry. Good for speed-critical applications where quality is secondary.
- BICUBIC: Uses a cubic polynomial to interpolate between the 16 nearest source pixels (4x4 grid). Sharper than bilinear, with a good balance of quality and speed. This is the default in many image tools.
- LANCZOS: Uses a windowed sinc function (specifically, a sinc function multiplied by a Lanczos window) to interpolate using a larger neighborhood of source pixels. The sinc function is the mathematically “ideal” interpolation kernel: it perfectly reconstructs a band-limited signal from its samples (this is the Nyquist-Shannon sampling theorem). The Lanczos window truncates the infinite sinc to a practical size (3 lobes for Lanczos3, which is what Pillow uses).
The practical difference: Lanczos produces the sharpest downscales with the least aliasing. It preserves fine text, thin lines, and texture detail better than bicubic. The tradeoff is speed — Lanczos evaluates more source pixels per output pixel — but for a batch job that runs once, the quality advantage is worth it.
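To see what “windowed sinc” means numerically, here is a small sketch of the Lanczos kernel and the weights it assigns to the source samples around one output position. The function names are hypothetical and this is not Pillow’s code. As I understand it, Pillow also widens the kernel in proportion to the scale factor when downscaling, which this sketch omits.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def lanczos_kernel(x, a=3):
    """Lanczos kernel with `a` lobes; zero outside (-a, a)."""
    if abs(x) >= a:
        return 0.0
    return sinc(x) * sinc(x / a)


def sample_weights(position, a=3):
    """Normalized weights of the 2*a source samples around a position."""
    first = math.floor(position) - a + 1
    raw = {i: lanczos_kernel(position - i, a) for i in range(first, first + 2 * a)}
    total = sum(raw.values())
    return {i: w / total for i, w in raw.items()}


# Weights for a point halfway between source samples 10 and 11
for index, weight in sample_weights(10.5).items():
    print(index, round(weight, 4))
```

The two nearest samples carry most of the weight, while the samples one step further out get small negative weights. Those negative lobes are what keep edges crisp, and also what can cause slight ringing around very hard edges.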
Why 1920 pixels? It’s the horizontal resolution of 1080p — a reasonable maximum for photos that will be viewed on screens rather than printed. A 4032x3024 iPhone photo (12 megapixels) scaled to 1920x1440 goes from ~14 MB as JPEG to ~3 MB as HEIC, with negligible visible quality loss on any display smaller than a 27-inch 4K monitor.
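The size arithmetic is easy to check in isolation. This sketch mirrors the scaling rule from downscale_if_needed with two deliberate differences that are my own assumptions rather than the tool’s behavior: it rounds instead of truncating, and it never returns a zero-pixel dimension.

```python
def target_size(width: int, height: int, max_dimension: int = 1920) -> tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_dimension."""
    if width <= max_dimension and height <= max_dimension:
        return width, height
    scale = max_dimension / max(width, height)
    # round() avoids off-by-one results from floating-point truncation;
    # max(1, ...) guards against extreme aspect ratios collapsing to 0.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(target_size(4032, 3024))  # (1920, 1440)
print(target_size(3024, 4032))  # (1440, 1920)
print(target_size(640, 480))    # (640, 480)
```

For the 4032x3024 case the linear scale factor is 2.1, so the output has 2.1² = 4.41 times fewer pixels before the codec change even comes into play.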
Step 3: Convert to HEIC
The image is saved in HEIF format via pillow-heif. RGBA and palette-mode images are first converted to RGB, since HEIF’s alpha channel support isn’t universally handled by decoders:
    def save_as_heic(img: Image.Image, output_path: Path) -> None:
        if img.mode in ("RGBA", "P"):
            img = img.convert("RGB")
        img.save(output_path, format="HEIF")
This is where the bulk of the processing time goes. HEVC encoding is computationally expensive — the encoder needs to evaluate multiple block sizes, run intra-prediction in 35 modes, and perform CABAC entropy coding. For my 2,700 surviving images, this step alone accounted for the majority of the one-hour runtime.
Step 4: Filename Generation
Each output file is named with a structured, machine-sortable scheme:
    {yyyymmdd}-{hhmmss}-{WxH}-{scene}.heic
The date and time come from the EXIF DateTime tag (tag 0x0132). If EXIF data is missing (screenshots, downloaded images), it falls back to the file’s mtime. Collision suffixes (-1 through -99) handle images taken in the same second.
This naming scheme means the output folder is automatically sorted chronologically with a plain ls. The embedded resolution and scene label make it trivially searchable — find all food photos, or all documents, or everything shot at full resolution.
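A quick way to convince yourself of the sorting property: because the date and time fields are zero-padded and ordered from most to least significant, plain string sorting matches chronological order. The helper below is a hypothetical sketch of the naming pattern only (no EXIF reading, no collision handling), and the scene labels are made up for the example.

```python
from datetime import datetime


def build_name(taken: datetime, size: tuple[int, int], scene: str) -> str:
    """Format a filename as yyyymmdd-hhmmss-WxH-scene.heic."""
    return f"{taken:%Y%m%d-%H%M%S}-{size[0]}x{size[1]}-{scene}.heic"


shots = [
    (datetime(2019, 7, 4, 18, 30, 5), (1920, 1440), "food"),
    (datetime(2009, 12, 25, 9, 1, 0), (640, 480), "document"),
    (datetime(2019, 7, 4, 8, 2, 59), (1440, 1920), "landscape"),
]
names = [build_name(*shot) for shot in shots]
for name in sorted(names):   # lexicographic order, same as a plain ls
    print(name)
```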
Deep Dive: HEIC — The Format That Replaced JPEG (On Apple, At Least)
JPEG: Thirty Years of Dominance
To appreciate HEIC, you need to understand what it replaced and why it took so long.
JPEG was standardized in 1992 by the Joint Photographic Experts Group (which is what JPEG stands for — it’s a committee name, not a technical acronym). It was revolutionary: the first practical lossy image compression format. JPEG made digital photography possible on hardware with kilobytes of RAM by applying an 8x8 block DCT (Discrete Cosine Transform), quantizing the coefficients (this is the lossy step — fine detail is discarded), and encoding the result with Huffman coding.
For over three decades, JPEG was the photo format. Not because nothing better was invented — better algorithms existed by the early 2000s — but because ecosystem lock-in is extraordinarily powerful. Every camera wrote JPEG. Every browser rendered JPEG. Every image editor opened JPEG. Every social media platform accepted JPEG. Replacing it meant getting hardware manufacturers, OS vendors, browser developers, and app developers to all move simultaneously. Nobody wanted to go first.
The Long Search for a Successor
The industry tried, repeatedly, to replace JPEG:
- JPEG 2000 (2000): Used wavelet transforms instead of DCT, producing dramatically better quality at low bitrates. Adopted in professional workflows (digital cinema, medical imaging, satellite imagery) but never gained consumer traction because it was computationally expensive and the patent licensing was complex. Your digital cinema projector uses JPEG 2000; your phone never will.
- WebP (2010): Google’s answer, based on the VP8 video codec’s intra-frame coding. It offered ~30% smaller files than JPEG and added transparency and animation support (replacing both JPEG and GIF in one format). Chrome supported it immediately, while Firefox and Safari dragged their feet for years. Safari didn’t add WebP support until 2020, a full decade after the format launched. WebP achieved significant adoption on the web but never became a camera output format.
- HEIC (2015): The subject of this section. Based on HEVC (H.265), offering ~50% compression improvement over JPEG.
- AVIF (2019): Based on the AV1 video codec, developed by the Alliance for Open Media (Amazon, Google, Meta, Microsoft, Netflix, and others). AVIF matches or exceeds HEIC’s compression while being royalty-free, a direct response to HEVC’s patent licensing mess. Browser support arrived relatively quickly (Chrome 2020, Firefox 2021, Safari 2022). AVIF is arguably the strongest long-term contender to become the universal photo format, but adoption is still early.
- JPEG XL (2021): The “official” JPEG successor from the same standards body. Technically impressive: lossless recompression of existing JPEG files (shrink JPEGs by ~20% with zero quality loss), progressive decoding, HDR support, and competitive lossy compression. But it arrived late to a crowded field, and Google controversially removed JPEG XL support from Chrome in 2023, dealing a significant blow to web adoption. Its future remains uncertain.
Each of these formats is technically superior to JPEG. The reason JPEG persists isn’t technical — it’s that no single entity controls enough of the ecosystem to force a migration. Apple came closest.
Apple’s Gambit
The real inflection point came in 2017, when Apple adopted HEIC as the default photo format in iOS 11 and macOS High Sierra. Overnight, every new iPhone photo was being saved as HEIC instead of JPEG. This wasn’t a gentle opt-in — it was a system-wide default change affecting hundreds of millions of devices.
Apple’s motivation was straightforward: HEIC files are roughly half the size of equivalent JPEGs. With over a billion active Apple devices backing up photos to iCloud, halving the storage footprint translated directly into infrastructure savings — and into users hitting their “iCloud storage full” warnings less frequently. Apple also benefited from HEIC’s container features: Live Photos (a still image paired with a short video clip) fit naturally into HEIF’s multi-image container model, as did depth maps from dual-camera iPhones.
The strategy worked, partially. Within Apple’s ecosystem, HEIC became ubiquitous. But the rest of the industry adopted it grudgingly and slowly, in part because of the format it’s built on.
Why HEIC is Technically Superior
HEIC’s compression advantage comes from HEVC’s more modern toolbox compared to JPEG:
| Feature | JPEG | HEIC (HEVC) |
|---|---|---|
| Block size | Fixed 8x8 | Adaptive, up to 64x64 (CTU) |
| Transform | DCT only | DCT + DST, multiple sizes |
| Prediction | None beyond DC-coefficient differencing between blocks | 35 intra-prediction modes |
| Entropy coding | Huffman | CABAC (context-adaptive binary arithmetic coding) |
| Color depth | 8-bit | 8-bit, 10-bit, 12-bit |
| Transparency | Not supported | Alpha channel support |
| Multiple images | Not supported | Image sequences, thumbnails, depth maps in one file |
The adaptive block sizes are particularly important for photos. JPEG chops every image into rigid 8x8 blocks, which is why you see those characteristic “blockiness” artifacts at low quality settings. HEVC can use blocks ranging from 8x8 up to 64x64, choosing the optimal size per region — large blocks for smooth skies, small blocks for fine texture. This alone accounts for a significant chunk of the compression improvement.
The Patent Problem
HEIC’s biggest obstacle has never been technical — it’s licensing. HEVC, the codec underneath HEIC, is covered by patents held by three separate patent pools: MPEG-LA, HEVC Advance, and Velos Media. Each pool demands its own royalty payments, and the terms are different for each. For a company shipping a browser or an OS, this means negotiating with three different entities just to decode a photo format. The total cost per device is small, but the legal complexity is enormous — and the uncertainty about future licensing terms made many companies hesitant to commit.
This licensing tangle is the primary reason HEIC hasn’t achieved universal adoption, and it’s a direct cautionary tale about how patent encumbrance can stifle a technically superior standard. It’s also the main reason the Alliance for Open Media created AV1 (and by extension, AVIF) as a royalty-free alternative — the major tech companies collectively decided that the future of media codecs shouldn’t be gated by patent pools.
The Compatibility Landscape
The practical result of the patent situation is a fragmented adoption timeline:
- macOS / iOS: Full native support since 2017. Every Apple app reads and writes HEIC. AirDrop, iMessage, and iCloud all preserve the format. When you share a HEIC photo to a non-Apple platform, iOS silently converts to JPEG, so most users never realize their photos aren’t JPEG.
- Windows: Read support arrived in Windows 10 version 1803 (2018), but required a free “HEIF Image Extensions” download from the Microsoft Store. Decoding the HEVC-compressed image data inside a HEIC file additionally needs the “HEVC Video Extensions” package, which costs $0.99, presumably to cover HEVC patent royalties. This one-dollar paywall is a minor annoyance but a major signal about how licensing friction trickles down to users.
- Android: Read support since Android 9 (2018). Some Samsung and Google Pixel phones offer HEIC as a camera option, but JPEG remains the default on the vast majority of Android devices. The Android ecosystem’s fragmentation (thousands of device models, multiple camera apps) makes a format transition harder than Apple’s top-down approach.
- Web browsers: This is where it hurts most. As far as I can tell, neither Chrome nor Firefox renders HEIC natively in web pages, and Safari itself only started doing so with version 17 in 2023, six years after Apple adopted the format. The web’s slow adoption means you can’t reliably use HEIC for web images. If you’re building a website, JPEG, WebP, or AVIF remain the safe choices.
- Social media / messaging: Instagram, WhatsApp, and Telegram all accept HEIC uploads but transcode to JPEG internally. Twitter/X does the same. The format never reaches the viewer.
For a personal archive that stays on a Mac — which is exactly my use case — none of these compatibility issues matter. Every tool in the chain understands HEIC natively. But it’s worth understanding why “just use HEIC everywhere” isn’t viable advice for the broader world.
HEIC in Python
The Python ecosystem’s HEIC support is handled by pillow-heif, which wraps the libheif C library. It integrates with Pillow as a plugin — one call to register_heif_opener() and Pillow transparently handles .heic files in both directions:
    from PIL import Image
    from pillow_heif import register_heif_opener

    register_heif_opener()

    # Reading: Image.open() now handles .heic transparently
    img = Image.open("photo.heic")

    # Writing: save as HEIF format
    # RGBA/palette images must be converted to RGB first — HEIF's alpha
    # support exists but isn't universally handled by all decoders
    if img.mode in ("RGBA", "P"):
        img = img.convert("RGB")
    img.save("output.heic", format="HEIF")
Under the hood, pillow-heif ships pre-built wheels with libheif statically linked, so there’s no system dependency to install. This is a big deal — before pillow-heif matured around 2022, working with HEIC in Python meant wrestling with pyheif, which required manually compiling libheif and libde265 from source.
Deep Dive: uv — How a Rust Rewrite Changed Python Packaging
The Problem uv Solved
Python’s packaging story has been, to put it diplomatically, a journey. To understand why uv was so enthusiastically adopted, you need to understand the landscape it replaced.
The early days (2000s): Python packages were installed with distutils (part of the standard library) and later setuptools, using the setup.py file as both configuration and build script. There was no dependency resolver — easy_install would download packages from PyPI and hope for the best. If two packages needed conflicting versions of the same dependency, you found out at runtime.
The pip era (2008-2020s): pip replaced easy_install as the standard installer, although a real backtracking dependency resolver only arrived with pip 20.3 in 2020. Combined with virtualenv (later python -m venv), you could isolate projects from each other. The workflow became:
    python -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt   # hope nothing conflicts
    pip freeze > requirements.txt     # hope nothing drifts
This worked, but had fundamental limitations. requirements.txt is a flat list of pinned versions with no distinction between direct and transitive dependencies. There’s no lock file. There’s no way to reproduce the exact dependency tree on a different platform. pip resolves dependencies in Python, downloads packages sequentially, and doesn’t cache aggressively. On a project with native dependencies like Pillow (which ships platform-specific binary wheels) and NumPy, a clean pip install can take 30+ seconds.
The “alternatives” era (2016-2023): The community built increasingly sophisticated tools on top of pip:
- pipenv (2017): Kenneth Reitz’s attempt at a Pipfile/Pipfile.lock workflow, inspired by Bundler (Ruby) and npm (JavaScript). It combined virtualenv management and dependency locking into one tool. It gained momentum quickly but became notorious for slow lock times and confusing error messages. Development stalled for extended periods.
- poetry (2018): Sébastien Eustace’s take, using pyproject.toml for configuration and a custom dependency resolver. Poetry was a significant step forward: proper lock files, clean CLI, and separation of direct vs. transitive dependencies. It became the de facto choice for serious Python projects. But it was still written in Python, still slow to resolve, and sometimes produced resolution conflicts that were hard to debug.
- pip-tools: a simpler approach, in which pip-compile reads a requirements.in file and produces a fully pinned requirements.txt. Lightweight and composable, but still required manual virtualenv management and sequential installs.
- pdm, hatch, flit: more alternatives, each with slightly different philosophies. The fragmentation itself became a running joke in the Python community. “How do I set up a Python project?” had a different answer depending on who you asked and what year it was.
All of these tools shared a fundamental performance ceiling: they were written in Python, running interpreted code to resolve dependency graphs and orchestrate installations. They were fast enough for small projects, but on anything with a deep dependency tree or native extensions, you’d sit and wait.
Enter Astral
The breakthrough came from outside the Python-in-Python tooling world entirely.
In 2022, Charlie Marsh released ruff, a Python linter written in Rust. Ruff wasn’t just incrementally faster than existing linters like flake8 and pylint — it was 10-100x faster, fast enough to lint entire codebases in milliseconds. It proved a thesis: rewriting Python developer tools in Rust could produce not just a marginal improvement but a categorically different experience.
Marsh founded Astral to pursue this thesis commercially, raised funding, and in February 2024 released uv — a Python package installer and resolver written in Rust, designed as a drop-in replacement for pip and pip-tools.
The initial benchmarks were staggering. uv resolved and installed dependencies 10-100x faster than pip. Not on contrived benchmarks — on real-world projects with real dependency trees. Projects that took pip 30+ seconds to install completed in under a second with uv. The speedup comes from several architectural decisions:
- Rust-native dependency resolution: the resolver runs in compiled, multithreaded Rust instead of interpreted Python. Dependency resolution is a constraint-satisfaction problem that involves repeated network requests and version comparisons, exactly the kind of workload where compiled code with proper concurrency thrashes an interpreted language.
- Parallel downloads: packages are fetched concurrently, saturating network bandwidth instead of downloading one wheel at a time.
- Global cache with hard links: uv maintains a global package cache (~/.cache/uv/) and uses hard links to “install” packages into project virtualenvs. This means the second project that uses Pillow doesn’t download or copy anything; it creates hard links to the already-cached wheel in microseconds.
- Pre-built wheel preference: uv strongly prefers binary wheels over source distributions, avoiding compilation where possible. For packages like NumPy that ship platform-specific wheels, this eliminates the need for a C compiler entirely.
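The hard-link idea in the cache bullet can be illustrated with nothing but the standard library. This is a minimal sketch of the general mechanism, not uv's actual code; `link_install` and the file names are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def link_install(cache_file: Path, site_packages: Path) -> Path:
    """Hard-link a cached file into a target directory instead of copying it."""
    site_packages.mkdir(parents=True, exist_ok=True)
    target = site_packages / cache_file.name
    os.link(cache_file, target)  # new directory entry, same bytes on disk
    return target


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    cached = root / "cache" / "module.py"  # hypothetical cached file
    cached.parent.mkdir()
    cached.write_text("VALUE = 1\n")

    installed = link_install(cached, root / "venv" / "site-packages")

    # Both names point at the same inode, so no data was duplicated.
    same_inode = os.stat(cached).st_ino == os.stat(installed).st_ino
    link_count = os.stat(cached).st_nlink
```

Both paths refer to the same inode, so "installing" costs one directory entry rather than a copy. Hard links only work within a single filesystem, so any tool relying on them needs some fallback when the cache and the project live on different volumes.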
But speed alone didn’t make uv ubiquitous. What sealed it was scope creep — the good kind. Over the course of 2024, uv expanded from a pip replacement into a complete Python project manager:
- uv init: scaffolds a new project with a pyproject.toml
- uv add / uv remove: manages dependencies declaratively (like npm install <package>)
- uv sync: installs all dependencies from the lock file, creating the venv if needed
- uv run: executes a command inside the project’s venv without manual activation
- uv python install: installs Python interpreters themselves, eliminating the need for pyenv or asdf
- uv lock: generates a cross-platform lock file (uv.lock) with deterministic resolution
- uv tool: installs and runs CLI tools in isolated environments (like pipx)
By late 2024, uv had effectively replaced pip, pip-tools, virtualenv, pyenv, poetry, and pipx — six separate tools — with a single, fast, static binary. The Python community, exhausted by years of tooling fragmentation, adopted it at remarkable speed. The GitHub repo crossed 40,000 stars within its first year. Conference talks stopped debating “pip vs. poetry vs. pipenv” and started asking “have you switched to uv yet?”
The broader impact is worth noting: uv and ruff together demonstrated that Python’s tooling didn’t have to be slow and fragmented. The problem was never Python-the-language — it was that the tools were written in Python-the-implementation. Rust gave the ecosystem a way to keep the Python developer experience while getting compiled-language performance where it mattered most.
uv in This Project
My pyproject.toml declares the project metadata and dependencies in standard PEP 621 format:
[project]
name = "image-processor"
version = "0.1.0"
requires-python = ">=3.14"
dependencies = [
    "imagehash>=4.3.2",
    "numpy>=2.0",
    "pillow>=12.2.0",
    "pillow-heif>=1.3.0",
]
From here, the entire workflow is two commands:
uv sync # resolve, lock, create venv, install — all in ~2 seconds (warm cache)
uv run python src/batch-image-processor.py input/ output/
No source .venv/bin/activate. No pip install -r requirements.txt. No wondering which Python version is on PATH. uv run finds the project’s venv (creating it if necessary), ensures dependencies are in sync, and executes the command — all transparently. It’s the kind of developer experience that, once you’ve used it, makes going back to the old workflow feel like dial-up.
More Code: The Pieces That Tie It Together
A few more snippets that show how the pipeline handles the tricky details.
EXIF Date Extraction with Fallback
Every output file is named with a timestamp, but not every input image has EXIF data. Screenshots, downloaded memes, images stripped by messaging apps — these have no embedded DateTime. The fallback is the file’s modification time:
from datetime import datetime
from pathlib import Path

from PIL import Image


def get_exif_datetime(path: Path) -> tuple[str, str]:
    """Return (yyyymmdd, hhmmss) from EXIF DateTime, or fallback to file mtime."""
    try:
        with Image.open(path) as img:
            exif = img.getexif()
            dt_str = exif.get(0x0132)  # EXIF DateTime tag
            if dt_str:
                date_part, time_part = dt_str.split(" ")
                return date_part.replace(":", ""), time_part.replace(":", "")
    except Exception:
        # Unreadable file or malformed tag: fall through to the mtime fallback.
        pass
    mtime = datetime.fromtimestamp(path.stat().st_mtime)
    return mtime.strftime("%Y%m%d"), mtime.strftime("%H%M%S")
Tag 0x0132 is the standard EXIF DateTime field, stored as a string in the format "2019:07:14 18:23:05". The colons in the date portion get replaced to produce 20190714, and the time becomes 182305. This gives filenames that sort chronologically with a plain ls.
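The string manipulation above trusts the tag to be well formed. As a hedged alternative, the sketch below validates the value with `datetime.strptime` before reformatting it, so a placeholder such as an all-zero timestamp is rejected instead of ending up in a filename. `split_exif_datetime` is a hypothetical helper, not part of the original tool.

```python
from datetime import datetime

EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"


def split_exif_datetime(dt_str: str) -> tuple[str, str]:
    """Validate an EXIF DateTime string and return (yyyymmdd, hhmmss)."""
    # strptime raises ValueError when the string does not match the format
    # or encodes an impossible date.
    parsed = datetime.strptime(dt_str.strip(), EXIF_FORMAT)
    return parsed.strftime("%Y%m%d"), parsed.strftime("%H%M%S")


result = split_exif_datetime("2019:07:14 18:23:05")
```

A caller would catch `ValueError` and fall back to the file's modification time, the same way the function above does for images with no tag at all.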
Collision-Safe Output Naming
When two photos share the same timestamp (burst mode, or photos taken within the same second on different cameras), the output filename would collide. The pipeline handles this by appending suffixes:
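from pathlib import Path


def unique_output_path(output_dir: Path, filename: str) -> Path:
    """Return a path in output_dir that does not exist yet, adding -1, -2, ... on collision."""
    stem = Path(filename).stem
    dest = output_dir / filename
    if not dest.exists():
        return dest
    for i in range(1, 100):
        dest = output_dir / f"{stem}-{i}.heic"
        if not dest.exists():
            return dest
    # All 99 suffixes are taken: fail loudly rather than overwrite an existing file.
    raise FileExistsError(f"no free output name for {filename}")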
So a burst of sunset photos might produce 20190714-182305-1920x1080-landscape.heic, followed by 20190714-182305-1920x1080-landscape-1.heic, -2, and so on. Simple, predictable, no UUID noise.
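Checking `exists()` and writing later leaves a small gap between the check and the write. That is harmless in a single process, but if the pipeline were ever run with several workers (an assumption; nothing above says it is), two of them could pick the same name. A minimal sketch of a race-free variant reserves the name by creating the file in exclusive mode; `claim_output_path` is hypothetical.

```python
import tempfile
from pathlib import Path


def claim_output_path(output_dir: Path, filename: str, max_tries: int = 100) -> Path:
    """Atomically reserve a unique path by creating an empty file at it."""
    stem, suffix = Path(filename).stem, Path(filename).suffix
    for i in range(max_tries):
        name = filename if i == 0 else f"{stem}-{i}{suffix}"
        candidate = output_dir / name
        try:
            # Mode "x" fails if the file already exists, so the existence
            # check and the creation happen in a single step.
            candidate.open("x").close()
            return candidate
        except FileExistsError:
            continue
    raise FileExistsError(f"no free name for {filename} after {max_tries} tries")


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    name = "20190714-182305-1920x1080-landscape.heic"
    claimed = [claim_output_path(out, name).name for _ in range(3)]
```

The encoder then overwrites the empty placeholder with the real image. The naming scheme stays the same: the bare name first, then `-1`, `-2`, and so on.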
Results
I pointed the tool at my full archive of 5,700 images and let it run on a MacBook. Here’s what came out the other side:
| Metric | Value |
|---|---|
| Input images | 5,700 |
| Output images (after dedup) | ~2,700 |
| Duplicates removed | ~3,000 (burst shots, messaging app copies, iCloud sync artifacts, re-saves) |
| Processing time | ~1 hour on a MacBook |
| Scene labels applied | every surviving image tagged automatically |
More than half of my archive was duplicates. That number surprised me at first, but it makes sense once you think about how photos accumulate: every burst-mode sequence contributes 5-10 near-identical frames. Every photo shared over WhatsApp or Telegram gets re-saved as a lower-resolution copy. Every iCloud sync hiccup creates an IMG_1234 (1).jpg. Over fifteen years, these copies compound silently.
The one-hour runtime on a laptop is dominated by two things: the HEIC encoding in Pass 2 (HEVC compression is CPU-intensive) and the perceptual hashing in Pass 1 (each image must be decoded and DCT-transformed). On a desktop Mac with more thermal headroom, this would be significantly faster — but for a one-time archive job, an hour is perfectly fine. I started it, made coffee, and came back to a clean folder.
The output is a single flat directory of consistently named, deduplicated, reasonably sized HEIC files:
20090315-143022-1920x1440-landscape_mountain.heic
20090315-143024-1920x1440-landscape_mountain-1.heic
20110801-092117-1280x960-food.heic
20130412-181503-1920x1080-document.heic
20170625-200841-1920x1440-animal_cat.heic
...
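Because every name follows the same date-time-size-label pattern, the archive can be queried without opening a single image. The sketch below parses names with a regular expression; the pattern is inferred from the examples above and assumes labels contain only lowercase letters and underscores, and `parse_name` and `PhotoName` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

PATTERN = re.compile(
    r"(?P<date>\d{8})-(?P<time>\d{6})-(?P<width>\d+)x(?P<height>\d+)"
    r"-(?P<label>[a-z_]+)(?:-(?P<dup>\d+))?\.heic"
)


class PhotoName(NamedTuple):
    date: str
    time: str
    width: int
    height: int
    label: str
    suffix: int  # 0 when no collision suffix is present


def parse_name(filename: str) -> Optional[PhotoName]:
    """Split an output filename into its fields, or return None if it does not match."""
    m = PATTERN.fullmatch(filename)
    if m is None:
        return None
    return PhotoName(m["date"], m["time"], int(m["width"]), int(m["height"]),
                     m["label"], int(m["dup"] or 0))


names = [
    "20090315-143022-1920x1440-landscape_mountain.heic",
    "20110801-092117-1280x960-food.heic",
    "20170625-200841-1920x1440-animal_cat.heic",
]
animals = [n for n in names if (p := parse_name(n)) and p.label.startswith("animal")]
```

The same parsed tuples can be filtered by year, resolution, or label prefix, which is what makes a flat directory workable as an index.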
Fifteen years of photo chaos, reduced to a browsable, searchable, chronologically sorted archive. No manual sorting. No cloud upload. Just a Python script, a Swift one-liner, and a Saturday morning.