How NomLens Works
A full walkthrough of every stage — from raw photograph to structured decode with Quốc ngữ transliteration and English meaning.
On-device Core ML model
EfficientNet-B0, 972 classes, 97.6% accuracy, <10ms on Neural Engine. Works offline. OTA model updates — no App Store update required.
Claude Vision API fallback
The on-device model handles the majority of characters. Low-confidence crops escalate to Claude Vision API for per-glyph expert reasoning.
The five-step pipeline
Photograph
Point your camera at any Han Nôm source — a stone stele, temple inscription, manuscript page, or printed text. NomLens accepts photos from your library too.
Preprocess
Core Image filters run on-device in milliseconds, cleaning up the capture before segmentation:
- Perspective correction — fix keystoning from camera angle
- Deskew — straighten rotated page
- Noise reduction — clean aged manuscript ink
- Contrast + brightness + desaturate (grayscale)
- Adaptive threshold — local contrast normalization for uneven stele lighting
- Unsharp mask — sharpen character edges
All filters run via Core Image (GPU-accelerated). Adaptive thresholding is custom-built using Gaussian blur + local mean subtraction — Core Image has no native implementation.
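The adaptive-threshold idea (compare each pixel to its local mean rather than one global cutoff) can be sketched in pure Swift. This is an illustrative re-implementation on a grayscale pixel array, not the app's Core Image pipeline: NomLens runs this on the GPU with a Gaussian blur, while here a simple box mean stands in for the blur, and the function name and `bias` parameter are our own.

```swift
import Foundation

// Illustrative sketch: adaptive thresholding by local-mean comparison.
// A box mean over a (2*radius+1)^2 window stands in for the Gaussian blur
// the document describes; `bias` suppresses noise in flat regions.
func adaptiveThreshold(_ pixels: [[Double]], radius: Int = 1, bias: Double = 0.0) -> [[Bool]] {
    let h = pixels.count, w = pixels[0].count
    var out = Array(repeating: Array(repeating: false, count: w), count: h)
    for y in 0..<h {
        for x in 0..<w {
            var sum = 0.0, n = 0.0
            for dy in -radius...radius {
                for dx in -radius...radius {
                    let yy = y + dy, xx = x + dx
                    if yy >= 0, yy < h, xx >= 0, xx < w {
                        sum += pixels[yy][xx]
                        n += 1
                    }
                }
            }
            // Pixel counts as "ink" when darker than its local mean.
            out[y][x] = pixels[y][x] < (sum / n) - bias
        }
    }
    return out
}
```

Because every pixel is judged against its own neighborhood, a stele that is bright on one side and shadowed on the other still binarizes cleanly, which is exactly where a single global threshold fails.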
Segment & Sort
Apple's Vision framework locates individual characters. NomLens clusters them into columns and sorts right-to-left, top-to-bottom — the correct Han Nôm reading order.
Characters cluster into columns using a dynamic threshold (1.2× median bounding box width — adapts to character density). Columns sort right-to-left by midX; within each column, top-to-bottom by minY. This is the correct classical Han Nôm reading direction.
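The clustering rule above can be sketched on plain structs, with no Vision types. The greedy single-pass strategy and the `Box` type are assumptions for illustration; the document only specifies the 1.2× median-width threshold and the sort orders.

```swift
import Foundation

// Illustrative sketch of column clustering. Coordinates are assumed to be
// top-left origin pixels (i.e. after the Vision coordinate conversion).
struct Box { let x, y, width, height: Double }

func sortIntoReadingOrder(_ boxes: [Box]) -> [[Box]] {
    guard !boxes.isEmpty else { return [] }
    let widths = boxes.map { $0.width }.sorted()
    let medianWidth = widths[widths.count / 2]
    let threshold = 1.2 * medianWidth          // adapts to character density

    // Walk boxes right-to-left by horizontal center; a box joins the current
    // column when its center is within the threshold of that column's anchor.
    var columns: [[Box]] = []
    for box in boxes.sorted(by: { ($0.x + $0.width / 2) > ($1.x + $1.width / 2) }) {
        let midX = box.x + box.width / 2
        if var last = columns.last,
           let anchor = last.first,
           abs((anchor.x + anchor.width / 2) - midX) < threshold {
            last.append(box)
            columns[columns.count - 1] = last
        } else {
            columns.append([box])
        }
    }
    // Columns already emerge right-to-left; sort within each top-to-bottom.
    return columns.map { $0.sorted { $0.y < $1.y } }
}
```

A median-relative threshold is what makes this density-adaptive: tightly packed small glyphs get a narrow tolerance, large carved characters a wide one.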
Vision returns normalized coordinates with origin at bottom-left. NomLens converts to pixel coordinates with origin top-left using VNImageRectForNormalizedRect, then adds 8px padding around each crop.
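Written out by hand, that conversion is a scale plus a vertical flip. The helper below is illustrative (NomLens uses Vision's `VNImageRectForNormalizedRect` for the scaling step); the math is the standard bottom-left-to-top-left flip, with the 8px padding applied afterwards. Clamping the padded rect to the image bounds is omitted for brevity.

```swift
import Foundation

// Illustrative helper: normalized bottom-left-origin rect (Vision's
// convention) to pixel top-left-origin rect, plus symmetric crop padding.
struct PixelRect { let x, y, width, height: Double }

func toPixelRect(normX: Double, normY: Double, normW: Double, normH: Double,
                 imageWidth: Double, imageHeight: Double,
                 padding: Double = 8) -> PixelRect {
    let w = normW * imageWidth
    let h = normH * imageHeight
    let x = normX * imageWidth
    // Vertical flip: the rect's TOP edge, measured from the image's top.
    let y = (1 - normY - normH) * imageHeight
    return PixelRect(x: x - padding, y: y - padding,
                     width: w + 2 * padding, height: h + 2 * padding)
}
```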
Classify
Each character crop is passed to an on-device Core ML model (EfficientNet-B0, 10.6 MB). High-confidence results are instant. Low-confidence characters escalate to Claude Vision API for expert fallback.
Results
A structured grid returns each character with its Unicode form, Quốc ngữ transliteration, English meaning, and a confidence badge. Tap any cell for full decode details. Everything persists in local history.
Confidence routing
The on-device model returns a calibrated confidence score (after temperature scaling with T=0.6908) for every character. NomLens routes based on that score:
- High confidence — accepted on-device. Result shown with green badge.
- Medium confidence — accepted on-device. Yellow badge flags for user review.
- Low confidence — escalated to Claude Vision API for expert classification.
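The calibration and routing steps can be sketched as below. The temperature T = 0.6908 is the constant quoted above; the 0.85 / 0.60 cutoffs are illustrative placeholders, since NomLens's actual thresholds are not stated here.

```swift
import Foundation

enum Route { case acceptGreen, acceptYellow, escalateToClaude }

// Temperature scaling: divide logits by T before softmax, then take the top
// probability as the calibrated confidence. (Numerically stable softmax.)
func calibratedConfidence(logits: [Double], temperature: Double = 0.6908) -> Double {
    let scaled = logits.map { $0 / temperature }
    let maxLogit = scaled.max()!
    let exps = scaled.map { exp($0 - maxLogit) }
    return exps.max()! / exps.reduce(0, +)
}

// Route on the calibrated score. Cutoffs below are hypothetical.
func route(confidence: Double) -> Route {
    switch confidence {
    case 0.85...:      return .acceptGreen       // high: green badge
    case 0.60..<0.85:  return .acceptYellow      // medium: flagged for review
    default:           return .escalateToClaude  // low: Claude Vision fallback
    }
}
```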
Result structure
Each decoded character returns a structured record. The full decode for a page assembles these records in reading order.
{
"character": "南",
"type": "han",
"quoc_ngu": "nam",
"meaning": "south",
"confidence": "high",
"alternate_readings": [],
"damage_noted": false,
"notes": ""
}
Low-confidence characters escalated to Claude Vision API use an identical output schema, returned by Claude's structured JSON response.
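One way to model that record in Swift is a `Codable` struct whose coding keys mirror the JSON field names exactly. The type name and camelCase property names below are our choices, not the NomLens source.

```swift
import Foundation

// Illustrative model of the per-character decode record shown above.
struct DecodedCharacter: Codable, Equatable {
    let character: String
    let type: String            // e.g. "han"
    let quocNgu: String
    let meaning: String
    let confidence: String      // e.g. "high"
    let alternateReadings: [String]
    let damageNoted: Bool
    let notes: String

    enum CodingKeys: String, CodingKey {
        case character, type, meaning, confidence, notes
        case quocNgu = "quoc_ngu"
        case alternateReadings = "alternate_readings"
        case damageNoted = "damage_noted"
    }
}
```

Because both the on-device path and the Claude fallback emit the same schema, a single `JSONDecoder().decode(DecodedCharacter.self, from:)` call covers either source.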
OTA model delivery
The Core ML model is not bundled in the app binary. It downloads on demand and hot-swaps without requiring an App Store update — critical for pushing accuracy improvements as the training data grows.
Version check on launch
ModelManager fetches a manifest.json from the hosted endpoint. If the version matches what's on disk, no download occurs.
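The decision logic reduces to a comparison against the installed version. The manifest fields, the `Int` version type, and the function below are assumptions about shape; the document only states that a manifest.json is fetched and compared, and the URL in the test is a placeholder.

```swift
import Foundation

// Assumed manifest shape; field names are illustrative.
struct ModelManifest: Codable {
    let version: Int
    let url: String
    let sha256: String
}

// Download when nothing is installed or the remote version differs.
func needsDownload(remote: ModelManifest, installedVersion: Int?) -> Bool {
    guard let installed = installedVersion else { return true }
    return remote.version != installed
}
```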
SHA-256 verification
Every downloaded .mlpackage is verified against the manifest's SHA-256 hash before it replaces the active model. Corrupt downloads are discarded.
Compiled model cache
After first load, the compiled .mlmodelc is cached to disk. Subsequent launches load the compiled binary directly — no recompilation overhead.
Hot-swap without restart
ClassifierProxy holds the active model reference. ModelManager calls proxy.update() after verification — new model is live instantly.
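The hot-swap pattern reduces to a proxy that holds the live model behind a lock, so inference threads always see a complete reference while the manager swaps it. This sketch is generic over the model type because the real `MLModel` is an Apple framework class; the names mirror the document's `ClassifierProxy` and `update()`, but the body is our assumption.

```swift
import Foundation

// Illustrative hot-swap proxy: readers and the updater serialize on a lock,
// so a swap never exposes a half-initialized model reference.
final class ClassifierProxy<Model> {
    private let lock = NSLock()
    private var model: Model

    init(model: Model) { self.model = model }

    // Called after SHA-256 verification succeeds; new model is live at once.
    func update(_ newModel: Model) {
        lock.lock(); defer { lock.unlock() }
        model = newModel
    }

    // Run inference (or any work) against the current model.
    func withModel<R>(_ body: (Model) -> R) -> R {
        lock.lock(); defer { lock.unlock() }
        return body(model)
    }
}
```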