NomLens

How NomLens Works

A full walkthrough of every stage — from raw photograph to structured decode with Quốc ngữ transliteration and English meaning.

Phase 1 (current)

Claude Vision API fallback

On-device model handles the majority of characters. Low-confidence crops escalate to Claude Vision API for per-glyph expert reasoning.

Phase 2 (active)

On-device Core ML model

EfficientNet-B0, 972 classes, 97.6% accuracy, <10ms on Neural Engine. Works offline. OTA model updates — no App Store update required.

The five-step pipeline

📷
01

Photograph

Point your camera at any Han Nôm source — a stone stele, temple inscription, manuscript page, or printed text. NomLens accepts photos from your library too.

⚙️
02

Preprocess

Core Image filters run on-device in milliseconds: adaptive thresholding corrects uneven lighting on weathered stone, noise reduction cleans aged manuscript ink, and perspective correction fixes keystoning.

Filter chain (in order):
  1. Perspective correction — fix keystoning from camera angle
  2. Deskew — straighten rotated page
  3. Noise reduction — clean aged manuscript ink
  4. Contrast + brightness + desaturate (grayscale)
  5. Adaptive threshold — local contrast normalization for uneven stele lighting
  6. Unsharp mask — sharpen character edges

All filters run via Core Image (GPU-accelerated). Adaptive thresholding is custom-built using Gaussian blur + local mean subtraction — Core Image has no native implementation.
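
A minimal sketch of steps 3–6 of that chain using Core Image's built-in filters. The parameter values, the CIColorThreshold binarization, and the blend-mode direction are illustrative assumptions, not NomLens's actual tuning; perspective correction and deskew are omitted because they need detected corner points and a skew angle as inputs.

import CoreImage
import CoreImage.CIFilterBuiltins

// Sketch of the preprocessing chain, assuming dark ink on a light background.
func preprocess(_ input: CIImage) -> CIImage {
    // 3. Noise reduction: clean aged manuscript ink.
    let denoise = CIFilter.noiseReduction()
    denoise.inputImage = input
    denoise.noiseLevel = 0.02
    denoise.sharpness = 0.4

    // 4. Desaturate to grayscale, nudge contrast.
    let gray = CIFilter.colorControls()
    gray.inputImage = denoise.outputImage
    gray.saturation = 0
    gray.contrast = 1.1

    // Invert so glyphs are bright (assumes dark ink; skip for light-on-dark sources).
    let inverted = CIFilter.colorInvert()
    inverted.inputImage = gray.outputImage

    // 5. Adaptive threshold: a Gaussian blur approximates the local mean;
    // subtracting it normalizes uneven lighting before a global cutoff.
    let localMean = CIFilter.gaussianBlur()
    localMean.inputImage = inverted.outputImage
    localMean.radius = 10

    let diff = CIFilter.subtractBlendMode()   // source minus background (assumed direction)
    diff.inputImage = inverted.outputImage
    diff.backgroundImage = localMean.outputImage

    let binary = CIFilter.colorThreshold()    // iOS 14+
    binary.inputImage = diff.outputImage
    binary.threshold = 0.05

    // 6. Unsharp mask: sharpen character edges.
    let sharpen = CIFilter.unsharpMask()
    sharpen.inputImage = binary.outputImage
    sharpen.radius = 2.5
    sharpen.intensity = 0.5

    return sharpen.outputImage ?? input
}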

🔲
03

Segment & Sort

Apple's Vision framework locates individual characters. NomLens clusters them into columns and sorts right-to-left, top-to-bottom — the correct Han Nôm reading order.

Reading order:

Characters cluster into columns using a dynamic threshold (1.2× median bounding box width — adapts to character density). Columns sort right-to-left by midX; within each column, top-to-bottom by minY. This is the correct classical Han Nôm reading direction.
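
A sketch of that clustering and sort. Only the 1.2× median-width threshold and the sort directions come from this page; the greedy grouping strategy and names are assumptions, and the boxes are assumed to already be in top-left-origin pixel coordinates (see the conversion sketch below).

import CoreGraphics

// Cluster character boxes into columns, then emit them in
// right-to-left, top-to-bottom reading order.
func readingOrder(_ boxes: [CGRect]) -> [CGRect] {
    guard !boxes.isEmpty else { return [] }

    // Dynamic threshold: 1.2× the median bounding-box width.
    let widths = boxes.map(\.width).sorted()
    let threshold = 1.2 * widths[widths.count / 2]

    // Greedy pass over boxes sorted right-to-left by midX: a box joins
    // the current column if its midX falls within the threshold of the
    // column's first box, otherwise it starts a new column.
    var columns: [[CGRect]] = []
    for box in boxes.sorted(by: { $0.midX > $1.midX }) {
        if let anchor = columns.last?.first, abs(anchor.midX - box.midX) < threshold {
            columns[columns.count - 1].append(box)
        } else {
            columns.append([box])
        }
    }

    // Columns are already right-to-left; within each column,
    // top-to-bottom by minY (top-left origin).
    return columns.flatMap { $0.sorted { $0.minY < $1.minY } }
}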

Vision returns normalized coordinates with origin at bottom-left. NomLens converts to pixel coordinates with VNImageRectForNormalizedRect, flips the y-axis to a top-left origin (the function only rescales, it does not flip), then adds 8px padding around each crop.
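
The conversion itself fits in a few lines; a sketch, with the function name and padding approach as described above:

import Vision

// Normalized Vision rect (bottom-left origin) -> padded pixel-space
// crop rect with a top-left origin.
func cropRect(for normalized: CGRect, width: Int, height: Int) -> CGRect {
    // Normalized -> pixel space (origin still bottom-left).
    var rect = VNImageRectForNormalizedRect(normalized, width, height)
    // Flip the y-axis to a top-left origin.
    rect.origin.y = CGFloat(height) - rect.maxY
    // 8px padding on every side.
    return rect.insetBy(dx: -8, dy: -8)
}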

🧠
04

Classify

Each character crop is passed to an on-device Core ML model (EfficientNet-B0, 10.6 MB). High-confidence results are instant. Low-confidence characters escalate to Claude Vision API for expert fallback.
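
A sketch of that hand-off using Vision's Core ML integration (VNCoreMLRequest); everything beyond the EfficientNet-B0 model itself is an illustrative assumption.

import CoreML
import Vision

// Classify one character crop; hands back the top label and its
// confidence for the routing step described below.
func classify(crop: CGImage, model: VNCoreMLModel,
              completion: @escaping (String, Float) -> Void) {
    let request = VNCoreMLRequest(model: model) { request, _ in
        guard let top = (request.results as? [VNClassificationObservation])?.first
        else { return }
        completion(top.identifier, top.confidence)
    }
    try? VNImageRequestHandler(cgImage: crop).perform([request])
}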

📋
05

Results

A structured grid returns each character with its Unicode form, Quốc ngữ transliteration, English meaning, and a confidence badge. Tap any cell for full decode details. Everything persists in local history.

Confidence routing

The on-device model returns a calibrated confidence score (after temperature scaling with T=0.6908) for every character. NomLens routes based on that score:

≥ 90%
High confidence

Accepted on-device. Result shown with green badge.

60 – 90%
Medium confidence

Accepted on-device. Yellow badge flags for user review.

< 60%
Low confidence

Escalated to Claude Vision API for expert classification.

Important: These thresholds apply to temperature-scaled scores, not raw softmax probabilities. Raw neural network outputs are systematically miscalibrated: without calibration, a "95%" score might correspond to only 80% real accuracy. NomLens uses temperature scaling (T=0.6908) to produce scores with an Expected Calibration Error of 0.0034, meaning the score you see is accurate to within roughly 0.3%.
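
Both pieces are small. A sketch, with the T value and thresholds taken from this page and the function shapes assumed:

import Foundation

// Temperature scaling: divide logits by T before the softmax.
// T = 0.6908 is the value quoted above.
func calibratedSoftmax(_ logits: [Double], temperature: Double = 0.6908) -> [Double] {
    let scaled = logits.map { $0 / temperature }
    let maxVal = scaled.max() ?? 0
    let exps = scaled.map { exp($0 - maxVal) }   // subtract max for numerical stability
    let sum = exps.reduce(0, +)
    return exps.map { $0 / sum }
}

enum Route { case high, medium, low }

// Route a calibrated score to a tier using the thresholds above.
func route(confidence: Double) -> Route {
    switch confidence {
    case 0.90...:     return .high     // green badge, accepted on-device
    case 0.60..<0.90: return .medium   // yellow badge, flagged for review
    default:          return .low      // escalate to Claude Vision API
    }
}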

Result structure

Each decoded character returns a structured record. The full decode for a page assembles these records in reading order.

{
  "character":          "南",
  "type":               "han",
  "quoc_ngu":           "nam",
  "meaning":            "south",
  "confidence":         "high",
  "alternate_readings": [],
  "damage_noted":       false,
  "notes":              ""
}

Low-confidence characters escalated to the Claude Vision API come back in the identical output schema, returned as structured JSON by Claude.
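
On the Swift side, either path maps naturally onto a Codable record. A sketch; the type name is an assumption and the keys mirror the JSON above:

struct DecodedCharacter: Codable {
    let character: String            // the glyph, e.g. "南"
    let type: String                 // script tag, e.g. "han"
    let quocNgu: String              // Quốc ngữ transliteration
    let meaning: String              // English gloss
    let confidence: String           // "high" | "medium" | "low"
    let alternateReadings: [String]
    let damageNoted: Bool
    let notes: String

    enum CodingKeys: String, CodingKey {
        case character, type, meaning, confidence, notes
        case quocNgu = "quoc_ngu"
        case alternateReadings = "alternate_readings"
        case damageNoted = "damage_noted"
    }
}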

OTA model delivery

The Core ML model is not bundled in the app binary. It downloads on demand and hot-swaps without requiring an App Store update — critical for pushing accuracy improvements as the training data grows.

Version check on launch

ModelManager fetches a manifest.json from the hosted endpoint. If the version matches what's on disk, no download occurs.
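
A sketch of that launch check. The manifest's exact field names and the surrounding ModelManager plumbing are assumptions; only the version comparison and the SHA-256 field come from this page.

import Foundation

// Assumed manifest shape: version string, model download URL, SHA-256 hex digest.
struct ModelManifest: Codable {
    let version: String
    let modelURL: URL
    let sha256: String
}

// Returns the manifest only when a newer model should be downloaded.
func checkForUpdate(manifestURL: URL, installedVersion: String) async throws -> ModelManifest? {
    let (data, _) = try await URLSession.shared.data(from: manifestURL)
    let manifest = try JSONDecoder().decode(ModelManifest.self, from: data)
    // Version on disk matches: no download occurs.
    return manifest.version == installedVersion ? nil : manifest
}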

SHA-256 verification

Every downloaded .mlpackage is verified against the manifest's SHA-256 hash before it replaces the active model. Corrupt downloads are discarded.
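
Verification is a straight CryptoKit hash comparison. A sketch, assuming the model downloads as a single archive file (an .mlpackage on disk is a directory, so hashing would happen before unpacking):

import CryptoKit
import Foundation

// Compare the download's SHA-256 against the manifest's hex digest.
func verify(downloadAt url: URL, expectedSHA256 hex: String) throws -> Bool {
    let data = try Data(contentsOf: url)
    let digest = SHA256.hash(data: data)
    let computed = digest.map { String(format: "%02x", $0) }.joined()
    return computed == hex.lowercased()
}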

Compiled model cache

After first load, the compiled .mlmodelc is cached to disk. Subsequent launches load the compiled binary directly — no recompilation overhead.
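
A sketch of the compile-once pattern; the cache path and file layout are assumptions:

import CoreML
import Foundation

// Compile the .mlpackage once, cache the resulting .mlmodelc, and load
// the cached compiled binary on every later launch.
func loadModel(packageURL: URL, cacheDirectory: URL) throws -> MLModel {
    let cached = cacheDirectory.appendingPathComponent("NomLens.mlmodelc")
    if !FileManager.default.fileExists(atPath: cached.path) {
        // compileModel(at:) writes a temporary .mlmodelc; move it into the cache.
        let compiled = try MLModel.compileModel(at: packageURL)
        try FileManager.default.moveItem(at: compiled, to: cached)
    }
    return try MLModel(contentsOf: cached)
}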

Hot-swap without restart

ClassifierProxy holds the active model reference. ModelManager calls proxy.update() after verification — new model is live instantly.
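
ClassifierProxy and ModelManager are named above; the locking and method shapes below are assumptions. A sketch of the swap:

import CoreML
import Foundation

// Holds the live model reference; callers never touch MLModel directly,
// so a swap takes effect on the very next prediction.
final class ClassifierProxy {
    private let lock = NSLock()
    private var model: MLModel

    init(model: MLModel) { self.model = model }

    // Called by ModelManager after SHA-256 verification passes.
    func update(to newModel: MLModel) {
        lock.lock(); defer { lock.unlock() }
        model = newModel
    }

    func prediction(from input: MLFeatureProvider) throws -> MLFeatureProvider {
        lock.lock(); let live = model; lock.unlock()
        return try live.prediction(from: input)
    }
}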