
The NomLens Classifier

EfficientNet-B0 trained on Han Nôm character data, exported to Core ML, and delivered OTA to iOS devices. Architecture, training strategy, calibration, and performance breakdown.

  • 97.6% validation accuracy
  • 99.3% precision @ ≥90% confidence
  • 1.4% of predictions route to Claude
  • 10.6 MB model size

Architecture

NomLens uses EfficientNet-B0 — a CNN designed specifically for mobile/edge deployment via compound scaling of depth, width, and input resolution. It achieves better accuracy per parameter than ResNet or MobileNet at this size class.

This is a single-character image classifier, not an OCR sequence model. Segmentation is handled upstream by Apple's Vision framework, so the model only needs to solve: "given a 96×96 crop of one character, what is it?" — a simpler and more accurate task than sequence decoding.

After the convolutional layers, AdaptiveAvgPool collapses spatial dimensions to a 1280-dimensional feature vector. The classifier head (Linear 1280→N) maps this to probabilities over N character classes.

EfficientNet-B0 shape trace

  Input                [1, 3, 96, 96]
  Stem Conv 3×3 s2     [1, 32, 48, 48]
  MBConv1 ×1           [1, 16, 48, 48]
  MBConv6 ×2 (1)       [1, 24, 24, 24]
  MBConv6 ×2 (2)       [1, 40, 12, 12]
  MBConv6 ×3 (1)       [1, 80, 6, 6]
  MBConv6 ×3 (2)       [1, 112, 6, 6]
  MBConv6 ×4           [1, 192, 3, 3]
  MBConv6 ×1           [1, 320, 3, 3]
  Head Conv 1×1        [1, 1280, 3, 3]
  AdaptiveAvgPool      [1, 1280]   ← feature vector
  Dropout(0.3)         [1, 1280]
  Linear(1280→N)       [1, N]      ← logits
Property             Value
Architecture         EfficientNet-B0
Parameters           5.3M
Input                96×96 px RGB, single-character crop
Output               classLabel (String) + classLabelProbs (Dict)
Format               .mlpackage (mlprogram, iOS 16+)
Size                 10.6 MB (v1)
Inference            <10 ms on iPhone Neural Engine
Temperature (T)      0.6908
ECE after scaling    0.0034
Training framework   PyTorch ≥2.2.0 + coremltools ≥7.0

Training strategy

Transfer learning from ImageNet pretrained weights. The backbone already knows edges, curves, textures, and geometric patterns — the same low-level features needed for stroke and radical recognition. Two-phase fine-tuning prevents the random classification head from destroying those pretrained features.

Head-only (epochs 1–5)

  • Backbone weights frozen (requires_grad=False)
  • Only the Linear(1280→N) classification head trains
  • Adam optimizer, lr=1e-3
  • Rationale: gradients from the randomly initialized head would otherwise destroy the pretrained backbone features

Full fine-tune (epochs 6–50)

  • All 5.3M parameters train
  • AdamW optimizer, lr=1e-4 (10× lower than phase 1)
  • CosineAnnealingLR — smooth decay toward ~0
  • Weight decay = 1e-4 (L2 regularization against overfitting)
  • Stopped at epoch 24 — converged at 97.6% val accuracy

Class imbalance handling

511 Han classes have ~576 HWDB samples each (294K total). 461 Nôm-only classes have 2–4 font renders each (1,900 total). Without intervention, the model would learn to predict Han characters almost exclusively. WeightedRandomSampler gives each sample a weight of 1/(number of samples in its class), ensuring every class appears at roughly equal frequency in every training batch regardless of raw sample count.
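A minimal sketch of that weighting with torch's WeightedRandomSampler; the toy labels stand in for the real 294K/1.9K split:

```python
from collections import Counter

import torch
from torch.utils.data import WeightedRandomSampler

labels = [0, 0, 0, 0, 0, 0, 1, 1, 2]        # class 0 heavily over-represented
counts = Counter(labels)
weights = [1.0 / counts[c] for c in labels]  # per-sample weight = 1/class_count

# Every class now carries total weight 1.0, so each is drawn with equal
# probability; pass sampler=sampler to DataLoader instead of shuffle=True.
sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
```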

Training data

CASIA-HWDB

Primary

294,280 images across 511 CJK character classes. Real handwriting from Chinese writers on pen tablets. Institute of Automation, Chinese Academy of Sciences.

Covers the Hán layer of Han Nôm. Dense training data for the most common characters.

Han Nôm Font Renders

Primary (Nôm)

1,944 PNG images rendered from HanNomA.ttf and HanNomB.ttf — 2–4 renders per class across the font variants.

461 Nôm-specific characters don't exist in any handwriting database. Font renders are the only available training data for these classes.

NomNaOCR Dataset (Kaggle)

Future (v2)

38,318 labeled character patches from real woodblock-print manuscripts, covering three versions of Truyện Kiều plus Lục Vân Tiên and Đại Việt Sử Ký Toàn Thư.

Would meaningfully improve model performance on printed manuscript sources. Not yet incorporated.

User Corrections (Phase 3)

Future flywheel

Verified in-app corrections feed a retraining pipeline. Every low-confidence prediction that a user corrects becomes labeled training data.

The moat: real-world field data from actual users and scholars cannot be replicated from scratch.

Temperature scaling

Neural networks trained with cross-entropy loss are systematically miscalibrated — a raw softmax output of "97%" might correspond to only 80% real accuracy. This makes confidence-based routing meaningless without calibration.

The fix

calibrated = softmax(logits / T)

T > 1 → spreads probability → less confident
T < 1 → concentrates probability → more confident
T = 1 → no change
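A pure-Python sketch of that formula, numerically stabilized with the usual max-subtraction; the logits are illustrative:

```python
import math

def calibrated_softmax(logits, T=0.6908):
    """softmax(logits / T); default T is the fitted v1 temperature."""
    scaled = [z / T for z in logits]
    m = max(scaled)                              # stabilize exp()
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 1.0, 0.5]
raw = calibrated_softmax(logits, T=1.0)   # uncalibrated
cal = calibrated_softmax(logits)          # T < 1 concentrates probability
```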

v1 results

  Temperature (T)      0.6908
  ECE before scaling   0.1038
  ECE after scaling    0.0034
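ECE (Expected Calibration Error) bins predictions by confidence and averages the gap between per-bin accuracy and per-bin mean confidence. A pure-Python sketch with a toy example:

```python
def ece(confidences, correct, n_bins=15):
    """Expected Calibration Error over equal-width confidence bins."""
    total = len(confidences)
    err = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        err += len(idx) / total * abs(acc - avg_conf)
    return err

# Toy example: two predictions at 90% confidence, one right and one wrong
# -> bin accuracy 0.5 vs confidence 0.9 -> ECE 0.4.
print(ece([0.9, 0.9], [1, 0]))  # 0.4
```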
T is baked into the exported Core ML model via forward_calibrated() — the iOS app never sees raw logits. All outputs are calibrated probabilities.
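One way to bake T into the exported graph, mirroring the `forward_calibrated()` idea — the wrapper class below is an assumption, but coremltools converts a traced module like this so the app only ever receives calibrated probabilities:

```python
import torch
from torch import nn

class CalibratedWrapper(nn.Module):
    """Wraps a trained backbone so the exported graph emits softmax(logits / T)."""

    def __init__(self, backbone: nn.Module, T: float = 0.6908):
        super().__init__()
        self.backbone = backbone
        self.T = T

    def forward(self, x):
        # Raw logits never leave the module boundary.
        return torch.softmax(self.backbone(x) / self.T, dim=-1)

# Export sketch (hypothetical variable names):
#   traced = torch.jit.trace(CalibratedWrapper(model).eval(), example_input)
#   mlmodel = ct.convert(traced, ...)  # coremltools -> .mlpackage
```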

Character coverage

Coverage is measured against a 22-work Chữ Nôm corpus from chunom.org. Rare characters outside the class set fall through to Claude Vision API fallback.
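The routing rule implied by the headline stats ("Precision @ ≥90% Confidence", "1.4% Routes to Claude") can be sketched as a simple threshold check; the 0.9 cutoff and function name are assumptions:

```python
def route(probs, threshold=0.9):
    """Accept the on-device prediction if calibrated confidence clears the
    threshold, otherwise fall back to the Claude Vision API."""
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return ("on_device", best)
    return ("claude_fallback", None)
```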

  v1 (current)   83.5% coverage (972 classes)
  v2 (planned)   95.2% coverage (2,000 classes)
  v3 (future)    99.5% coverage (3,000 classes)

v2 (2,000 classes) and v3 (3,000 classes) are blocked on additional training data. The app's correction flywheel — every user-verified label — is the primary mechanism for unlocking them.