The NomLens Classifier
EfficientNet-B0 trained on Han Nôm character data, exported to Core ML, and delivered OTA to iOS devices. Architecture, training strategy, calibration, and performance breakdown.
Architecture
NomLens uses EfficientNet-B0 — a CNN designed specifically for mobile/edge deployment via compound scaling of depth, width, and input resolution. It achieves better accuracy per parameter than ResNet or MobileNet at this size class.
This is a single-character image classifier, not an OCR sequence model. Segmentation is handled upstream by Apple's Vision framework, so the model only needs to answer one question: "given a 96×96 crop of one character, which character is it?" That is a simpler task than sequence decoding, and one the model can solve more accurately.
After the convolutional layers, AdaptiveAvgPool collapses the spatial dimensions to a 1280-dimensional feature vector. The classifier head (Linear 1280→N) maps this to logits over N character classes, and softmax converts those logits to probabilities.
| Property | Value |
|---|---|
| Architecture | EfficientNet-B0 |
| Parameters | 5.3M |
| Input | 96×96 px RGB, single character crop |
| Output | classLabel (String) + classLabelProbs (Dict) |
| Format | .mlpackage (mlprogram, iOS 16+) |
| Size | 10.6 MB (v1) |
| Inference | <10ms on iPhone Neural Engine |
| Temperature (T) | 0.6908 |
| ECE after scaling | 0.0034 |
| Training framework | PyTorch ≥2.2.0 + coremltools ≥7.0 |
Training strategy
Transfer learning from ImageNet pretrained weights. The backbone already knows edges, curves, textures, and geometric patterns — the same low-level features needed for stroke and radical recognition. Two-phase fine-tuning prevents the randomly initialized classification head from destroying those pretrained features.
Head-only (epochs 1–5)
- Backbone weights frozen (requires_grad=False)
- Only the Linear(1280→N) classification head trains
- Adam optimizer, lr=1e-3
- Rationale: gradients from the randomly initialized head would otherwise destroy the pretrained backbone features
Full fine-tune (epochs 6–50)
- All 5.3M parameters train
- AdamW optimizer, lr=1e-4 (10× lower than phase 1)
- CosineAnnealingLR schedule: smooth decay toward ~0
- Weight decay=1e-4 (L2 regularization against overfitting)
- Stopped at epoch 24, converged at 97.6% val accuracy
Class imbalance handling
511 Han classes have ~576 HWDB samples each (294K total). 461 Nôm-only classes have 2–4 font renders each (1,900 total). Without intervention, the model would learn to predict Han characters almost exclusively. WeightedRandomSampler gives each sample a weight of 1/(number of samples in its class), ensuring every class appears at roughly equal frequency in every training batch regardless of raw sample count.
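A minimal sketch of that weighting with PyTorch's WeightedRandomSampler, using a toy two-class split that mirrors the 576-vs-4 imbalance (the real pipeline computes counts over every class, not two):

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Toy labels: class 0 has 576 samples (a Han class), class 1 has 4 (a Nôm class).
labels = torch.tensor([0] * 576 + [1] * 4)

# Each sample's weight is 1 / (its class's sample count).
class_counts = torch.bincount(labels).float()   # tensor([576., 4.])
sample_weights = 1.0 / class_counts[labels]

# Drawing with replacement at these weights makes both classes appear at
# roughly equal frequency in every batch despite the 144:1 raw ratio.
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels),
                                replacement=True)
# Used as: DataLoader(dataset, batch_size=..., sampler=sampler)
```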
Training data
CASIA-HWDB
Primary: 294,280 images across 511 CJK character classes. Real handwriting from Chinese writers on pen tablets. Institute of Automation, Chinese Academy of Sciences.
Covers the Hán layer of Han Nôm. Dense training data for the most common characters.
Han Nôm Font Renders
Primary (Nôm): 1,944 PNG images rendered from HanNomA.ttf and HanNomB.ttf (4 font variants, 2 images per class).
461 Nôm-specific characters don't exist in any handwriting database. Font renders are the only available training data for these classes.
NomNaOCR Dataset (Kaggle)
Future (v2): 38,318 labeled character patches from real woodblock-print manuscripts, covering three versions of Truyện Kiều plus Lục Vân Tiên and Đại Việt Sử Ký Toàn Thư.
Would meaningfully improve model performance on printed manuscript sources. Not yet incorporated.
User Corrections (Phase 3)
Future (flywheel): Verified in-app corrections feed a retraining pipeline. Every low-confidence prediction that a user corrects becomes labeled training data.
The moat: real-world field data from actual users and scholars cannot be replicated from scratch.
Temperature scaling
Neural networks trained with cross-entropy loss are systematically overconfident — a raw softmax output of "97%" might only correspond to 80% real accuracy. This makes confidence-based routing meaningless without calibration.
The fix
```
calibrated = softmax(logits / T)

T > 1 → spreads probability      → less confident
T < 1 → concentrates probability → more confident
T = 1 → no change
```
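T is a single scalar fit on held-out validation logits by minimizing NLL; the network's weights are untouched. A sketch of that standard temperature-scaling fit (the optimizer, step count, and synthetic demo data are illustrative assumptions, not the NomLens calibration script):

```python
import torch
import torch.nn.functional as F

def fit_temperature(logits, labels, steps=200, lr=0.05):
    """Fit scalar T on validation logits by minimizing cross-entropy (NLL)."""
    log_t = torch.zeros(1, requires_grad=True)  # optimize log T so T stays > 0
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        opt.step()
    return log_t.exp().item()

# Synthetic demo: labels drawn from softmax(z / 0.7), so the fit should
# recover a T near 0.7 — a value below 1, which sharpens the distribution.
torch.manual_seed(0)
z = torch.randn(5000, 10) * 3.0
labels = torch.distributions.Categorical(logits=z / 0.7).sample()
T = fit_temperature(z, labels)
```

At inference, the fitted constant (0.6908 for v1) is baked in and every prediction is calibrated as softmax(logits / T).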
v1 results
Character coverage
Coverage is measured against a 22-work Chữ Nôm corpus from chunom.org. Rare characters outside the class set fall through to the Claude Vision API fallback.
v2 (2,000 classes) and v3 (3,000 classes) are blocked on additional training data. The app's correction flywheel — every user-verified label — is the primary mechanism for unlocking them.