Handwritten Name Recognition
OCR benchmark on 66K images. TrOCR (ViT+GPT-2, 334M) best: CER=0.0481, 80% exact-match. CRNN-ResNet34: CER=0.0502. AMP + torch.compile + gradient accumulation optimizations.
66K handwriting images, 64×256px, 44-char vocabulary
CRNN variants (CNN+BiLSTM+CTC) vs TrOCR (ViT+GPT-2) with AMP + torch.compile
Multi-model OCR comparison for unconstrained handwritten name recognition.
Dataset
- 66K train / 8.2K val / 41K test images (64×256 grayscale)
- 44-character vocabulary (alphanumeric + special chars)
- 20% subset used for experiments
Architecture Comparison
| Model | CER ↓ | Exact Match | Speed |
|---|---|---|---|
| CRNN-EfficientNet | 0.1235 | — | Fastest |
| CRNN-ResNet18 | — | — | ~105s/ep |
| CRNN-ResNet34 | 0.0502 | — | ~251s/ep |
| TrOCR | 0.0481 | 80% | 34.9 min total |
CRNN (CNN + BiLSTM + CTC)
ResNet34 backbone → BiLSTM(2×256) → CTC Loss
→ Greedy / Beam search (4 beams) decoding
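The greedy decoding step can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's code: it assumes the CTC blank sits at index 0 and that the per-frame argmax indices are already available; `VOCAB` and `frames` are hypothetical toy values.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumption: index 0 is reserved for the CTC blank symbol


def ctc_greedy_decode(frame_ids: Sequence[int],
                      vocab: Sequence[str],
                      blank: int = BLANK) -> str:
    """Collapse consecutive repeats, then drop blanks (standard CTC greedy rule)."""
    chars: List[str] = []
    prev: Optional[int] = None
    for idx in frame_ids:
        # Emit a character only when the index changes and is not the blank.
        if idx != prev and idx != blank:
            chars.append(vocab[idx])
        prev = idx
    return "".join(chars)


# Hypothetical toy vocabulary: blank plus three letters.
VOCAB = ["<blank>", "A", "N", "E"]
# Per-frame argmax indices; the blank between the two N runs keeps the double letter.
frames = [1, 1, 0, 2, 2, 0, 2, 3, 3, 0]
decoded = ctc_greedy_decode(frames, VOCAB)
print(decoded)  # ANNE
```

Beam search replaces the per-frame argmax with a search over the best few prefixes (4 beams here), which can recover names that greedy decoding gets wrong by one character.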
TrOCR (ViT + GPT-2, 334M params)
Fine-tuned via HuggingFace Trainer, 10 epochs, 20% data subset.
Training Optimizations
| Technique | Impact |
|---|---|
| Mixed precision (AMP) | 2× speedup, -50% memory |
| torch.compile | +30% speedup |
| Gradient accumulation (4 steps) | -75% per-step batch memory (same effective batch size) |
| Image caching (1GB memmap) | 3× data loading speed |
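Why gradient accumulation saves memory without changing the update can be checked with a framework-free sketch. The one-parameter model, the toy data, and the helper names `mse_grad` and `accumulated_grad` are all hypothetical; the point is only that 4 micro-batch gradients, each scaled by 1/4, sum to the full-batch gradient when the micro-batches are equal in size.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair


def mse_grad(w: float, batch: List[Sample]) -> float:
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, data: List[Sample], accum_steps: int) -> float:
    """Accumulate scaled micro-batch gradients; only one micro-batch is live at a time."""
    micro = len(data) // accum_steps  # assumes len(data) is divisible by accum_steps
    grad = 0.0
    for step in range(accum_steps):
        chunk = data[step * micro:(step + 1) * micro]
        # Scaling by 1/accum_steps makes the sum equal the full-batch mean gradient.
        grad += mse_grad(w, chunk) / accum_steps
    return grad


# Toy data: 8 samples, processed as 4 micro-batches of 2.
data = [(float(x), 3.0 * x + 1.0) for x in range(1, 9)]
full = mse_grad(0.5, data)
accumulated = accumulated_grad(0.5, data, accum_steps=4)
assert abs(full - accumulated) < 1e-9
```

In a PyTorch loop the same idea usually appears as dividing the loss by the number of accumulation steps and calling the optimizer step only every fourth micro-batch.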
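CER vs Exact Match
A CER of 0.0481 (roughly 5 character errors per 100 reference characters) corresponds to only 80% exact-match accuracy, because exact match requires the full name to match character for character. A single wrong character makes the whole name count as a failure, so exact match ends up far lower than the per-character error rate suggests.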
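The gap between the two metrics is easy to reproduce on toy data. The sketch below assumes CER is computed as total edit distance divided by total reference characters (the source does not say how it was aggregated); the names in `refs` and `hyps` are made up for illustration.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER: total edits over total reference characters (assumed definition)."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return edits / sum(len(r) for r in refs)


def exact_match(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Fraction of predictions identical to their reference."""
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)


# Hypothetical labels: one wrong character in one of five names.
refs = ["MARTIN", "LEA", "DUPONT", "ANNE", "BERNARD"]
hyps = ["MARTIN", "LEA", "DUPONT", "ANNE", "BERNAHD"]
print(f"CER = {cer(refs, hyps):.4f}, exact match = {exact_match(refs, hyps):.0%}")
```

One substituted character out of 26 gives a CER near 0.04, yet it already costs one name in five, which is the same pattern as the reported 0.0481 CER and 80% exact match.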