Computer Vision, NLP

Handwritten Name Recognition

OCR benchmark on 66K handwriting images. TrOCR (ViT + GPT-2, 334M params) performs best with CER = 0.0481 and 80% exact-match accuracy; CRNN-ResNet34 follows at CER = 0.0502. Training is optimized with AMP, torch.compile, and gradient accumulation.

TrOCR CER: 0.0481
Exact-Match Accuracy: 80%
CRNN-ResNet34 CER: 0.0502
TrOCR params: 334M

Dataset

66K handwriting images, 64×256px, 44-char vocabulary

Approach

CRNN variants (CNN + BiLSTM + CTC) vs TrOCR (ViT + GPT-2), trained with AMP + torch.compile

Tech Stack

Python, PyTorch, TrOCR (HuggingFace), CRNN, ResNet34, BiLSTM, CTC

Keywords

TrOCR, CRNN, CTC Loss, BiLSTM, ResNet34, OCR, HuggingFace

Deep Dive

Multi-model OCR comparison for unconstrained handwritten name recognition.

Dataset

  • 66K train / 8.2K val / 41K test images (64×256 grayscale)
  • 44-character vocabulary (alphanumeric + special chars)
  • 20% subset used for experiments
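
The image dimensions above make a flat on-disk cache practical: assuming one byte per pixel (uint8, an assumption not stated on this page), 66K images of 64×256 come to roughly 1.08 GB, which is consistent with the 1GB memmap listed under Training Optimizations. A minimal sketch of such a cache with NumPy follows; the function names and the file layout are hypothetical, not the project's actual code.

```python
import numpy as np

IMG_H, IMG_W = 64, 256  # image size from the dataset description


def cache_bytes(n_images):
    """Size in bytes of n_images stored as uint8 (assumed dtype)."""
    return n_images * IMG_H * IMG_W


def build_cache(path, images):
    """Write (64, 256) uint8 arrays into a single .npy file backed by a memmap."""
    images = list(images)
    cache = np.lib.format.open_memmap(
        path, mode="w+", dtype=np.uint8, shape=(len(images), IMG_H, IMG_W)
    )
    for i, img in enumerate(images):
        cache[i] = img  # copied straight into the mapped file
    cache.flush()
    return len(images)


def open_cache(path):
    """Open the cache read-only; pages are loaded lazily on access."""
    return np.load(path, mmap_mode="r")
```

A Dataset's `__getitem__` could then index the opened array directly instead of decoding an image file on every access.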

Architecture Comparison

Model              | CER ↓  | Exact Match | Speed
CRNN-EfficientNet  | 0.1235 | -           | Fastest
CRNN-ResNet18      | -      | -           | ~105 s/epoch
CRNN-ResNet34      | 0.0502 | -           | ~251 s/epoch
TrOCR              | 0.0481 | 80%         | 34.9 min total

CRNN (CNN + BiLSTM + CTC)

ResNet34 backbone → BiLSTM(2×256) → CTC Loss
→ Greedy / Beam search (4 beams) decoding
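
The greedy decoding step can be sketched in a few lines: take the most likely class per time step, collapse consecutive repeats, then drop the CTC blank. The sketch below works on per-frame class ids that have already been arg-maxed; the blank index of 0 and the tiny `id_to_char` mapping are assumptions for illustration, not the project's 44-character vocabulary.

```python
BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_ids, id_to_char, blank=BLANK):
    """Collapse repeated ids, then remove blanks (standard CTC best-path rule)."""
    chars = []
    prev = blank
    for idx in frame_ids:
        # Emit a character only when it is not blank and differs from the
        # previous frame; a blank between two equal ids keeps both.
        if idx != blank and idx != prev:
            chars.append(id_to_char[idx])
        prev = idx
    return "".join(chars)


# Hypothetical two-letter vocabulary for the example.
id_to_char = {1: "A", 2: "N"}
decoded = ctc_greedy_decode([1, 1, 0, 2, 2, 0, 2, 1], id_to_char)
print(decoded)  # ANNA
```

The blank between the two runs of `2` is what lets the double "N" survive the repeat-collapsing step.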

TrOCR (ViT + GPT-2, 334M params)

Fine-tuned via HuggingFace Trainer, 10 epochs, 20% data subset.

Training Optimizations

Technique                        | Impact
Mixed precision (AMP)            | 2× speedup, -50% memory
torch.compile                    | +30% speedup
Gradient accumulation (4 steps)  | -75% effective batch memory
Image caching (1 GB memmap)      | 3× data loading speed
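
A rough sketch of how mixed precision and gradient accumulation fit together in one optimizer step is shown below. With 4 micro-batches, only a quarter of the effective batch is held in memory at a time, which is where a figure like -75% would come from. The helper name `accumulate_and_step` is hypothetical, and autocast is enabled only on CUDA here (an assumption of this sketch); the actual training loop may differ.

```python
import torch
from torch import nn


def accumulate_and_step(model, optimizer, micro_batches, loss_fn, use_amp=False):
    """One optimizer step over several equally sized micro-batches."""
    device = next(model.parameters()).device
    amp_on = use_amp and device.type == "cuda"  # AMP only on GPU in this sketch
    amp_dtype = torch.float16 if device.type == "cuda" else torch.bfloat16
    # With enabled=False the scaler is a pass-through.
    scaler = torch.cuda.amp.GradScaler(enabled=amp_on)
    accum_steps = len(micro_batches)

    optimizer.zero_grad(set_to_none=True)
    total_loss = 0.0
    for x, y in micro_batches:
        with torch.autocast(device_type=device.type, dtype=amp_dtype, enabled=amp_on):
            # Divide so the summed gradients match one large-batch mean loss.
            loss = loss_fn(model(x), y) / accum_steps
        scaler.scale(loss).backward()  # gradients accumulate in .grad
        total_loss += loss.item()
    scaler.step(optimizer)  # unscales (if needed) and calls optimizer.step()
    scaler.update()
    return total_loss
```

Dividing each micro-batch loss by the number of accumulation steps keeps the accumulated gradient equal to that of a single full-batch pass when the loss uses mean reduction.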

CER vs Exact Match

A CER of 0.0481 corresponds to 80% exact-match accuracy. Exact match requires the full name to match character-for-character, so a single wrong character counts as a failure. That is why the exact-match rate (80%) is much lower than the roughly 95% per-character accuracy implied by the CER.
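
To make the gap between the two metrics concrete, here is a small sketch that computes both on a toy set of names. It assumes CER is micro-averaged (total edit distance divided by total reference characters), which may not be exactly how the reported figure was computed; the names are made up for illustration.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def cer_and_exact_match(preds, refs):
    """Return (micro-averaged CER, exact-match rate) for paired predictions."""
    edits = sum(levenshtein(p, r) for p, r in zip(preds, refs))
    chars = sum(len(r) for r in refs)
    exact = sum(p == r for p, r in zip(preds, refs))
    return edits / chars, exact / len(refs)


# Hypothetical example: one wrong character in one of five names.
refs = ["MARTIN", "LUCAS", "EMMA", "HUGO", "LEA"]
preds = ["MARTIN", "LUCAS", "EMNA", "HUGO", "LEA"]
cer, exact_match = cer_and_exact_match(preds, refs)
print(f"CER={cer:.4f}, exact match={exact_match:.0%}")  # CER=0.0455, exact match=80%
```

One substituted character out of 22 gives a CER under 5%, yet it costs a full name out of five.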