Computer Vision, NLP

Handwritten Name Recognition

OCR benchmark on 66K handwritten-name images. TrOCR (ViT + GPT-2, 334M parameters) performs best with CER = 0.0481 and 80% exact match; CRNN-ResNet34 follows at CER = 0.0502. Training uses AMP, torch.compile, and gradient accumulation.

TrOCR CER: 0.0481

Exact-Match Accuracy: 80%

CRNN-ResNet34 CER: 0.0502

TrOCR params: 334M

Dataset

66K handwriting images, 64×256px, 44-char vocabulary

Approach

CRNN variants (CNN + BiLSTM + CTC) vs. TrOCR (ViT + GPT-2), trained with AMP and torch.compile

Tech Stack

Python, PyTorch, TrOCR (HuggingFace), CRNN, ResNet34, BiLSTM, CTC

Keywords

TrOCR, CRNN, CTC Loss, BiLSTM, ResNet34, OCR, HuggingFace

Deep Dive

Multi-model OCR comparison for unconstrained handwritten name recognition.

Dataset

▸66K train / 8.2K val / 41K test images (64×256 grayscale)
▸44-character vocabulary (alphanumeric + special chars)
▸20% subset used for experiments
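A reproducible 20% subset can be drawn with a seeded sampler. The following is a minimal sketch, not the project's actual code: the function name `sample_subset`, the seed, and the file names are all hypothetical.

```python
import random


def sample_subset(items, fraction=0.2, seed=42):
    """Return a reproducible random subset containing `fraction` of `items`."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(len(items) * fraction))
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    return rng.sample(list(items), k)


# Hypothetical file names standing in for the real training images.
train_files = [f"img_{i:05d}.png" for i in range(1000)]
subset = sample_subset(train_files)
```

Fixing the seed keeps the same images in the subset across runs, so CER numbers from different models stay comparable.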

Architecture Comparison

Model	CER ↓	Exact Match	Speed
CRNN-EfficientNet	0.1235	—	Fastest
CRNN-ResNet18	—	—	~105s/ep
CRNN-ResNet34	0.0502	—	~251s/ep
TrOCR	0.0481	80%	34.9 min total

CRNN (CNN + BiLSTM + CTC)

ResNet34 backbone → BiLSTM(2×256) → CTC Loss
→ Greedy / Beam search (4 beams) decoding
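The greedy decoding step can be sketched in plain Python. CTC emits one class per time step, including a blank symbol; greedy decoding takes the most likely class per frame, collapses consecutive repeats, then drops blanks. The blank index of 0 and the toy three-letter vocabulary below are assumptions for illustration, not the project's real 44-character mapping, and beam search is not covered here.

```python
from itertools import groupby

BLANK = 0  # assumption: index 0 is reserved for the CTC blank symbol


def ctc_greedy_decode(frame_ids, idx_to_char, blank=BLANK):
    """Collapse consecutive repeated frame predictions, then drop blanks."""
    collapsed = [k for k, _ in groupby(frame_ids)]
    return "".join(idx_to_char[i] for i in collapsed if i != blank)


# Hypothetical toy vocabulary; the real model uses 44 characters.
idx_to_char = {1: "A", 2: "N", 3: "E"}

# Per-frame argmax indices, as they might come out of the BiLSTM head.
frames = [1, 1, 0, 2, 2, 0, 2, 0, 1, 1]
decoded = ctc_greedy_decode(frames, idx_to_char)  # "ANNA"
```

The blank between the two runs of `2` is what lets the decoder emit a doubled letter; without it the repeats would collapse into a single "N".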

TrOCR (ViT + GPT-2, 334M params)

Fine-tuned via HuggingFace Trainer for 10 epochs on the 20% data subset.

Training Optimizations

Technique	Impact
Mixed precision (AMP)	2× speedup, -50% memory
torch.compile	+30% speedup
Gradient accumulation (4 steps)	~75% less batch memory at the same effective batch size
Image caching (1GB memmap)	3× data loading speed
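One plausible reading of the gradient-accumulation row is that only a quarter of the effective batch is held in memory at a time, while the accumulated gradient matches the full-batch one. The sketch below checks that equivalence in plain Python on a toy 1-D linear model. The model, data, and function names are assumptions for illustration; in a PyTorch loop the same scaling would typically appear as `(loss / accum_steps).backward()`.

```python
def mse_grad(w, batch):
    """Gradient w.r.t. w of mean squared error for the model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, batch, accum_steps=4):
    """Accumulate gradients over `accum_steps` equal-sized micro-batches."""
    if len(batch) % accum_steps != 0:
        raise ValueError("batch size must be divisible by accum_steps")
    micro = len(batch) // accum_steps
    grad = 0.0
    for step in range(accum_steps):
        chunk = batch[step * micro:(step + 1) * micro]
        # Scale each micro-batch gradient by 1/accum_steps before summing,
        # so the total equals the mean over the whole effective batch.
        grad += mse_grad(w, chunk) / accum_steps
    return grad


# Toy data: 8 samples, processed either at once or as 4 micro-batches of 2.
batch = [(float(x), 3.0 * x + 1.0) for x in range(1, 9)]
full = mse_grad(0.5, batch)
accum = accumulated_grad(0.5, batch)
```

Each micro-batch here holds 2 of the 8 samples, which is where a figure of roughly 75% less per-step batch memory would come from under this reading.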

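CER vs Exact Match

A CER of 0.0481 corresponds to 80% exact-match accuracy. Exact match requires the full name to match character for character, so a single wrong character counts as a failure for the whole name. That is why 80% exact match looks much worse than a character error rate under 5% would suggest. As a rough illustration, if character errors were independent, a 5-character name would come out fully correct with probability about (1 − 0.0481)^5 ≈ 0.78.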
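Both metrics can be computed with a short Levenshtein routine. This sketch assumes a corpus-level CER (total edit distance divided by total reference characters); the project may instead average per-sample CER, which gives slightly different numbers. The names and predictions are invented for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance with unit cost for insert, delete, substitute."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1]


def cer(refs, hyps):
    """Corpus-level CER: total edits / total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars


def exact_match(refs, hyps):
    """Fraction of predictions identical to their reference."""
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)


# Invented examples: one name has a single wrong character (O read as 0).
refs = ["MARTIN", "LEA", "DUPONT", "CLARA", "NOAH"]
hyps = ["MARTIN", "LEA", "DUP0NT", "CLARA", "NOAH"]

corpus_cer = cer(refs, hyps)       # 1 edit over 24 characters
accuracy = exact_match(refs, hyps)  # 4 of 5 names fully correct
```

In this toy set a single substituted character gives a CER of about 0.04 yet drops exact match to 80%, the same pattern as the reported results.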
