
Facial Emotion Recognition

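7-class facial emotion recognition on RAF-DB (15,339 images: 12,271 train, 3,068 test). A soft-voting ensemble of ResNet50, ViT-Small, and EfficientNetB3 reaches 86.57% accuracy. Each backbone is trained with 2-phase transfer learning. GradCAM confirms that the models focus on emotion-specific mouth, brow, and eye regions.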

  • Ensemble accuracy: 86.57%
  • ResNet50 accuracy: 84.84%
  • ViT-Small accuracy: 84.03%
  • HOG+SVM baseline: 70.66%

Dataset

RAF-DB: 15,339 images, 7 emotions, 17× class imbalance

Approach

HOG+SVM → custom CNNs → 2-phase transfer learning → soft ensemble + GradCAM

Tech Stack
Python, PyTorch, ResNet50, ViT-Small, EfficientNetB3, GradCAM
Keywords
ResNet50, ViT-Small, EfficientNetB3, GradCAM, RAF-DB, Emotion, Ensemble
Deep Dive

Multi-model facial emotion recognition pipeline on RAF-DB, a challenging real-world dataset with 17× class imbalance.

Dataset (RAF-DB)

  • 12,271 train + 3,068 test, 7 emotion classes
  • Pre-aligned 100×100 RGB, ImageNet normalization
  • Class imbalance 17×: Happiness (4,772) vs Fear (281)
  • Imbalance handling: WeightedRandomSampler + label smoothing 0.1
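The ImageNet normalization step can be sketched in NumPy as below. This assumes the standard torchvision ImageNet statistics (mean 0.485/0.456/0.406, std 0.229/0.224/0.225) and mirrors what `transforms.ToTensor()` followed by `transforms.Normalize(mean, std)` does. The function name `preprocess` is hypothetical, and any resizing the pretrained backbones may need is not covered.

```python
import numpy as np

# Standard ImageNet channel statistics (RGB), as used by torchvision's pretrained models.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(img: np.ndarray) -> np.ndarray:
    """Convert an aligned HxWx3 uint8 RGB face crop to a normalized 3xHxW float32 array."""
    if img.ndim != 3 or img.shape[2] != 3:
        raise ValueError(f"expected HxWx3 RGB image, got shape {img.shape}")
    x = img.astype(np.float32) / 255.0         # scale to [0, 1], like ToTensor()
    x = (x - IMAGENET_MEAN) / IMAGENET_STD     # per-channel normalization, like Normalize()
    return x.transpose(2, 0, 1)                # HWC -> CHW, PyTorch's tensor layout


face = np.full((100, 100, 3), 128, dtype=np.uint8)  # stand-in for a 100x100 RAF-DB crop
x = preprocess(face)
print(x.shape, x.dtype)  # (3, 100, 100) float32
```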
Class | Train | Test
Happiness | 4,772 | 1,185
Neutral | 2,524 | 680
Sadness | 1,982 | 478
Anger | 705 | 162
Disgust | 717 | 160
Surprise | 1,290 | 329
Fear | 281 | 74
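A minimal sketch of the two imbalance-handling pieces, written in NumPy so it runs without PyTorch. It assumes inverse-frequency per-sample weights (a common choice; the exact weighting used in the project is not stated) of the kind one would pass to `torch.utils.data.WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)`, and a label-smoothed cross-entropy with ε = 0.1 of the same form as `nn.CrossEntropyLoss(label_smoothing=0.1)`. The helper names are hypothetical.

```python
import numpy as np

# Training-set class counts from the table above (list index = class id).
CLASSES = ["Happiness", "Neutral", "Sadness", "Anger", "Disgust", "Surprise", "Fear"]
TRAIN_COUNTS = [4772, 2524, 1982, 705, 717, 1290, 281]


def inverse_frequency_weights(labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Per-sample weight 1 / count(class), as fed to a weighted random sampler."""
    counts = np.bincount(labels, minlength=num_classes)
    return 1.0 / counts[labels]


def smoothed_cross_entropy(logits: np.ndarray, targets: np.ndarray, eps: float = 0.1) -> float:
    """Mean cross-entropy against soft targets (1 - eps) * one_hot + eps / K."""
    n, k = logits.shape
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Smoothed targets: eps spread uniformly, the remaining mass on the true class.
    soft = np.full((n, k), eps / k)
    soft[np.arange(n), targets] += 1.0 - eps
    return float(-(soft * log_probs).sum(axis=1).mean())


# Rebuild a label vector with the RAF-DB training distribution.
labels = np.repeat(np.arange(len(CLASSES)), TRAIN_COUNTS)
weights = inverse_frequency_weights(labels, len(CLASSES))

# Total sampling mass per class is count * (1 / count) = 1, so classes are drawn uniformly.
mass = np.bincount(labels, weights=weights)
print(np.round(mass / mass.sum(), 4))                   # seven equal shares of 1/7
print(round(max(TRAIN_COUNTS) / min(TRAIN_COUNTS), 2))  # 16.98
```

With these weights every class carries the same total sampling mass, so each draw is roughly equally likely to come from any of the 7 classes, even Fear with only 281 images. Label smoothing complements this by discouraging overconfident predictions, which plausibly matters when the same few Fear images are sampled repeatedly.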

All Models Compared

Model | Val Accuracy
HOG + SVM | 70.66%
SimpleCNN (2.78M params) | 71.64%
DeepCNN + ResBlocks (0.78M params) | 75.10%
EfficientNetB3 | 73.21%
ViT-Small | 84.03%
ResNet50 | 84.84%
Ensemble (top 3 transfer-learning models) | 86.57%
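Soft voting can be sketched as averaging each model's softmax probabilities and taking the argmax. The NumPy version below assumes equal model weights unless `weights` is given (whether the project weighted its three models is not stated). The probability values are made up and use 3 classes instead of 7 purely to show how soft and hard voting can disagree; `soft_vote` and `hard_vote` are hypothetical names.

```python
import numpy as np


def soft_vote(prob_list, weights=None):
    """Average per-model class probabilities (each shape (N, K)) and pick the top class."""
    probs = np.stack(prob_list)                          # (M, N, K)
    avg = np.average(probs, axis=0, weights=weights)     # (N, K); equal weights if None
    return avg.argmax(axis=1), avg


def hard_vote(prob_list):
    """Majority vote over each model's argmax prediction, for comparison."""
    preds = np.stack([p.argmax(axis=1) for p in prob_list])  # (M, N)
    k = prob_list[0].shape[1]
    return np.array([np.bincount(col, minlength=k).argmax() for col in preds.T])


# Illustrative probabilities for one image from three models.
resnet = np.array([[0.40, 0.35, 0.25]])
vit = np.array([[0.30, 0.60, 0.10]])
effnet = np.array([[0.45, 0.40, 0.15]])

soft_pred, avg = soft_vote([resnet, vit, effnet])
print(hard_vote([resnet, vit, effnet]))  # [0]: two of three models pick class 0
print(soft_pred)                         # [1]: ViT's confident vote shifts the average
print(np.round(avg, 3))
```

Soft voting lets a confident model outweigh two lukewarm ones, which is one reason averaging probabilities often beats majority voting when the members are individually well calibrated.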

2-Phase Transfer Learning

  1. Warmup (5–8 epochs): frozen backbone, train head only
  2. Fine-tune (20 epochs): full network, cosine annealing LR
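The epoch-level plan can be sketched in plain Python. Assumptions not stated in the original: 5 warmup epochs (the low end of the 5 to 8 range), a constant head learning rate of 1e-3 during warmup, and a fine-tuning base rate of 1e-4 decayed toward 0 with the closed-form cosine schedule that `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)` follows when stepped once per epoch. `EpochPlan`, `cosine_lr`, and `two_phase_plan` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class EpochPlan:
    epoch: int
    phase: str
    backbone_frozen: bool
    lr: float


def cosine_lr(t: int, t_max: int, base_lr: float, min_lr: float = 0.0) -> float:
    """Closed-form cosine annealing: base_lr at t=0, approaching min_lr as t -> t_max."""
    return min_lr + (base_lr - min_lr) * (1 + math.cos(math.pi * t / t_max)) / 2


def two_phase_plan(warmup_epochs=5, finetune_epochs=20, head_lr=1e-3, finetune_lr=1e-4):
    plan = []
    # Phase 1: backbone frozen, only the new classification head is trained.
    for e in range(warmup_epochs):
        plan.append(EpochPlan(e, "warmup", True, head_lr))
    # Phase 2: whole network unfrozen, LR decays along a cosine curve.
    for t in range(finetune_epochs):
        lr = cosine_lr(t, finetune_epochs, finetune_lr)
        plan.append(EpochPlan(warmup_epochs + t, "finetune", False, lr))
    return plan


# Show the transition from warmup to fine-tuning.
for p in two_phase_plan()[3:8]:
    print(p)
```

In PyTorch, phase 1 would typically set `requires_grad = False` on the backbone parameters and optimize only the head; phase 2 re-enables gradients for all parameters and builds a fresh optimizer and scheduler, so the cosine decay starts from the fine-tuning base rate.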

GradCAM Findings

  • Happiness: mouth corners and cheek areas
  • Anger: inner brow and lip region
  • Fear: wide eye opening + raised brows
  • Disgust: nose wrinkle + upper lip
  • Common confusions: Fear↔Sadness (similar brow drop) and Disgust↔Anger (similar lip tension)
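The core Grad-CAM computation behind these heatmaps can be sketched in NumPy. It assumes the feature maps and their gradients with respect to the target class score have already been captured from a chosen convolutional layer (for example with hooks on ResNet50's last block; the layer actually used in the project is not stated). Each channel is weighted by its spatially averaged gradient, the weighted maps are summed, and a ReLU keeps only regions that raise the class score. `grad_cam` is a hypothetical name.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM map for one image.

    activations, gradients: arrays of shape (K, H, W) from the target layer,
    where gradients are d(class score) / d(activations).
    Returns an (H, W) map scaled to [0, 1].
    """
    alphas = gradients.mean(axis=(1, 2))              # (K,) importance of each channel
    cam = np.tensordot(alphas, activations, axes=1)   # (H, W) weighted sum of channels
    cam = np.maximum(cam, 0.0)                        # ReLU: keep positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy example: channel 0 fires top-left and supports the class,
# channel 1 fires bottom-right and argues against it.
acts = np.zeros((2, 2, 2))
acts[0, 0, 0] = 1.0
acts[1, 1, 1] = 1.0
grads = np.stack([np.ones((2, 2)), -np.ones((2, 2))])
print(grad_cam(acts, grads))
```

Only the top-left cell survives, since the bottom-right evidence has a negative weight and is clipped by the ReLU. In practice the low-resolution map would be upsampled to the input resolution and overlaid on the face to produce per-emotion heatmaps like those summarized above.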