Computer Vision
Facial Emotion Recognition
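7-class emotion recognition on RAF-DB (15,339 images: 12,271 train, 3,068 test). A soft ensemble of ResNet50, ViT-Small, and EfficientNetB3 reaches 86.57% accuracy after 2-phase transfer learning. GradCAM heatmaps show the models attending to the mouth, brow, and eye regions, with the emphasis varying by emotion.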
Ensemble Accuracy: 86.57%
ResNet50 Accuracy: 84.84%
ViT-Small Accuracy: 84.03%
HOG+SVM Baseline: 70.66%
Dataset
RAF-DB: 15,339 images, 7 emotions, 17× class imbalance
Approach
HOG+SVM → custom CNNs → 2-phase transfer learning → soft ensemble + GradCAM
Tech Stack
Python, PyTorch, ResNet50, ViT-Small, EfficientNetB3, GradCAM
Keywords
ResNet50, ViT-Small, EfficientNetB3, GradCAM, RAF-DB, Emotion, Ensemble
Visualizations: 6 charts
Deep Dive
Multi-model facial emotion pipeline on RAF-DB — a challenging real-world dataset with 17× class imbalance.
Dataset (RAF-DB)
- 12,271 train + 3,068 test images (15,339 total), 7 emotion classes
- Pre-aligned 100×100 RGB faces, normalized with the ImageNet mean and standard deviation
- Class imbalance of about 17× in the training split: Happiness (4,772) vs Fear (281)
- Imbalance handling: WeightedRandomSampler + label smoothing (0.1)
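The label smoothing in the last bullet can be sketched in plain Python. This is a minimal illustration, not the project's code: the names `smoothed_targets` and `cross_entropy` are hypothetical, and it assumes the common formulation (the one behind the `label_smoothing` argument of PyTorch's `CrossEntropyLoss`) in which the smoothing mass is spread uniformly over all classes, including the true one.

```python
import math

NUM_CLASSES = 7   # RAF-DB emotion classes
SMOOTHING = 0.1   # value quoted in the list above


def smoothed_targets(label, num_classes=NUM_CLASSES, eps=SMOOTHING):
    """Soft target: eps / K on every class, plus (1 - eps) on the true class."""
    target = [eps / num_classes] * num_classes
    target[label] += 1.0 - eps
    return target


def cross_entropy(logits, target):
    """Cross-entropy between a soft target and softmax(logits)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    log_probs = [x - log_z for x in logits]
    return -sum(t * lp for t, lp in zip(target, log_probs))


# Hypothetical example: true class index 2, uninformative (all-zero) logits.
target = smoothed_targets(label=2)
loss = cross_entropy([0.0] * NUM_CLASSES, target)
```

With smoothing, the true class receives a target slightly below 1 and every other class a small positive target, which discourages over-confident predictions on the majority classes.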
| Class | Train | Test |
|---|---|---|
| Happiness | 4,772 | 1,185 |
| Neutral | 2,524 | 680 |
| Sadness | 1,982 | 478 |
| Anger | 705 | 162 |
| Disgust | 717 | 160 |
| Surprise | 1,290 | 329 |
| Fear | 281 | 74 |
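The training counts in the table are enough to reproduce the 17× figure and to derive sampling weights. The sketch below assumes inverse-frequency weighting, which is a common choice but is not confirmed by the source; the function names are hypothetical. In PyTorch, a per-sample list built from these class weights is what would typically be passed as `weights` to `torch.utils.data.WeightedRandomSampler`.

```python
# Training counts copied from the table above.
TRAIN_COUNTS = {
    "Happiness": 4772,
    "Neutral": 2524,
    "Sadness": 1982,
    "Anger": 705,
    "Disgust": 717,
    "Surprise": 1290,
    "Fear": 281,
}


def imbalance_ratio(counts):
    """Largest class size divided by smallest class size."""
    return max(counts.values()) / min(counts.values())


def inverse_frequency_weights(counts):
    """Assumed scheme: every sample of a class gets weight 1 / class_count."""
    return {name: 1.0 / n for name, n in counts.items()}


ratio = imbalance_ratio(TRAIN_COUNTS)
class_weights = inverse_frequency_weights(TRAIN_COUNTS)

# Total sampling mass per class (count * per-sample weight).
# Equal mass means each class is drawn equally often in expectation.
mass = {name: TRAIN_COUNTS[name] * w for name, w in class_weights.items()}
```

Under this scheme a Fear image is about 17 times more likely to be drawn than a Happiness image, so batches are roughly class-balanced in expectation.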
All Models Compared
| Model | Val Accuracy |
|---|---|
| HOG + SVM | 70.66% |
| SimpleCNN (2.78M params) | 71.64% |
| DeepCNN + ResBlocks (0.78M) | 75.10% |
| EfficientNetB3 | 73.21% |
| ViT-Small | 84.03% |
| ResNet50 | 84.84% |
| Ensemble (ResNet50 + ViT-Small + EfficientNetB3) | 86.57% |
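The ensemble row combines the three transfer-learning models by soft voting, that is, by averaging class probabilities rather than counting hard votes. Below is a minimal sketch in plain Python. Equal model weights are an assumption (the source does not state the weighting), the logits are made-up illustrative values, and `soft_vote` is a hypothetical name.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vote(per_model_logits, weights=None):
    """Average class probabilities across models; return (label, probs)."""
    n_models = len(per_model_logits)
    if weights is None:
        weights = [1.0 / n_models] * n_models  # assumed: equal weighting
    probs = [softmax(logits) for logits in per_model_logits]
    n_classes = len(probs[0])
    avg = [
        sum(w * p[c] for w, p in zip(weights, probs))
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=avg.__getitem__)
    return label, avg


# Illustrative logits for one image over 7 classes (not real model outputs).
model_a = [2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # fairly confident in class 0
model_b = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # weakly prefers class 1
model_c = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # weakly prefers class 1

label, avg_probs = soft_vote([model_a, model_b, model_c])
```

In this toy case two of three models prefer class 1, yet the averaged probabilities favor class 0 because the one confident model carries more probability mass. That sensitivity to confidence is the difference between soft and hard voting.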
2-Phase Transfer Learning
- Warmup (5–8 epochs): frozen backbone, train head only
- Fine-tune (20 epochs): full network, cosine annealing LR
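The two phases above can be written down as a per-epoch plan. The sketch below is framework-free and rests on several assumptions: the learning rates (`head_lr`, `finetune_lr`) are placeholder values not given in the source, the warmup learning rate is held constant, and the function names are hypothetical. The cosine formula is the standard closed form that PyTorch's `CosineAnnealingLR` also follows; in PyTorch the frozen phase would typically be implemented by setting `requires_grad = False` on the backbone parameters.

```python
import math


def cosine_lr(epoch, total_epochs, base_lr, min_lr=0.0):
    """Cosine annealing: base_lr at epoch 0, decaying to min_lr at total_epochs."""
    cos_term = 1.0 + math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * cos_term


def two_phase_plan(warmup_epochs=5, finetune_epochs=20,
                   head_lr=1e-3, finetune_lr=1e-4):
    """Return one entry per epoch: phase name, backbone trainable?, learning rate.

    head_lr and finetune_lr are assumed placeholder values.
    """
    plan = []
    # Phase 1: backbone frozen, only the classification head is trained.
    for _ in range(warmup_epochs):
        plan.append({"phase": "warmup", "train_backbone": False, "lr": head_lr})
    # Phase 2: whole network trainable, learning rate follows a cosine curve.
    for epoch in range(finetune_epochs):
        lr = cosine_lr(epoch, finetune_epochs, finetune_lr)
        plan.append({"phase": "finetune", "train_backbone": True, "lr": lr})
    return plan


plan = two_phase_plan()
finetune_lrs = [step["lr"] for step in plan if step["phase"] == "finetune"]
```

Training only the head first gives the randomly initialized classifier time to settle before gradients are allowed to modify the pretrained features, and the smaller, decaying fine-tune rate limits how far those features drift.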
GradCAM Findings
- Happiness: mouth corners and cheek areas
- Anger: inner brow and lip region
- Fear: wide eye opening + raised brows
- Disgust: nose wrinkle + upper lip
- Misclassifications: Fear↔Sadness (similar brow drop), Disgust↔Anger (similar lip tension)
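Heatmaps like the ones summarized above come from the Grad-CAM computation: each feature-map channel is weighted by the spatial average of its gradient with respect to the target class score, the weighted maps are summed, and negative values are clipped. The sketch below shows that arithmetic on nested lists. It is an illustration of the standard formulation, not the project's code; in a real pipeline the activations and gradients would be captured from the last convolutional layer, and `grad_cam` and the toy inputs are hypothetical.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap for one image and one target class.

    activations, gradients: nested lists shaped [channels][height][width].
    Returns a [height][width] heatmap scaled to the range [0, 1].
    """
    n_channels = len(activations)
    height = len(activations[0])
    width = len(activations[0][0])

    # Channel weights: global average of each channel's gradient map.
    alphas = [
        sum(sum(row) for row in gradients[c]) / (height * width)
        for c in range(n_channels)
    ]

    # Weighted sum over channels, followed by ReLU.
    cam = [
        [
            max(0.0, sum(alphas[c] * activations[c][i][j]
                         for c in range(n_channels)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Scale to [0, 1] so the map can be overlaid on the input image.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Toy input: channel 0 fires top-left and supports the class (positive gradient),
# channel 1 fires bottom-right and opposes it (negative gradient).
toy_acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
toy_grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heat = grad_cam(toy_acts, toy_grads)
```

In the toy case only the top-left cell stays hot, since the bottom-right activation belongs to a channel whose gradient argues against the class and is removed by the ReLU.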