The Imbalance Spectrum
- Mild (1:10): Class weights usually sufficient
- Moderate (1:100): SMOTE + class weights
- Severe (1:1000+): Anomaly detection framing, cost-sensitive learning
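To place a dataset on this spectrum, it helps to measure the ratio directly and see what class weights it implies. The sketch below is a minimal pure-Python illustration. The helper names `imbalance_ratio` and `balanced_class_weights` are made up for this example, and the weighting rule is the common "balanced" heuristic, n_samples / (n_classes * class_count), which is also what scikit-learn's `class_weight="balanced"` option computes.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return majority_count / minority_count (100.0 means 1:100)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get large weights, so each class contributes
    roughly equally to a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy labels (an assumption for illustration): 1000 negatives, 10 positives.
labels = [0] * 1000 + [1] * 10
ratio = imbalance_ratio(labels)
weights = balanced_class_weights(labels)
print(ratio, weights)
```

On this toy data the ratio lands in the moderate band, and the minority class receives a weight 100 times larger than the majority class.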
Technique Ranking (from my experience)
- Threshold tuning — Always do this. Default 0.5 is almost never optimal.
- Class weights — Easy to apply, no data distortion, and enough in most cases.
- SMOTE — Helps on small datasets. Can hurt on large ones.
- Random oversampling of the minority class — Simple, often underrated.
- Undersampling the majority class — Discards data and loses information. Use carefully.
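Threshold tuning, the first item in the ranking, can be sketched in a few lines. This is a hedged illustration rather than a reference implementation: `fbeta_at` and `best_threshold` are hypothetical helpers, the scores are invented toy values, and in practice the threshold should be chosen on a validation set, not on the test set.

```python
def fbeta_at(y_true, scores, threshold, beta=1.0):
    """F-beta score when predicting positive for score >= threshold."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 1:
            fn += 1
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def best_threshold(y_true, scores, beta=1.0):
    """Try every observed score as a threshold and keep the best F-beta."""
    best_t, best_f = 0.5, -1.0
    for t in sorted(set(scores)):
        f = fbeta_at(y_true, scores, t, beta)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


# Toy validation data (assumed): 8 negatives, 2 positives, and a model whose
# scores for the rare class never reach 0.5.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.30, 0.45, 0.35, 0.40]

f1_default = fbeta_at(y_true, scores, 0.5)
tuned_t, tuned_f1 = best_threshold(y_true, scores)
print(f1_default, tuned_t, tuned_f1)
```

With the default 0.5 cutoff this toy model predicts no positives at all, while a lower cutoff recovers both of them at the cost of one false positive. Passing `beta=2.0` would favor recall when choosing the threshold.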
The Right Metric
Never use accuracy on imbalanced data: at 1:100, a model that always predicts the majority class is already about 99% accurate. Use:
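- F1 / F2 score (F2 weights recall more heavily than precision)
- PR-AUC (more informative than ROC-AUC for severe imbalance, because precision and recall ignore true negatives)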
- Business-relevant cost matrix
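The contrast between accuracy and the metrics above is easy to see on a small example. The sketch below is illustrative only: the `evaluate` helper is hypothetical, and the costs of 50 per missed positive and 1 per false alarm are assumed values standing in for a real business cost matrix.

```python
def evaluate(y_true, y_pred, fn_cost=50.0, fp_cost=1.0, beta=2.0):
    """Return accuracy, F-beta, and total cost for binary predictions."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return {
        "accuracy": (tp + tn) / len(y_true),
        "fbeta": (1 + b2) * tp / denom if denom else 0.0,
        # Assumed cost matrix: correct predictions cost nothing.
        "cost": fn * fn_cost + fp * fp_cost,
    }


# Toy data (assumed): 10 positives among 1000 samples.
y_true = [1] * 10 + [0] * 990

# Baseline that always predicts the majority class.
always_negative = [0] * 1000

# Hypothetical model: catches 8 of 10 positives, raises 30 false alarms.
useful_model = [1] * 8 + [0] * 2 + [1] * 30 + [0] * 960

baseline = evaluate(y_true, always_negative)
useful = evaluate(y_true, useful_model)
print(baseline)
print(useful)
```

Here the always-negative baseline wins on accuracy, yet it has an F2 of zero and a much higher total cost than the model that actually finds positives. That is the sense in which accuracy misleads on imbalanced data.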