Fake News Detection
A 13-model NLP pipeline on 44,898 news articles. The Soft Voting and Stacking ensembles both achieve 99.86% accuracy with AUC = 1.0000; the error analysis finds only 2 misclassified articles on the full test set. DistilBERT matches this at 99.87% using a 6K-article subset.
44,898 articles (21K real + 23K fake), 70/15/15 split
Combined TF-IDF (word + char n-grams) → 13-model benchmark → transformer fine-tuning
A comprehensive fake news detection benchmark comparing classical ML models and transformers on a roughly balanced 44,898-article dataset.
Dataset
- 21,417 real + 23,481 fake news articles
- 70/15/15 stratified train/val/test split
- Features: TF-IDF word n-grams (1–2, 50K features) + char n-grams (3–5, 30K features) combined
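The 70/15/15 stratified split can be sketched with two calls to scikit-learn's `train_test_split`. This is a minimal sketch that assumes scikit-learn was used; the helper name `split_70_15_15`, the seed, the label encoding (0 = real, 1 = fake), and the placeholder arrays are hypothetical. Only the class counts come from the list above.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_70_15_15(texts, labels, seed=42):
    """Stratified 70/15/15 split built from two successive splits."""
    # Step 1: hold out 30% of the data for validation + test.
    x_train, x_tmp, y_train, y_tmp = train_test_split(
        texts, labels, test_size=0.30, stratify=labels, random_state=seed
    )
    # Step 2: split the held-out 30% in half (15% val, 15% test).
    x_val, x_test, y_val, y_test = train_test_split(
        x_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=seed
    )
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)


# Placeholder data: article indices stand in for the article texts.
labels = np.array([0] * 21_417 + [1] * 23_481)  # assumed: 0 = real, 1 = fake
texts = np.arange(len(labels))

train, val, test = split_70_15_15(texts, labels)
print(len(train[0]), len(val[0]), len(test[0]))
print("fake share (all / test):", labels.mean(), test[1].mean())
```

Stratifying both splits keeps the real/fake ratio nearly identical across train, validation, and test, which matters when comparing accuracies that differ by hundredths of a point.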
Full 13-Model Benchmark
| Model | Accuracy | AUC |
|---|---|---|
| Complement NB | 96.52% | 0.9936 |
| Logistic Regression | 99.65% | 0.9999 |
| Linear SVC | 99.81% | 1.0000 |
| SGD Classifier | 99.72% | 1.0000 |
| Decision Tree | 99.63% | 0.9950 |
| Random Forest | 99.70% | 0.9998 |
| Extra Trees | 99.37% | 0.9997 |
| XGBoost | 99.83% | 0.9997 |
| LightGBM | 99.81% | 0.9996 |
| Soft Voting | 99.86% | 1.0000 |
| Stacking | 99.86% | 1.0000 |
| BiLSTM | 98.5% | — |
| DistilBERT | 99.87% | 0.9999 |
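For readers unfamiliar with the two ensemble rows, the following is a minimal sketch of soft voting and stacking in scikit-learn. The choice of base models, the logistic-regression meta-learner, and the synthetic data are all assumptions for illustration; the source does not state which models feed the ensembles, and the numbers this prints are unrelated to the table above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the TF-IDF matrix and the real/fake labels.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical base models; each must expose predict_proba for soft voting.
base = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
]

models = {
    # Soft voting averages the predicted class probabilities.
    "soft_voting": VotingClassifier(estimators=base, voting="soft"),
    # Stacking trains a meta-learner on out-of-fold base predictions.
    "stacking": StackingClassifier(
        estimators=base,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    results[name] = (
        accuracy_score(y_test, model.predict(X_test)),
        roc_auc_score(y_test, proba),
    )
    print(f"{name}: accuracy={results[name][0]:.4f}, auc={results[name][1]:.4f}")
```

Both ensemble classes clone the base estimators internally, so the same `base` list can be shared between them without one fit affecting the other.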
Error Analysis
On the full test set: 1 false positive + 1 false negative. The dataset has strong source signals (Reuters/AP wire-service language versus conspiracy-style language) that the combined TF-IDF features capture almost perfectly.
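False positive and false negative counts like these can be read off a confusion matrix. The sketch below uses made-up labels and assumes "fake" is the positive class (1); neither the arrays nor that encoding come from the source.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels for illustration only (assumed encoding: 0 = real, 1 = fake).
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_pred = np.array([0, 1, 0, 1, 1, 0, 1, 0])

# With labels=[0, 1], ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(f"false positives={fp}, false negatives={fn}")

# Indices of the misclassified articles, for manual inspection.
wrong = np.flatnonzero(y_true != y_pred)
print("misclassified indices:", wrong.tolist())
```

Pulling the misclassified indices makes it easy to read the offending articles directly, which is practical when there are only a handful of errors.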
Why Combined TF-IDF Beats Standalone
Word n-grams capture semantic content; character n-grams capture writing-style artifacts (punctuation abuse, ALL-CAPS, unusual spacing). Combining both puts every classical model except Complement NB (96.52%) above 99.3%, with the best single models and both ensembles above 99.8%.
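One way to combine the two feature sets is scikit-learn's `FeatureUnion`, which concatenates the word-level and character-level TF-IDF matrices column-wise. This is a sketch under assumptions: the n-gram ranges and feature caps follow the dataset description, while the `analyzer="char"` choice, `lowercase=False` on the character branch (so ALL-CAPS survives), and the two example sentences are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion

combined_tfidf = FeatureUnion([
    # Word unigrams and bigrams, capped at 50K features.
    ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2),
                             max_features=50_000)),
    # Character 3- to 5-grams, capped at 30K features.
    # lowercase=False is an assumption: the default lowercasing would
    # erase the ALL-CAPS signal the character features are meant to catch.
    ("char", TfidfVectorizer(analyzer="char", ngram_range=(3, 5),
                             max_features=30_000, lowercase=False)),
])

# Invented example sentences, not taken from the dataset.
docs = [
    "WASHINGTON (Reuters) - The Senate voted on Tuesday to approve the bill.",
    "BREAKING!!! You WON'T believe what they are HIDING from you!!!",
]

X = combined_tfidf.fit_transform(docs)

# After fitting, transformer_list holds the fitted vectorizers.
n_word = len(combined_tfidf.transformer_list[0][1].vocabulary_)
n_char = len(combined_tfidf.transformer_list[1][1].vocabulary_)
print(X.shape, n_word, n_char)
```

The resulting sparse matrix can be passed to any of the classical models in the benchmark; on the full corpus its width would be at most 80K columns.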
DistilBERT Finding
Fine-tuned on only a 6K-article subset, DistilBERT reaches 99.87% accuracy. This suggests that, with limited labeled data, a fine-tuned transformer can match classical models trained on the full dataset.