Telco Customer Churn Prediction
Three-phase churn pipeline on 7,043 customers. Optuna-tuned XGBoost: AUC 0.8484, F1 0.5947. Phase 1: 5 baselines → Phase 2: boosting ensembles → Phase 3: 100-trial Optuna + SHAP. Per SHAP, tenure and contract type are the strongest churn drivers.
XGBoost Optuna AUC: 0.8484
XGBoost Optuna F1: 0.5947
Best baseline AUC: 0.844 (SVM)
Optuna trials: 100
Dataset
7,043 telecom customers, 21 features, 26.5% churn rate
Approach
Baseline → ensemble → 100-trial Optuna HPO → SHAP interpretability
Tech Stack
Python, XGBoost, LightGBM, CatBoost, Optuna, SHAP, Scikit-learn
Keywords
XGBoost, LightGBM, CatBoost, Optuna, SHAP, Churn, Customer Analytics
Deep Dive
A three-phase ML pipeline for predicting customer churn on the Telco Customer Churn dataset.
Dataset
- 7,043 customers, 21 features (demographics, services, account, charges)
- Churn rate: 26.54% (moderate class imbalance); stratified 80/20 train/test split
- Preprocessing: label encoding (binary features), one-hot encoding (multi-class features), StandardScaler
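The split and encoding steps above could be wired together roughly as follows. This is a minimal sketch under stated assumptions: the toy DataFrame is invented for illustration, the column names are borrowed from the public Telco dataset, and the `ColumnTransformer` layout is one reasonable arrangement rather than the project's actual code.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Toy stand-in for the Telco table (values are invented for illustration).
df = pd.DataFrame({
    "Partner": ["Yes", "No"] * 10,
    "Contract": ["Month-to-month", "One year", "Two year", "Month-to-month"] * 5,
    "tenure": list(range(1, 21)),
    "MonthlyCharges": [20.0 + 4 * i for i in range(20)],
    "Churn": ["Yes", "No", "No", "No"] * 5,
})

X = df.drop(columns="Churn")
y = (df["Churn"] == "Yes").astype(int)  # encode the target: Yes -> 1, No -> 0

# Stratify on y so the train and test splits keep the same churn rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

preprocess = ColumnTransformer([
    ("binary", OrdinalEncoder(), ["Partner"]),                        # two levels -> 0/1
    ("multi", OneHotEncoder(handle_unknown="ignore"), ["Contract"]),  # one column per level
    ("numeric", StandardScaler(), ["tenure", "MonthlyCharges"]),      # zero mean, unit variance
])

# Fit encoders and scaler on the training split only, then reuse them on test.
X_train_enc = preprocess.fit_transform(X_train)
X_test_enc = preprocess.transform(X_test)

print(X_train_enc.shape, X_test_enc.shape)
print(y_train.mean(), y_test.mean())
```

Fitting the transformer on the training split alone keeps test-set statistics out of the scaler, which matters when the reported AUC is meant to reflect unseen customers.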
Phase 1 — Baseline Classifiers
| Model | Accuracy | AUC |
|---|---|---|
| Logistic Regression | 81.1% | 0.842 |
| Decision Tree | 78.8% | 0.735 |
| KNN | 78.3% | 0.818 |
| SVM | 79.2% | 0.844 |
| Random Forest | 80.4% | 0.843 |
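A table like the one above can be produced by looping over scikit-learn estimators. The sketch below uses synthetic data from `make_classification` with roughly a 26.5% positive rate, and default hyperparameters throughout. Both are assumptions, so the printed numbers will not match the table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (not the Telco data), about 26.5% positives.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8,
    weights=[0.735, 0.265], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

baselines = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
}


def positive_scores(model, X):
    """Return a ranking score for the positive class (AUC only needs a ranking)."""
    if hasattr(model, "predict_proba"):
        return model.predict_proba(X)[:, 1]
    return model.decision_function(X)  # e.g. SVC without probability estimates


results = {}
for name, model in baselines.items():
    model.fit(X_train, y_train)
    results[name] = {
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        "auc": roc_auc_score(y_test, positive_scores(model, X_test)),
    }

for name, m in results.items():
    print(f"{name:<20} acc={m['accuracy']:.3f}  auc={m['auc']:.3f}")
```

Reporting AUC next to accuracy is useful here because a model that always predicts "no churn" would already score about 73.5% accuracy at this churn rate.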
Phase 2 — Boosting Ensembles
| Model | Accuracy | AUC |
|---|---|---|
| XGBoost | 80.0% | 0.840 |
| LightGBM | 80.0% | 0.813 |
| CatBoost | 79.3% | 0.817 |
| Soft Voting | 80.3% | 0.819 |
| Stacking | 79.8% | 0.834 |
Phase 3 — Optuna HPO (XGBoost, 100 trials)
- AUC: 0.8484, Accuracy: 80.55%, F1 (churn): 0.5947
- Best parameters: n_estimators=500, max_depth=5, learning_rate=0.05, subsample=0.8
SHAP Top Churn Drivers
- Tenure (longer tenure → lower churn)
- Contract type (month-to-month = highest risk)
- Monthly charges (higher charges → more churn)
- Tech support (absence = higher churn)
- Internet service (fiber optic users churn more)