Telco Customer Churn Prediction

3-phase churn pipeline on 7,043 customers. Optuna-tuned XGBoost reaches AUC 0.8484 and F1 0.5947. Phase 1: 5 baselines → Phase 2: boosting ensembles → Phase 3: 100-trial Optuna + SHAP. Tenure and contract type are the top SHAP churn drivers.

  • XGBoost Optuna AUC: 0.8484
  • XGBoost Optuna F1: 0.5947
  • Best baseline AUC: 0.844 (SVM)
  • Optuna trials: 100

Dataset

7,043 telecom customers, 21 features, 26.5% churn rate

Approach

Baseline → ensemble → 100-trial Optuna HPO → SHAP interpretability

Tech Stack
Python, XGBoost, LightGBM, CatBoost, Optuna, SHAP, Scikit-learn
Keywords
XGBoost, LightGBM, CatBoost, Optuna, SHAP, Churn, Customer Analytics
Visualizations: 5 charts
Deep Dive

A 3-phase ML pipeline for predicting customer churn on the Telco Customer Churn dataset.

Dataset

  • 7,043 customers, 21 features (demographics, services, account, charges)
  • Churn rate: 26.54% (moderate class imbalance); stratified 80/20 train/test split
  • Preprocessing: label encoding (binary features), one-hot encoding (multi-category features), StandardScaler
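
The preprocessing steps above could be sketched roughly as follows. This is a minimal sketch on a tiny made-up frame, not the project's actual code: the column names and values are hypothetical stand-ins for the Telco schema, `OrdinalEncoder` is used as the feature-side equivalent of label encoding, and applying `StandardScaler` only to the numeric columns is an assumption.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Hypothetical miniature frame; the real dataset has 7,043 rows and 21 columns.
df = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 8, 22, 10, 62, 5, 50],
    "MonthlyCharges": [29.9, 56.9, 53.8, 42.3, 99.6, 89.1, 29.7, 56.1, 70.7, 105.5],
    "Partner": ["Yes", "No", "No", "No", "No", "Yes", "No", "Yes", "No", "Yes"],
    "Contract": ["Month-to-month", "One year", "Month-to-month", "One year",
                 "Month-to-month", "Month-to-month", "Two year", "One year",
                 "Month-to-month", "Two year"],
    "Churn": [1, 0, 1, 0, 1, 0, 0, 0, 1, 0],
})

X = df.drop(columns="Churn")
y = df["Churn"]

# A stratified 80/20 split keeps the churn rate similar in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

preprocess = ColumnTransformer(
    transformers=[
        ("binary", OrdinalEncoder(), ["Partner"]),                       # Yes/No -> 1/0
        ("multi", OneHotEncoder(handle_unknown="ignore"), ["Contract"]),  # one column per category
        ("numeric", StandardScaler(), ["tenure", "MonthlyCharges"]),     # zero mean, unit variance
    ],
    sparse_threshold=0,  # always return a dense array
)

# Fit on the training split only, then reuse the fitted transformer on the test split.
X_train_t = preprocess.fit_transform(X_train)
X_test_t = preprocess.transform(X_test)
print(X_train_t.shape, X_test_t.shape)
```

Fitting the transformer on the training split alone avoids leaking test-set statistics into the scaler.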

Phase 1 — Baseline Classifiers

Model                 Accuracy   AUC
Logistic Regression   81.1%      0.842
Decision Tree         78.8%      0.735
KNN                   78.3%      0.818
SVM                   79.2%      0.844
Random Forest         80.4%      0.843
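
A baseline comparison like the one above might be produced with a loop of this shape. This is a sketch under stated assumptions: the data is synthetic (generated to have roughly the same class balance), and all five models use scikit-learn defaults because the source does not give their settings, so the printed numbers will not match the table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with about 26.5% positives (assumption, not the Telco data).
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.735], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(probability=True, random_state=42),  # probability=True enables predict_proba
    "Random Forest": RandomForestClassifier(random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    churn_proba = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    results[name] = {
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        "auc": roc_auc_score(y_test, churn_proba),
    }

for name, scores in results.items():
    print(f"{name:<20} acc={scores['accuracy']:.3f}  auc={scores['auc']:.3f}")
```

AUC is computed from predicted probabilities rather than hard labels, which is why it can rank models differently from accuracy, as it does for Logistic Regression and SVM in the table.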

Phase 2 — Boosting Ensembles

Model         Accuracy   AUC
XGBoost       80.0%      0.840
LightGBM      80.0%      0.813
CatBoost      79.3%      0.817
Soft Voting   80.3%      0.819
Stacking      79.8%      0.834

Phase 3 — Optuna HPO (XGBoost, 100 trials)

  • AUC: 0.8484, Accuracy: 80.55%, F1 (churn class): 0.5947
  • Best hyperparameters: n_estimators=500, max_depth=5, learning_rate=0.05, subsample=0.8

SHAP Top Churn Drivers

  1. Tenure (longer tenure → lower churn)
  2. Contract type (month-to-month = highest risk)
  3. Monthly charges (higher charges → more churn)
  4. Tech support (absence = higher churn)
  5. Internet service (Fiber optic users churn more)