
DataCo Smart Supply Chain ML

Leakage-free ML on 180,519 orders. LightGBM reaches AUC 0.8563 on late-delivery classification; Gradient Boosting reaches R² = 0.9996 on profit regression. Post-fulfillment columns, which inflate AUC to 1.0 in most published solutions, were removed before training.

  • LightGBM AUC (classification): 0.8563
  • Gradient Boosting R² (regression): 0.9996
  • Gradient Boosting RMSE (profit, $): 2.04
  • Random Forest AUC (5-fold CV): 0.862 ± 0.001

Dataset

DataCo SCMS: 180,519 records (leakage-audited)

Approach

Leakage audit → dual-task modeling (classification + regression) → 5-fold cross-validation

Tech Stack
Python, XGBoost 3.2.0, LightGBM 4.6.0, Gradient Boosting, Pandas
Keywords
XGBoost, LightGBM, Supply Chain, Leakage-Free, Classification, Regression
Visualizations: 6 charts
Deep Dive

Late-delivery classification and profit regression on the DataCo supply chain dataset, with target-leaking columns removed before training.

Dataset

  • 180,519 orders, 53 raw features → 44 after leakage audit

Leakage Audit — Removed Columns

Column | Why Removed
Days for shipping (real) | Actual outcome; not available at prediction time
Delivery Status | Direct label encoding of target
Benefit per order | Post-fulfillment computation
Sales per customer | Aggregated post-delivery

Without the leaky columns, AUC is about 0.86. With them, AUC = 1.0, which is what most published results report and is invalid.
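
In practice, the audit amounts to dropping the flagged columns before any split or fit. A minimal pandas sketch follows. The column names are the four listed in the table; the helper name `drop_leaky_columns` and the toy frame are illustrative assumptions, not the project's actual code.

```python
import pandas as pd

# Columns flagged in the leakage audit table.
LEAKY_COLUMNS = [
    "Days for shipping (real)",
    "Delivery Status",
    "Benefit per order",
    "Sales per customer",
]


def drop_leaky_columns(df: pd.DataFrame, leaky=LEAKY_COLUMNS) -> pd.DataFrame:
    """Return a copy of df without the post-fulfillment columns.

    Hypothetical helper: errors="ignore" lets it run on frames
    that lack some of the listed columns.
    """
    return df.drop(columns=list(leaky), errors="ignore")


# Toy frame with made-up values, only to show the call.
toy = pd.DataFrame(
    {
        "Days for shipping (real)": [3, 5],
        "Delivery Status": ["Late delivery", "Shipping on time"],
        "Benefit per order": [12.5, -4.0],
        "Sales per customer": [100.0, 80.0],
        "discount_rate": [0.10, 0.00],
    }
)
clean = drop_leaky_columns(toy)
print(list(clean.columns))  # ['discount_rate']
```

Dropping by an explicit list keeps the audit reviewable: any column added to the list is excluded from both tasks in one place.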

Task 1: Late Delivery Classification

Model | Accuracy | AUC
Logistic Regression | 72.2% | 0.790
Random Forest | 73.8% | 0.871
XGBoost | 74.5% | 0.859
LightGBM | 74.5% | 0.856

5-fold CV AUC: RF 0.862 ± 0.001 | XGB 0.853 ± 0.002 | LGB 0.851 ± 0.002
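
A 5-fold cross-validated AUC of this kind can be computed with scikit-learn roughly as sketched below. This is an illustration on synthetic data: the use of scikit-learn, the stratified shuffled folds, the hyperparameters, and the seeds are all assumptions rather than the project's configuration, and the printed score is unrelated to the figures reported here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic binary-classification data standing in for the order features.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

model = RandomForestClassifier(n_estimators=50, random_state=0)
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# One ROC AUC per held-out fold.
scores = cross_val_score(model, X, y, cv=folds, scoring="roc_auc")
print(f"5-fold CV AUC: {scores.mean():.3f} ± {scores.std():.3f}")
```

Reporting the mean together with the spread across folds shows how stable the score is, which a single train/test split cannot do.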

Task 2: Order Profit Regression

Model | RMSE ($) | R²
Ridge/Linear | 56.5 | 0.706
XGBoost | 3.60 | 0.9988
LightGBM | 4.52 | 0.9981
Gradient Boosting | 2.04 | 0.9996
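
For reference, the RMSE and R² figures reported for the regression task follow the standard definitions, RMSE = sqrt(mean((y - ŷ)²)) and R² = 1 - SS_res / SS_tot. The plain-Python sketch below spells them out; the function names and the toy numbers are illustrative and not taken from the project.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target (dollars here)."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy profits and predictions (made-up values).
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 39.0]
print(round(rmse(y_true, y_pred), 4))  # 2.1213
print(round(r2(y_true, y_pred), 4))    # 0.964
```

RMSE is in dollars, so it is read against the scale of order profit, while R² is unitless and compares the model against always predicting the mean.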

Engineered Features (order-placement time only)

  • order_month, day_of_week, hour, quarter, discount_rate, price_band
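
Features like these could be derived with pandas along the following lines. This is a sketch under stated assumptions: the input column names (`order_date`, `price`, `discount`), the definition of `discount_rate` as discount divided by price, and the `price_band` cut points are all hypothetical, since the project's exact definitions are not given here.

```python
import pandas as pd


def add_order_time_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive features from fields known when the order is placed.

    Assumed input columns (hypothetical names): order_date, price, discount.
    """
    out = df.copy()
    ts = pd.to_datetime(out["order_date"])
    out["order_month"] = ts.dt.month
    out["day_of_week"] = ts.dt.dayofweek  # Monday = 0
    out["hour"] = ts.dt.hour
    out["quarter"] = ts.dt.quarter
    # Assumed definition: discount as a fraction of the price.
    out["discount_rate"] = out["discount"] / out["price"]
    # Assumed band edges; the project's actual cut points are not stated.
    out["price_band"] = pd.cut(
        out["price"],
        bins=[0, 50, 200, float("inf")],
        labels=["low", "mid", "high"],
    )
    return out


# Toy orders with made-up values.
toy = pd.DataFrame(
    {
        "order_date": ["2017-03-15 14:30:00", "2017-11-02 09:05:00"],
        "price": [40.0, 250.0],
        "discount": [4.0, 25.0],
    }
)
features = add_order_time_features(toy)
print(features[["order_month", "day_of_week", "hour", "quarter", "price_band"]])
```

Every input used here is available at order placement, which is what keeps the features consistent with the leakage audit.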