DataCo Smart Supply Chain ML
Leakage-free ML on 180,519 orders. LightGBM reaches AUC 0.8563 on late-delivery classification; Gradient Boosting reaches R² = 0.9996 on profit regression. Post-fulfillment columns, which push AUC to 1.0 in most published solutions, were removed.
DataCo SCMS: 180,519 records (leakage-audited)
Leakage audit → dual-task modeling (classification + regression) → 5-fold cross-validation
Late-delivery classification and order-profit regression on the DataCo supply chain dataset, with post-fulfillment leakage columns removed.
Dataset
- 180,519 orders, 53 raw features → 44 after leakage audit
Leakage Audit — Removed Columns
| Column | Why Removed |
|---|---|
| Days for shipping (real) | Actual outcome; not available at prediction time |
| Delivery Status | Direct encoding of the target label |
| Benefit per order | Computed after fulfillment |
| Sales per customer | Aggregated post-delivery |
Without the leaked columns, AUC is about 0.86. With them, AUC = 1.0, which is what most published results report and is not a valid estimate.
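In practice the audit amounts to dropping the listed columns before any split or fit. The sketch below is a minimal illustration, not the project's actual code: it covers only the four columns named in the table (the rest of the 53 → 44 reduction is not listed here), and the demo frame, its values, and the helper name `drop_leakage_columns` are assumptions made for the example.

```python
import pandas as pd

# The four columns named in the audit table. Other dropped columns are not
# listed in the source, so they are not included here.
LEAKAGE_COLUMNS = [
    "Days for shipping (real)",
    "Delivery Status",
    "Benefit per order",
    "Sales per customer",
]


def drop_leakage_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a new frame without the post-fulfillment columns (hypothetical helper)."""
    # errors="ignore" keeps the call safe when a column is already absent.
    return df.drop(columns=LEAKAGE_COLUMNS, errors="ignore")


# Tiny made-up frame for illustration only; these are not real DataCo rows,
# and the non-leaky column names are assumed.
demo = pd.DataFrame({
    "Days for shipment (scheduled)": [4, 2],
    "Days for shipping (real)": [5, 2],
    "Delivery Status": ["Late delivery", "Shipping on time"],
    "Late_delivery_risk": [1, 0],
})

clean = drop_leakage_columns(demo)
print(list(clean.columns))
```

Dropping by an explicit name list, rather than by position or pattern, keeps the audit reviewable: anyone can compare the list against the table.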
Task 1: Late Delivery Classification
| Model | Accuracy | AUC |
|---|---|---|
| Logistic Regression | 72.2% | 0.790 |
| Random Forest | 73.8% | 0.871 |
| XGBoost | 74.5% | 0.859 |
| LightGBM | 74.5% | 0.856 |
5-fold CV AUC: Random Forest 0.862 ± 0.001 | XGBoost 0.853 ± 0.002 | LightGBM 0.851 ± 0.002
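A cross-validated AUC of this form can be produced with scikit-learn as sketched below. This is an assumption-laden illustration: the data is synthetic, the model is a plain `LogisticRegression` stand-in, and the stratified, shuffled split and the seed are choices made for the example, not details confirmed by the source.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data; the real features and labels are not reproduced here.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Stratified folds keep the late/on-time ratio similar in every fold (assumed setup).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = LogisticRegression(max_iter=1000)

# cross_val_score refits the model on each training split and returns
# one ROC AUC per held-out fold.
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")

# NumPy's std() defaults to the population standard deviation (ddof=0).
print(f"AUC {scores.mean():.3f} ± {scores.std():.3f}")
```

A small spread across folds, as in the figures above, suggests the score does not depend much on which rows land in the held-out fold.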
Task 2: Order Profit Regression
| Model | RMSE | R² |
|---|---|---|
| Ridge/Linear | 56.5 | 0.706 |
| XGBoost | 3.60 | 0.9988 |
| LightGBM | 4.52 | 0.9981 |
| Gradient Boosting | 2.04 | 0.9996 |
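For reference, the two metrics in the table can be computed as in the sketch below. The functions use the standard definitions (RMSE as the root of the mean squared residual, R² as one minus the ratio of residual to total sum of squares); the sample values are made up for illustration and are not project results.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up profit values for illustration only.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 39.0]

print(f"RMSE = {rmse(y_true, y_pred):.4f}, R² = {r2(y_true, y_pred):.4f}")
```

RMSE is expressed in the units of the profit target, so it is best read against the spread of that target, while R² is scale-free.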
Engineered Features (order-placement time only)
- order_month, day_of_week, hour, quarter, discount_rate, price_band
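The features named above could be derived along the lines of the sketch below. Only the six output names come from the source. The input column names (`order_date`, `list_price`, `discount`), the formula for `discount_rate`, and the `price_band` cut points are all assumptions made for illustration.

```python
import pandas as pd


def add_order_time_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add features that are known when the order is placed (hypothetical helper)."""
    out = df.copy()
    ts = pd.to_datetime(out["order_date"])

    # Calendar features taken from the order timestamp.
    out["order_month"] = ts.dt.month
    out["day_of_week"] = ts.dt.dayofweek  # Monday = 0, Sunday = 6
    out["hour"] = ts.dt.hour
    out["quarter"] = ts.dt.quarter

    # Assumed definition: discount as a share of the list price.
    out["discount_rate"] = out["discount"] / out["list_price"]

    # Assumed cut points; the source does not state how the bands are defined.
    out["price_band"] = pd.cut(
        out["list_price"],
        bins=[0, 50, 200, float("inf")],
        labels=["low", "mid", "high"],
    )
    return out


# Made-up rows for illustration only.
demo = pd.DataFrame({
    "order_date": ["2017-01-15 14:30:00", "2017-11-03 09:05:00"],
    "list_price": [40.0, 250.0],
    "discount": [4.0, 50.0],
})

features = add_order_time_features(demo)
print(features[["order_month", "day_of_week", "hour", "quarter",
                "discount_rate", "price_band"]])
```

Every input here is fixed when the order is placed, which is the property that keeps these features free of leakage.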