Time Series
Historical Product Demand Forecasting
19-model benchmark: classical TS → ML → DL → ensemble. CatBoost R²=0.7125 (best). ML crushes classical TS on R² (≈0.70 vs ≤0.11 for TS), though its SMAPE is far higher (115–134% vs 35–40%). Walk-forward CV with Optuna.
- ▸CatBoost R²: 0.7125
- ▸Quantile Reg MAE: 8,511
- ▸Classical TS best R²: 0.11 (Prophet)
- ▸Models benchmarked: 19
Dataset
DataCo SCMS: 215K+ rows, 36 features, walk-forward CV
Approach
Walk-forward CV → 19 models: TS baselines → ML → DL (LSTM/TFT/N-BEATS) → ensemble
Tech Stack
Python, CatBoost, XGBoost, LightGBM, PyTorch Lightning, N-BEATS, TFT, Optuna
Keywords
CatBoost, XGBoost, LightGBM, LSTM, TFT, N-BEATS, Walk-Forward, Demand Forecasting
Visualizations: 8 charts
Deep Dive
A comprehensive demand forecasting benchmark of 19 models on the DataCo supply chain dataset.
Dataset
- ▸215K+ rows, 36 features after engineering
- ▸Walk-forward cross-validation (expanding window)
- ▸ADF test: unit root rejected (p<0.0001), so the series is treated as stationary
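Expanding-window walk-forward CV can be sketched as below. This is a minimal illustration, assuming the data is indexed by time period (for example, week) and that each fold tests on a fixed-size block of later periods; the function name `expanding_window_splits` and its parameters are hypothetical, and the project's actual fold count and horizon are not stated. Rows for all products in a period would be selected by mapping these period indices back to the full table.

```python
from typing import Iterator, List, Tuple


def expanding_window_splits(
    n_periods: int, n_folds: int, test_size: int, min_train: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_periods, test_periods) for expanding-window walk-forward CV.

    Each fold trains on every period before its test block, so the training
    window grows by `test_size` per fold and no future period leaks into training.
    """
    # Place the first test block so that the last fold ends exactly at n_periods.
    first_test_start = n_periods - n_folds * test_size
    if first_test_start < min_train:
        raise ValueError("not enough history for the requested folds")
    for fold in range(n_folds):
        test_start = first_test_start + fold * test_size
        train_idx = list(range(0, test_start))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx


for train, test in expanding_window_splits(n_periods=10, n_folds=3, test_size=2, min_train=3):
    print(f"train periods 0-{train[-1]}  ->  test periods {test}")
```

With these settings the folds train on periods 0–3, 0–5 and 0–7 and test on 4–5, 6–7 and 8–9: the training window expands while the test block always lies strictly in the future.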
Key Results Across the 19 Models
| Model | MAE | SMAPE | R² |
|---|---|---|---|
| Naive (last value) | 1,145K | 99.4% | -3.09 |
| Classical TS (best: Theta) | 536K | 35.0% | -0.07 |
| Prophet | 523K | 34.8% | +0.11 |
| CatBoost | 9.5K | 121.3% | +0.713 |
| XGBoost (Optuna) | 9.4K | 115.2% | +0.707 |
| Ridge Regression | 9.4K | 133.6% | +0.707 |
| Quantile Reg (P50) | 8.5K | 75.3% | +0.700 |
| LSTM | 554K | 36.2% | -0.08 |
| TFT | 602K | 38.0% | -0.28 |
| N-BEATS | 652K | 39.2% | -0.66 |
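The three metrics in the table can be written out directly, which also shows how a model can pair a small MAE and a high R² with a very large SMAPE. This sketch assumes the 0–200% SMAPE variant (the denominator is the mean of |actual| and |forecast|), since several table entries exceed 100%; the project's exact definition, including how zero-demand rows are handled, is not stated. The toy numbers are invented for illustration and do not come from the DataCo data.

```python
from typing import Sequence


def mae(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Mean absolute error, in the units of demand."""
    return sum(abs(a - p) for a, p in zip(y, yhat)) / len(y)


def smape(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Symmetric MAPE in percent, 0-200% variant (an assumption).

    A pair where both actual and forecast are 0 contributes 0 by convention.
    """
    total = 0.0
    for a, p in zip(y, yhat):
        denom = (abs(a) + abs(p)) / 2
        total += 0.0 if denom == 0 else abs(a - p) / denom
    return 100 * total / len(y)


def r2(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Coefficient of determination 1 - SS_res/SS_tot; negative if worse than the mean."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    if ss_tot == 0:
        raise ValueError("R² is undefined for a constant target")
    return 1 - ss_res / ss_tot


# Toy example (invented): one high-volume product, four low-volume products.
actual = [1000, 1, 0, 2, 0]
forecast = [990, 3, 1, 0, 1]
print(f"MAE={mae(actual, forecast):.1f}  SMAPE={smape(actual, forecast):.1f}%  R²={r2(actual, forecast):.4f}")
```

Here the high-volume product dominates the variance, so R² is close to 1 and MAE is only 3.2, yet each of the four low-volume products contributes a SMAPE term of 100–200%, pushing the average to about 140%.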
The ML vs Classical TS Paradox
- ▸Classical TS: correct scale (SMAPE 35%), wrong patterns (R²<0)
- ▸ML: correct patterns (R²=0.71), large percentage errors (SMAPE ≥115%)
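- ▸Root cause: ML predicts per product with lag features → tiny absolute error on most products (MAE ≈9.5K), but SMAPE is likely inflated by low-volume products, where small misses are large relative to actual demand; the per-product predictions also fail to reproduce the aggregate scale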
- ▸Quantile Regression (P50) strikes the best balance across metrics: MAE=8.5K, SMAPE=75.3%, R²=0.70
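Quantile regression at P50 minimizes the pinball (quantile) loss at τ = 0.5, which equals half the absolute error, so the model targets the conditional median rather than the mean. A minimal sketch follows; the function name and toy values are hypothetical, and which quantile-regression implementation the project used is not stated. One plausible reading of its balanced scores is that a median target is pulled less by demand spikes than a squared-error fit, but that is an interpretation, not a reported finding.

```python
from typing import Sequence


def pinball_loss(y: Sequence[float], yhat: Sequence[float], tau: float) -> float:
    """Mean pinball (quantile) loss for quantile level tau in (0, 1)."""
    if not 0 < tau < 1:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for a, p in zip(y, yhat):
        diff = a - p
        # Under-forecasts (diff > 0) cost tau per unit; over-forecasts cost (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1) * diff
    return total / len(y)


# At tau = 0.5 the loss is half the MAE, so the best constant forecast is a median.
demand = [1.0, 2.0, 3.0, 100.0]  # invented values with one spike
median_guess = [2.0] * len(demand)
mean_guess = [sum(demand) / len(demand)] * len(demand)  # 26.5
print(pinball_loss(demand, median_guess, 0.5), pinball_loss(demand, mean_guess, 0.5))  # prints 12.5 18.375
```

The median forecast scores 12.5 against 18.375 for the mean, because the single spike of 100 drags the mean far from the bulk of the data while barely moving the median.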
Why Deep Learning Fails Here
LSTM, TFT and N-BEATS all have R²<0, worse than Prophet. Demand data has a discrete product-category structure that tree models capture well, while the sequential dependencies that LSTMs exploit are weak here.
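The tree models' use of that product structure rests on per-product lag features, as noted in the paradox section. A minimal sketch follows, assuming a long table of (product, period, demand) rows; the schema, the function name `add_lag_features` and the lag set (1, 2) are hypothetical, and the project's actual 36 engineered features are not listed. Computing lags within each product lets a tree model combine product identity with that product's recent demand instead of reading one pooled sequence.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical long-format schema: (product_id, period, demand).
Row = Tuple[str, int, float]


def add_lag_features(rows: List[Row], lags: Tuple[int, ...] = (1, 2)) -> List[Dict[str, object]]:
    """Attach lagged demand per product; lags never cross product boundaries."""
    # Index demand by product and period so gaps in a series stay visible.
    history: Dict[str, Dict[int, float]] = defaultdict(dict)
    for product, period, demand in rows:
        history[product][period] = demand

    out: List[Dict[str, object]] = []
    for product, period, demand in sorted(rows, key=lambda r: (r[0], r[1])):
        feats: Dict[str, object] = {"product": product, "period": period, "demand": demand}
        for k in lags:
            # None marks missing history (series start or a skipped period).
            lagged: Optional[float] = history[product].get(period - k)
            feats[f"lag_{k}"] = lagged
        out.append(feats)
    return out


rows = [("A", 1, 10.0), ("A", 2, 12.0), ("A", 3, 9.0), ("B", 1, 1.0), ("B", 3, 0.0)]
for r in add_lag_features(rows):
    print(r)
```

Product B has no period 2, so its period-3 row gets `lag_1=None` and `lag_2=1.0`, and no value from product A leaks into B's lags. How missing lags are then handled (left missing for the tree model or imputed) is a modelling choice the source does not specify.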