Time Series

Historical Product Demand Forecasting

19-model benchmark: classical TS → ML → DL → ensemble. CatBoost has the best R² (0.7125). ML beats classical TS on R² (about 0.70–0.71 vs at most 0.11 for TS, and negative for most TS models) but not on SMAPE (roughly 115–134% for ML vs 35–40% for TS). Walk-forward CV with Optuna tuning.

CatBoost R²: 0.7125
Quantile Reg MAE: 8,511
Classical TS best R²: 0.11 (Prophet)
Models benchmarked: 19

Dataset

DataCo SCMS: 215K+ rows, 36 features, walk-forward CV

Approach

Walk-forward CV → 19 models: TS baselines → ML → DL (LSTM/TFT/N-BEATS) → ensemble

Tech Stack
Python, CatBoost, XGBoost, LightGBM, PyTorch Lightning, N-BEATS, TFT, Optuna
Keywords
CatBoost, XGBoost, LightGBM, LSTM, TFT, N-BEATS, Walk-Forward, Demand Forecasting
Visualizations: 8 charts
Deep Dive

A comprehensive demand forecasting benchmark: 19 models evaluated on the DataCo supply chain dataset.

Dataset

  • 215K+ rows, 36 features after engineering
  • Walk-forward cross-validation (expanding window)
  • ADF test: series is stationary (unit-root null rejected, p < 0.0001)
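
The expanding-window scheme in the list above can be sketched as follows. This is a minimal illustration, not the project's actual splitter: the function name `expanding_window_splits`, the fold count, and the test-block size are all hypothetical, and the rows are assumed to be sorted by time.

```python
from typing import Iterator, List, Tuple


def expanding_window_splits(
    n_samples: int, n_folds: int, test_size: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for walk-forward validation.

    Each fold trains on every row before its test block, so the training
    window grows while the test block moves forward in time.
    """
    first_test_start = n_samples - n_folds * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested folds")
    for fold in range(n_folds):
        test_start = first_test_start + fold * test_size
        test_end = test_start + test_size
        yield list(range(test_start)), list(range(test_start, test_end))


# Hypothetical sizes: 12 time-ordered periods, 3 folds of 2 periods each.
splits = list(expanding_window_splits(n_samples=12, n_folds=3, test_size=2))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]} -> test {test_idx[0]}..{test_idx[-1]}")
```

No fold ever trains on rows that come after its test block, which is the property that separates walk-forward validation from shuffled k-fold. scikit-learn's `TimeSeriesSplit` provides a comparable expanding-window splitter if a library implementation is preferred.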

All 19 Models — Key Results

Model                            MAE      SMAPE    R²
Naive (last value)               1,145K   99.4%    -3.09
Classical TS (best: Theta)       536K     35.0%    -0.07
Prophet                          523K     34.8%    +0.11
CatBoost                         9.5K     121.3%   +0.713
XGBoost (Optuna)                 9.4K     115.2%   +0.707
Ridge Regression                 9.4K     133.6%   +0.707
Quantile Reg (P50)               8.5K     75.3%    +0.700
LSTM                             554K     36.2%    -0.08
TFT                              602K     38.0%    -0.28
N-BEATS                          652K     39.2%    -0.66
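
For reference, the three metric columns can be computed as in the sketch below. The page does not state which SMAPE variant was used; because several values exceed 100%, the 0–200% form is assumed here. The two toy series are invented purely for illustration and are not taken from the DataCo data.

```python
def mae(actual, forecast):
    """Mean absolute error, in the units of the target."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def smape(actual, forecast):
    """Symmetric MAPE in percent on the 0-200% scale (assumed variant)."""
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        total += abs(a - f) / denom if denom else 0.0
    return 100 * total / len(actual)


def r2(actual, forecast):
    """Coefficient of determination; negative when worse than the mean."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Toy series 1: mostly near-zero demand with two spikes (made-up numbers).
sparse_actual = [0, 0, 1, 0, 100, 0, 2, 0, 90, 0]
sparse_forecast = [1, 1, 0, 1, 95, 1, 1, 1, 92, 1]

# Toy series 2: stable level, forecast moves the wrong way (made-up numbers).
level_actual = [100, 110, 90, 105, 95]
level_forecast = [105, 92, 108, 96, 104]

for name, y, yhat in [
    ("sparse", sparse_actual, sparse_forecast),
    ("level", level_actual, level_forecast),
]:
    print(f"{name}: MAE={mae(y, yhat):.2f} "
          f"SMAPE={smape(y, yhat):.1f}% R2={r2(y, yhat):.3f}")
```

On the sparse series, missing a zero by one unit costs the full 200% for that point, so SMAPE is high even though the spikes are tracked and R² is close to 1. On the level series the percentage error is small because the forecast sits at the right scale, yet R² is negative because the forecast varies in the wrong direction.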

The ML vs Classical TS Paradox

  • Classical TS: correct scale (SMAPE ≈ 35%) but wrong patterns (R² < 0)
  • ML: correct patterns (R² = 0.71) but large percentage errors (SMAPE ≈ 115%)
  • Root cause: ML predicts per product using lag features, so absolute errors are tiny on most products, but the forecasts fail at the aggregate scale
  • Quantile Regression (P50) strikes the best balance: MAE = 8.5K, SMAPE = 75.3%, R² = 0.70
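
The page does not describe how the P50 model was fitted. As a hedged sketch, the standard pinball (quantile) loss is shown below; at q = 0.5 it equals half the absolute error, which may help explain why a median model posts the lowest MAE in the benchmark. The demand values are invented for illustration.

```python
from statistics import mean, median


def pinball_loss(actual, forecast, q):
    """Mean pinball (quantile) loss for a quantile level q in (0, 1)."""
    total = 0.0
    for a, f in zip(actual, forecast):
        diff = a - f
        # Under-forecasts are weighted by q, over-forecasts by (1 - q).
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(actual)


# Made-up, right-skewed demand: most values small, one large order.
demand = [1, 2, 2, 3, 50]

median_forecast = [median(demand)] * len(demand)
mean_forecast = [mean(demand)] * len(demand)

loss_median = pinball_loss(demand, median_forecast, q=0.5)
loss_mean = pinball_loss(demand, mean_forecast, q=0.5)
mae_median = sum(abs(a - f) for a, f in zip(demand, median_forecast)) / len(demand)

print(f"P50 loss, median forecast: {loss_median:.2f}")
print(f"P50 loss, mean forecast:   {loss_mean:.2f}")
print(f"MAE, median forecast:      {mae_median:.2f}")
```

On skewed data like this, the constant median forecast yields a lower P50 loss than the constant mean forecast, because the single large order pulls the mean away from where most observations lie.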

Why Deep Learning Fails Here

LSTM, TFT, and N-BEATS all score R² < 0, worse than Prophet. The demand data has a discrete product-category structure that tree models capture well, while the sequential dependencies that an LSTM exploits are weak here.