Time Series

Historical Product Demand Forecasting

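19-model benchmark: classical TS → ML → DL → ensemble. CatBoost R²=0.7125 (best). ML models beat classical TS decisively on R² (≈0.71 vs at most 0.11), although their SMAPE is far higher (115–134% for CatBoost, XGBoost, and Ridge vs 35–40% for TS). Walk-forward CV with Optuna hyperparameter tuning.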

View on Kaggle

▸CatBoost R²: 0.7125
▸Quantile Reg MAE: 8,511
▸Classical TS best R²: 0.11 (Prophet)
▸Models benchmarked: 19

Dataset

DataCo SCMS: 215K+ rows, 36 features, walk-forward CV

Approach

Walk-forward CV → 19 models: TS baselines → ML → DL (LSTM/TFT/N-BEATS) → ensemble

Tech Stack

Python, CatBoost, XGBoost, LightGBM, PyTorch Lightning, N-BEATS, TFT, Optuna

Keywords

CatBoost, XGBoost, LightGBM, LSTM, TFT, N-BEATS, Walk-Forward, Demand Forecasting

Visualizations: 8 charts

Deep Dive

Comprehensive demand forecasting benchmark: 19 models on the DataCo supply chain dataset.

Dataset

▸215K+ rows, 36 features after engineering
▸Walk-forward cross-validation (expanding window)
▸Augmented Dickey-Fuller (ADF) test: stationary (p<0.0001)
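
The expanding-window walk-forward scheme can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the project's code: the function name `expanding_window_splits` and its parameters (fold count, forecast horizon, minimum training length) are hypothetical, since the source does not state them. scikit-learn's `TimeSeriesSplit`, whose training window expands by default, is an off-the-shelf alternative.

```python
from typing import Iterator, List, Tuple


def expanding_window_splits(
    n_periods: int, n_folds: int, horizon: int, min_train: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for expanding-window walk-forward CV.

    Hypothetical helper: each fold trains on every period before its test
    window, so the training set grows by `horizon` periods per fold and
    never contains data from the future.
    """
    # Place the folds so the last test window ends exactly at n_periods.
    first_test_start = n_periods - n_folds * horizon
    if first_test_start < min_train:
        raise ValueError("not enough history for the requested folds")
    for k in range(n_folds):
        test_start = first_test_start + k * horizon
        train_idx = list(range(0, test_start))  # expanding window from period 0
        test_idx = list(range(test_start, test_start + horizon))
        yield train_idx, test_idx


# Toy usage: 12 periods, 3 folds, 2-period horizon.
for train_idx, test_idx in expanding_window_splits(12, n_folds=3, horizon=2, min_train=4):
    print(len(train_idx), test_idx)
```

Each fold's training set ends right before its test window, which is what keeps the evaluation free of look-ahead leakage.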

Key Results Across the 19 Models

Model	MAE	SMAPE	R²
Naive (last value)	1,145K	99.4%	-3.09
Classical TS (best: Theta)	536K	35.0%	-0.07
Prophet	523K	34.8%	+0.11
CatBoost	9.5K	121.3%	+0.713
XGBoost (Optuna)	9.4K	115.2%	+0.707
Ridge Regression	9.4K	133.6%	+0.707
Quantile Reg (P50)	8.5K	75.3%	+0.700
LSTM	554K	36.2%	-0.08
TFT	602K	38.0%	-0.28
N-BEATS	652K	39.2%	-0.66
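
The three metrics in the table can be computed as sketched below. The source does not say which SMAPE variant it uses; this sketch assumes the common 0-200% form, 100 · mean(2|y − ŷ| / (|y| + |ŷ|)), with 0/0 terms counted as zero. The function names and the toy data are illustrative, not taken from the project.

```python
from typing import Sequence


def mae(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Mean absolute error, in the units of the target."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)


def smape(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Symmetric MAPE in percent (assumed 0-200% variant; 0/0 terms count as 0)."""
    total = 0.0
    for a, b in zip(y, yhat):
        denom = abs(a) + abs(b)
        total += 0.0 if denom == 0 else 2.0 * abs(a - b) / denom
    return 100.0 * total / len(y)


def r2(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (negative if worse than the mean)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Toy data (hypothetical): one high-volume product, four low-volume products.
y_true = [1000, 2, 3, 1, 2]
y_pred = [990, 6, 0, 4, 5]
print(f"MAE={mae(y_true, y_pred):.1f}  SMAPE={smape(y_true, y_pred):.1f}%  R2={r2(y_true, y_pred):.4f}")
```

On this toy data, R² is about 0.9998 while SMAPE is about 101%: the well-predicted large item dominates R², whereas misses of a few units on tiny items dominate SMAPE. This is an illustration of how the metrics can disagree, not a reproduction of the project's numbers.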

The ML vs Classical TS Paradox

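▸Classical TS: correct scale (SMAPE ≈35%) but wrong patterns (R² ≤ 0.11; negative for every model except Prophet)
▸ML: correct patterns (R² ≈ 0.71) but large relative errors (SMAPE 115%+)
▸Root cause: the ML models predict per product using lag features, so absolute errors are tiny on most products (MAE ≈ 9K vs 500K+ for the TS models), yet percentage errors stay large and the models fail at the aggregate scale; the two families' MAE values are therefore not directly comparable
▸Quantile Regression (P50) strikes the best balance: MAE = 8.5K, SMAPE = 75.3%, R² = 0.70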

Why Deep Learning Fails Here

LSTM, TFT, and N-BEATS all score R² < 0, worse than Prophet. The demand data has a discrete product-category structure that tree-based models capture well, while the sequential dependencies that LSTM-style models exploit are weak here.
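
For contrast with the sequence models, the per-product lag features that the ML models rely on (per the paradox notes above) can be sketched in plain Python. This assumes each product's periods are consecutive, so a positional shift equals a calendar lag; `add_lag_features` and the `(product_id, period, demand)` row layout are hypothetical. In pandas the same idea is usually written as `df.groupby("product")["demand"].shift(lag)`.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical row layout: (product_id, period, demand).
Row = Tuple[str, int, float]


def add_lag_features(rows: Sequence[Row], lags: Sequence[int]) -> List[Dict[str, object]]:
    """Attach lagged demand per product so lags never leak across products.

    Assumes each product's periods are consecutive; a missing lag is None.
    """
    by_product: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        by_product[row[0]].append(row)

    features: List[Dict[str, object]] = []
    for product, series in by_product.items():
        series.sort(key=lambda r: r[1])  # chronological order within the product
        demands = [r[2] for r in series]
        for i, (_, period, demand) in enumerate(series):
            record: Dict[str, object] = {"product": product, "period": period, "demand": demand}
            for lag in lags:
                # Only look back within this product's own history.
                record[f"lag_{lag}"] = demands[i - lag] if i >= lag else None
            features.append(record)
    return features


rows = [("A", 1, 10.0), ("B", 1, 5.0), ("A", 2, 12.0), ("A", 3, 11.0), ("B", 2, 7.0)]
for rec in add_lag_features(rows, lags=[1, 2]):
    print(rec)
```

Grouping before shifting is the key design choice: a naive global shift would feed product A's last value into product B's first row.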
