Applied ML · intermediate

Time Series Forecasting

“When the order of observations matters — learning from the past to predict the future”

Trend-seasonal-residual decomposition, lag features, rolling statistics, Fourier seasonality, TimeSeriesSplit cross-validation, ARIMA intuition, and gradient boosting for tabular forecasting — with animated decomposition and 3-step forecast.

50 min

11 diagrams

7 Concepts Covered

Prerequisites

→Probability & Statistics

→Gradient Boosting

Concepts Covered

Decomposition, Lag Features, Rolling Statistics, TimeSeriesSplit, ACF/PACF, ARIMA, Fourier Features

∑ Key Formulas

Decomposition

Additive: y = T + S + R (trend + seasonal + residual). Multiplicative: y = T × S × R, used when the seasonal amplitude scales with the trend.

AR(p) Model

Autoregression: current value is a linear combination of p past values

ACF

Autocorrelation Function — how correlated is the series with its k-step lag?

MAPE

Mean Absolute Percentage Error: a scale-free forecasting metric (undefined when an actual value is zero)
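In conventional notation, the four quantities above can be written as follows. The symbols are assumptions chosen for illustration: y_t is the observed value, ŷ_t the forecast, T_t, S_t, R_t the trend, seasonal, and residual components, c a constant, φ_i the autoregressive coefficients, and ε_t a noise term.

```latex
% Decomposition (additive and multiplicative)
y_t = T_t + S_t + R_t
\qquad
y_t = T_t \times S_t \times R_t

% AR(p) model
y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \varepsilon_t

% Autocorrelation at lag k
\rho_k = \frac{\operatorname{Cov}(y_t,\, y_{t-k})}{\operatorname{Var}(y_t)}

% Mean Absolute Percentage Error over n forecasts
\mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|
```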

🎯

Time Series Are Everywhere

motivation

Stock prices, electricity demand, server CPU load, website traffic, COVID cases, weather, sales: all are time series. The fundamental difference from standard ML is that observations are ordered and correlated. Using tomorrow's data to predict yesterday violates causality. A standard train/test split with random shuffling contaminates the evaluation, because the training set then contains observations from after the test points. Time series require temporal cross-validation and temporal feature engineering.

Prophet (Meta) and ARIMA are industry standards for forecasting. But gradient boosting with careful lag features and TimeSeriesSplit cross-validation often beats both on tabular time series.

💡

Decomposition: Separating Signal from Noise

intuition

Most real-world time series have three components: Trend (the long-run direction, such as sales increasing over years), Seasonality (repeating patterns, such as higher sales in December and lower in January), and Residuals (what remains after trend and seasonality are removed). Additive decomposition works when the seasonal amplitude is constant; multiplicative when it grows with the trend. STL (Seasonal-Trend decomposition using LOESS) is a widely used robust approach: its robust fitting option downweights outliers, and it lets the seasonal pattern change slowly over time. STL handles one seasonal period at a time; series with several periods (for example daily and weekly) need an extension such as MSTL.
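To make the additive case concrete, here is a minimal sketch of classical decomposition with a centered moving average. It is not STL, it assumes an odd seasonal period, and the function name `classical_decompose` and the synthetic series are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

def classical_decompose(y: pd.Series, period: int) -> pd.DataFrame:
    """Additive decomposition via a centered moving average (odd period assumed)."""
    # Averaging over one full cycle cancels the seasonal component, leaving the trend.
    trend = y.rolling(window=period, center=True).mean()
    detrended = y - trend
    # Average the detrended values at each position within the cycle (NaNs are skipped).
    position = np.arange(len(y)) % period
    seasonal_means = detrended.groupby(position).mean()
    seasonal_means = seasonal_means - seasonal_means.mean()  # center the pattern on zero
    seasonal = pd.Series(seasonal_means.loc[position].to_numpy(), index=y.index)
    resid = y - trend - seasonal
    return pd.DataFrame({"trend": trend, "seasonal": seasonal, "resid": resid})

# Synthetic daily series: linear trend plus a weekly sine pattern, no noise.
t = np.arange(140)
idx = pd.date_range("2022-01-01", periods=140, freq="D")
y = pd.Series(100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 7), index=idx)

parts = classical_decompose(y, period=7)
print(parts.dropna().head())
```

The trend is undefined for the first and last three days because the centered window needs a full week around each point. For a multiplicative series, one common option is to apply the same procedure to log(y).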

🔬

Creating Features from Time

deepdive

Time series can be treated as supervised ML by creating lag features and rolling statistics. Lag features: y_{t-1}, y_{t-2}, ..., y_{t-p} capture autocorrelation. Rolling statistics: rolling_mean(window=7), rolling_std, rolling_max capture recent trend and volatility. Calendar features: hour_of_day, day_of_week, month, is_holiday capture seasonality. Fourier features: sin(2πt/period), cos(2πt/period) encode smooth seasonal patterns. Once these features are created, any ML model (XGBoost, LightGBM) can be applied.

Create lag features: df['lag_1'] = df['y'].shift(1)

Rolling statistics: df['roll_mean_7'] = df['y'].shift(1).rolling(7).mean() (shift first so the window excludes the current target)

Calendar features: df['dayofweek'] = df.index.dayofweek

Fourier seasonality: sin/cos pairs for each seasonal period

Always use TimeSeriesSplit — never shuffle time series for CV

Gap between train/validation: pass gap=k to TimeSeriesSplit to skip k samples after the training window and reduce leakage from autocorrelation
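The Fourier item in the list above can be sketched as a small helper. The name `fourier_features`, its parameters, and the column naming scheme are hypothetical; the sketch assumes evenly spaced rows with no missing timestamps, so a simple row counter can stand in for time.

```python
import numpy as np
import pandas as pd

def fourier_features(n_rows: int, period: float, n_harmonics: int, prefix: str) -> pd.DataFrame:
    """Build sin/cos pairs for harmonics 1..n_harmonics of one seasonal period."""
    t = np.arange(n_rows)  # continuous time index: 0, 1, 2, ...
    cols = {}
    for k in range(1, n_harmonics + 1):
        angle = 2 * np.pi * k * t / period
        cols[f"{prefix}_sin_{k}"] = np.sin(angle)
        cols[f"{prefix}_cos_{k}"] = np.cos(angle)
    return pd.DataFrame(cols)

# Two harmonics of a weekly cycle over four weeks of daily rows.
weekly = fourier_features(28, period=7, n_harmonics=2, prefix="week")
print(weekly.head(8).round(3))
```

Each sin/cos pair lets a model represent a seasonal wave with any phase, and higher harmonics add sharper shapes. Rows one period apart receive identical values, which is what makes the encoding seasonal.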

⚙️

TimeSeriesSplit: Correct Cross-Validation

algorithm

Fold 1: Train=[t₁…t₃₀₀], Val=[t₃₀₁…t₄₀₀]

Fold 2: Train=[t₁…t₄₀₀], Val=[t₄₀₁…t₅₀₀]

Fold 3: Train=[t₁…t₅₀₀], Val=[t₅₀₁…t₆₀₀]

Training window always ends before validation — no future leakage

Option: gap=k between train end and val start (avoids autocorrelation leakage)

Option: max_train_size=N for rolling window (only last N points in train)
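The fold layout above can be reproduced with scikit-learn's `TimeSeriesSplit`. This sketch assumes 600 ordered observations and `test_size=100`, which is what yields the 300/400/500 training boundaries shown; the helper `describe` is hypothetical and only converts indices to 1-based positions.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(600).reshape(-1, 1)  # 600 ordered observations, t1..t600

def describe(splitter):
    """Return (train_start, train_end, val_start, val_end) per fold, 1-indexed."""
    return [(int(tr[0]) + 1, int(tr[-1]) + 1, int(va[0]) + 1, int(va[-1]) + 1)
            for tr, va in splitter.split(X)]

expanding = describe(TimeSeriesSplit(n_splits=3, test_size=100))
gapped = describe(TimeSeriesSplit(n_splits=3, test_size=100, gap=7))
rolling = describe(TimeSeriesSplit(n_splits=3, test_size=100, max_train_size=200))

for name, folds in [("expanding", expanding), ("gap=7", gapped), ("rolling", rolling)]:
    for i, (a, b, c, d) in enumerate(folds, start=1):
        print(f"{name:9s} fold {i}: train=[t{a}..t{b}]  val=[t{c}..t{d}]")
```

With `gap=7` the training window ends seven points before validation starts, and with `max_train_size=200` only the most recent 200 points are kept, which turns the expanding window into a rolling one.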

</>

Forecasting with sklearn + LightGBM

code

import pandas as pd
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_absolute_error

# ── Sample daily time-series DataFrame ────────────────────────────────
dates = pd.date_range('2022-01-01', periods=365, freq='D')
np.random.seed(42)
trend = np.linspace(100, 200, 365)
seasonal = 20 * np.sin(2 * np.pi * np.arange(365) / 7)  # weekly pattern
noise = np.random.randn(365) * 5
df = pd.DataFrame({'sales': trend + seasonal + noise}, index=dates)

def create_features(df, target_col, lags, rolling_windows):
    """Create lag and rolling features for supervised time series forecasting."""
    df = df.copy()
    for lag in lags:
        df[f'lag_{lag}'] = df[target_col].shift(lag)
    for w in rolling_windows:
        df[f'roll_mean_{w}'] = df[target_col].shift(1).rolling(w).mean()
        df[f'roll_std_{w}']  = df[target_col].shift(1).rolling(w).std()
    # Calendar features
    df['dayofweek']  = df.index.dayofweek
    df['month']      = df.index.month
    df['is_weekend'] = df['dayofweek'] >= 5
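    # Fourier seasonality for the weekly cycle (period = 7 days).
    # A continuous day counter keeps the phase aligned across year boundaries;
    # index.dayofyear resets every January. Assumes contiguous daily rows.
    t = np.arange(len(df))
    for k in range(1, 3):
        df[f'sin_week_{k}'] = np.sin(2*np.pi*k * t / 7)
        df[f'cos_week_{k}'] = np.cos(2*np.pi*k * t / 7)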
    return df.dropna()

df_feat = create_features(df, 'sales', lags=[1,2,3,7,14,28], rolling_windows=[7,14,28])
X = df_feat.drop('sales', axis=1)
y = df_feat['sales']

# ── TimeSeriesSplit cross-validation ──────────────────────────────
tscv = TimeSeriesSplit(n_splits=5, gap=7)  # 7-day gap prevents autocorrelation leakage
maes = []

for train_idx, val_idx in tscv.split(X):
    X_tr, X_val = X.iloc[train_idx], X.iloc[val_idx]
    y_tr, y_val = y.iloc[train_idx], y.iloc[val_idx]

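    model = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05,
                               num_leaves=31, min_child_samples=20)
    # Caveat: early stopping monitors the same fold that is scored below, so the
    # reported MAE is slightly optimistic. For an unbiased estimate, hold out the
    # tail of the training window for early stopping instead.
    model.fit(X_tr, y_tr,
              eval_set=[(X_val, y_val)],
              callbacks=[lgb.early_stopping(50, verbose=False)])
    maes.append(mean_absolute_error(y_val, model.predict(X_val)))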

print(f"CV MAE: {np.mean(maes):.2f} ± {np.std(maes):.2f}")

⚠️

Time Series Pitfalls

pitfall

Using a random train/test split on time series data is the #1 mistake: the model trains on observations from after the test points, which makes the evaluation wildly optimistic. Always use TimeSeriesSplit or a single temporal split where train comes before test. Second: not adding a gap between train and validation windows. Autocorrelation means the last training point and the first validation point are highly correlated, making validation look easier than real forecasting. Third: feature leakage. A rolling mean of y computed without shifting includes the current value y_t, so the target leaks into its own feature (and a centered window leaks future values as well). Always shift(1) before rolling.
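The third pitfall is easy to see on a tiny series. This sketch uses a made-up five-point series and hypothetical column names to compare a rolling mean with and without `shift(1)`.

```python
import pandas as pd

y = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], name="y")

# Unshifted: the window at row t covers y[t-2..t], so it includes the target itself.
leaky = y.rolling(3).mean()
# Shifted: the window at row t covers y[t-3..t-1], using past values only.
safe = y.shift(1).rolling(3).mean()

frame = pd.DataFrame({"y": y, "leaky_roll_3": leaky, "safe_roll_3": safe})
print(frame)

# Changing the last target changes the leaky feature on that row, not the safe one.
y_changed = y.copy()
y_changed.iloc[4] = 50.0
leaky_changed = y_changed.rolling(3).mean()
safe_changed = y_changed.shift(1).rolling(3).mean()
```

At prediction time the value of y for the current row is not yet known, so only the shifted version can actually be computed in production.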

For production forecasting, retrain your model as new data arrives (online learning or periodic retraining). Models that were accurate 6 months ago may have drifted as the distribution of the time series changes.
