Time Series

Weather Pattern Detection

9-method pipeline on 96,453 hourly records. K-Means (sil=0.45, K=3), DBSCAN, Isolation Forest (1,930 anomalies), LightGBM macro F1=0.74, 1D-CNN 94.85%, LSTM Autoencoder, Prophet (16 anomaly days).


K-Means Silhouette: 0.45 (K=3)

Isolation Forest: 1,930 anomalies (2%)

LightGBM macro F1: 0.74

1D-CNN accuracy: 94.85%

Dataset

96,453 hourly weather records — 7 meteorological features

Approach

Clustering → anomaly detection → supervised classification → 1D-CNN → Prophet forecasting

Tech Stack

Python, Scikit-learn, TensorFlow/LSTM, XGBoost, LightGBM, Prophet

Keywords

K-Means, DBSCAN, Isolation Forest, LightGBM, 1D-CNN, LSTM Autoencoder, Prophet

Visualizations: 6 Charts

Deep Dive

Multi-paradigm analysis of 96,453 hourly weather records.

Dataset

▸ 96,453 hourly records: temperature, apparent temperature, humidity, wind speed, wind bearing, visibility, pressure
▸ Precipitation type: 88.4% rain, 11.1% snow, 0.5% none (severe imbalance)

Clustering (Phase 1)

Method	Result
K-Means	K=3, Silhouette=0.45, Davies-Bouldin=1.10
DBSCAN	2 clusters + 1,107 noise points (on a 10K-record sample)
GMM	Model selection via BIC/AIC

3 weather regimes: warm/clear, cold/overcast, transition.
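
The choice of K can be checked by scoring several candidate values with the two metrics in the table. The sketch below is only an illustration under stated assumptions: it uses synthetic Gaussian blobs in place of the real weather features, and the helper name `score_k`, the candidate range 2 to 6, and the seed are hypothetical rather than taken from the project.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 7 meteorological features (NOT the real data):
# three Gaussian blobs play the role of three weather regimes.
rng = np.random.default_rng(0)
centers = np.array([[-4.0] * 7, [0.0] * 7, [4.0] * 7])
X = np.vstack([c + rng.normal(size=(200, 7)) for c in centers])

# Scale features so no single variable dominates the Euclidean distance.
X_scaled = StandardScaler().fit_transform(X)


def score_k(X, k, seed=0):
    """Fit K-Means with k clusters; return (silhouette, Davies-Bouldin)."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    return silhouette_score(X, labels), davies_bouldin_score(X, labels)


# Higher silhouette is better; lower Davies-Bouldin is better.
scores = {k: score_k(X_scaled, k) for k in range(2, 7)}
best_k = max(scores, key=lambda k: scores[k][0])

for k, (sil, db) in scores.items():
    print(f"K={k}: silhouette={sil:.2f}, Davies-Bouldin={db:.2f}")
print("best K by silhouette:", best_k)
```

On real data the two metrics do not always agree, so it is worth reading them together instead of optimising one alone.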

Anomaly Detection (Phase 2)

Method	Anomalies
Isolation Forest	1,930 (2.0%)
LSTM Autoencoder	90 sequences (threshold: reconstruction error > 0.0061)
STL + residuals	Anomaly flags on residual component

All 3 methods agree on the ~2% anomaly rate.
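
A minimal sketch of the Isolation Forest step follows. The page does not say how the 2.0% figure was obtained. 1,930 of 96,453 is almost exactly 2%, which would be consistent with a `contamination=0.02` setting, but that is an assumption. The data here is synthetic, and the outlier shift and seed are illustrative only.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Synthetic stand-in (NOT the real data): 5,000 ordinary rows plus
# 100 rows shifted far from the bulk, across 7 features.
normal = rng.normal(0.0, 1.0, size=(5000, 7))
outliers = rng.normal(6.0, 1.0, size=(100, 7))
X = np.vstack([normal, outliers])

# contamination fixes the share of training points scored as anomalous
# (assumed value; the source only reports the resulting 2.0% rate).
iso = IsolationForest(n_estimators=100, contamination=0.02, random_state=42)
flags = iso.fit_predict(X)  # -1 = anomaly, 1 = normal

n_anomalies = int((flags == -1).sum())
anomaly_rate = n_anomalies / len(X)
print(f"{n_anomalies} anomalies ({anomaly_rate:.1%})")
```

Because `contamination` sets the threshold as a percentile of the training scores, the flagged share is close to the chosen value by construction. Agreement between methods on a rate chosen this way says less than agreement on which records are flagged.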

Classification (Phase 3 — Precipitation Type)

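Model	Metric	Score
Random Forest	Macro F1	0.51
XGBoost	Macro F1	0.70
LightGBM	Macro F1	0.74
1D-CNN (season)	Accuracy	94.85%

The 1D-CNN row reports accuracy rather than macro F1, so it is not directly comparable with the other rows.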

Forecasting (Phase 4)

Prophet on temperature: 16 anomaly days detected in the 180-day test window, where the observed value fell outside the forecast confidence interval (CI).
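
The exceedance check itself is simple once a forecast exists. The sketch below assumes a forecast frame with the `ds`, `yhat`, `yhat_lower` and `yhat_upper` columns that Prophet's `predict` returns. The numbers are hand-written toy values, not model output, and the function name `flag_interval_exceedances` is hypothetical.

```python
import pandas as pd


def flag_interval_exceedances(forecast: pd.DataFrame, actual: pd.Series) -> pd.DataFrame:
    """Mark days whose observed value lies outside [yhat_lower, yhat_upper]."""
    out = forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].copy()
    out["y"] = actual.to_numpy()
    out["anomaly"] = (out["y"] < out["yhat_lower"]) | (out["y"] > out["yhat_upper"])
    return out


# Toy stand-in for a Prophet forecast (illustrative values only).
forecast = pd.DataFrame({
    "ds": pd.date_range("2024-01-01", periods=5, freq="D"),
    "yhat": [20.0, 21.0, 22.0, 21.5, 20.5],
    "yhat_lower": [17.0, 18.0, 19.0, 18.5, 17.5],
    "yhat_upper": [23.0, 24.0, 25.0, 24.5, 23.5],
})
actual = pd.Series([19.2, 25.1, 22.4, 17.9, 20.0])  # observed temperatures

result = flag_interval_exceedances(forecast, actual)
anomaly_days = result.loc[result["anomaly"], "ds"].dt.strftime("%Y-%m-%d").tolist()
print(anomaly_days)  # ['2024-01-02', '2024-01-04']
```

The number of flagged days depends on the interval width used when fitting, so a count like 16 is only meaningful alongside that setting.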

Key Insight

The 0.5% "none" precipitation class makes aggregate metrics hard to read on their own. High accuracy (98%+) masks poor recall on the rare dry class, while macro F1 weights that class equally with rain and snow. Real-world weather AI requires uncertainty quantification beyond accuracy metrics.
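
A small worked example shows how the two metrics diverge. It uses 1,000 labels in the class proportions quoted above and a hypothetical classifier that never predicts "none". The predictions are constructed for illustration and are not the project's model output.

```python
def per_class_f1(y_true, y_pred):
    """Return {class: F1} computed from true/false positives and negatives."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


# 1,000 samples in the stated proportions: 88.4% rain, 11.1% snow, 0.5% none.
y_true = ["rain"] * 884 + ["snow"] * 111 + ["none"] * 5

# Hypothetical classifier: perfect on rain and snow, but it labels
# every "none" sample as rain.
y_pred = ["rain"] * 884 + ["snow"] * 111 + ["rain"] * 5

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1 = per_class_f1(y_true, y_pred)
macro_f1 = sum(f1.values()) / len(f1)

print(f"accuracy = {accuracy:.3f}")
print(f"per-class F1 = {f1}")
print(f"macro F1 = {macro_f1:.3f}")
```

Missing all five dry samples costs only half a percentage point of accuracy but removes a full third of the macro F1 score, so per-class recall is worth reporting next to either headline number.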
