Time Series

Weather Pattern Detection

9-method pipeline on 96,453 hourly records: K-Means (silhouette = 0.45, K=3), DBSCAN, Isolation Forest (1,930 anomalies), LightGBM (macro F1 = 0.74), 1D-CNN (94.85% accuracy), LSTM Autoencoder, and Prophet (16 anomaly days).

K-Means silhouette: 0.45 (K=3)
Isolation Forest: 1,930 anomalies (2%)
LightGBM macro F1: 0.74
1D-CNN accuracy: 94.85%

Dataset

96,453 hourly weather records — 7 meteorological features

Approach

Clustering → anomaly detection → supervised classification → 1D-CNN → Prophet forecasting

Tech Stack

Python, Scikit-learn, TensorFlow/LSTM, XGBoost, LightGBM, Prophet

Keywords

K-Means, DBSCAN, Isolation Forest, LightGBM, 1D-CNN, LSTM Autoencoder, Prophet

Deep Dive

Multi-paradigm analysis of 96,453 hourly weather records.

Dataset

  • 96,453 hourly records: temperature, apparent temperature, humidity, wind speed, wind bearing, visibility, pressure
  • Precipitation type: 88.4% rain, 11.1% snow, 0.5% none (severe imbalance)

Clustering (Phase 1)

Method | Result
K-Means | K=3, silhouette = 0.45, Davies-Bouldin = 1.10
DBSCAN | 2 clusters + 1,107 noise points (10K-record sample)
GMM | Model selection via BIC/AIC
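The three clustering comparisons above can be sketched with scikit-learn as follows. This is a minimal illustration, assuming the seven features are standardized first; the K range, the DBSCAN `eps`/`min_samples` values and the helper names (`sweep_kmeans`, `dbscan_on_sample`, `gmm_by_bic`) are hypothetical choices, not the project's settings, and the demo runs on synthetic blobs rather than the weather data.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def sweep_kmeans(X, k_values=range(2, 7), seed=0):
    """Fit K-Means for each K; return {K: (silhouette, Davies-Bouldin)}."""
    results = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        results[k] = (silhouette_score(X, labels), davies_bouldin_score(X, labels))
    return results


def dbscan_on_sample(X, n_sample=10_000, eps=0.5, min_samples=10, seed=0):
    """Run DBSCAN on a random subsample; return (n_clusters, n_noise)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(n_sample, len(X)), replace=False)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X[idx])
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = int(np.sum(labels == -1))  # DBSCAN marks noise points with -1
    return n_clusters, n_noise


def gmm_by_bic(X, k_values=range(1, 7), seed=0):
    """Pick the GMM component count with the lowest BIC (AIC kept for reference)."""
    scores = {}
    for k in k_values:
        gmm = GaussianMixture(n_components=k, random_state=seed).fit(X)
        scores[k] = (gmm.bic(X), gmm.aic(X))
    best_k = min(scores, key=lambda k: scores[k][0])
    return best_k, scores


if __name__ == "__main__":
    # Synthetic stand-in for 7 weather features: three Gaussian blobs.
    rng = np.random.default_rng(42)
    centers = rng.normal(scale=4.0, size=(3, 7))
    X_raw = np.vstack([c + rng.normal(size=(500, 7)) for c in centers])
    X = StandardScaler().fit_transform(X_raw)
    for k, (sil, db) in sweep_kmeans(X).items():
        print(f"K={k}: silhouette={sil:.2f}, Davies-Bouldin={db:.2f}")
    print("DBSCAN (clusters, noise):", dbscan_on_sample(X, n_sample=1000))
    print("GMM best K by BIC:", gmm_by_bic(X)[0])
```

Subsampling before DBSCAN keeps its neighborhood queries tractable; the noise count reported in the table corresponds to the points labeled `-1` in that subsample.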

The three K-Means clusters correspond to weather regimes: warm/clear, cold/overcast, and transitional.

Anomaly Detection (Phase 2)

Method | Anomalies
Isolation Forest | 1,930 (2.0%)
LSTM Autoencoder | 90 sequences (threshold: reconstruction error > 0.0061)
STL + residuals | Anomaly flags on the residual component
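Two of the row-level detectors can be sketched as follows, under stated assumptions. The Isolation Forest part assumes `contamination=0.02`; if the project used that setting, the 2.0% rate is fixed by the parameter rather than discovered, since scikit-learn sets the decision threshold so that roughly that fraction of training rows is labeled `-1`. The residual part is a deliberately simplified stand-in for STL (statsmodels provides a real `STL` class): a rolling-mean trend, a per-hour-of-day seasonal mean and a robust z-score on what remains. The helper names, the 24-hour period and the 3.5 threshold are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest


def isolation_forest_flags(X, contamination=0.02, seed=0):
    """Return a boolean mask of rows that Isolation Forest labels -1 (anomalous)."""
    model = IsolationForest(contamination=contamination, random_state=seed)
    return model.fit_predict(X) == -1


def residual_flags(series, period=24, z=3.5):
    """Simplified STL-style flagging (NOT statsmodels' STL): centered rolling-mean
    trend, mean detrended value per position in the period as seasonality, then a
    robust z-score on the residual. Edge points without a full trend window are
    never flagged."""
    s = pd.Series(np.asarray(series, dtype=float))
    trend = s.rolling(period, center=True, min_periods=period).mean()
    detrended = s - trend
    phase = np.arange(len(s)) % period
    seasonal = detrended.groupby(phase).transform("mean")
    resid = detrended - seasonal
    centered = resid - resid.median()
    mad = centered.abs().median()
    if not mad > 0:  # constant or empty residual: nothing to flag
        return np.zeros(len(s), dtype=bool)
    robust_z = centered / (1.4826 * mad)  # 1.4826 * MAD ~ std for Gaussian noise
    return (robust_z.abs() > z).to_numpy()


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    hours = np.arange(24 * 60)
    # Synthetic hourly temperature: daily cycle plus noise, with three injected spikes.
    temp = 10 + 5 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 0.5, hours.size)
    temp[[100, 700, 1200]] += 8
    print("residual flags at hours:", np.flatnonzero(residual_flags(temp)))
    X = np.column_stack([temp, rng.normal(size=hours.size)])
    print("Isolation Forest flag rate:", isolation_forest_flags(X).mean())
```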

All 3 methods agree on the ~2% anomaly rate.
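Matching anomaly rates do not by themselves show that the methods flag the same hours. A minimal way to check agreement, assuming each method's output has been turned into a boolean mask over the same rows (the LSTM Autoencoder's flagged sequences would first need to be expanded to the rows they cover), is the Jaccard overlap of the masks. The helper `flag_agreement` is hypothetical.

```python
import numpy as np


def flag_agreement(flags_a, flags_b):
    """Compare two boolean anomaly masks over the same rows.

    Returns each mask's flag rate and the Jaccard overlap |A & B| / |A | B|
    (defined as 1.0 when neither method flags anything).
    """
    a = np.asarray(flags_a, dtype=bool)
    b = np.asarray(flags_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must cover the same rows")
    union = int(np.logical_or(a, b).sum())
    overlap = int(np.logical_and(a, b).sum())
    return {
        "rate_a": float(a.mean()),
        "rate_b": float(b.mean()),
        "jaccard": overlap / union if union else 1.0,
    }


if __name__ == "__main__":
    # Both masks flag 2% of rows, but never the same rows.
    a = np.zeros(100, dtype=bool)
    a[:2] = True
    b = np.zeros(100, dtype=bool)
    b[50:52] = True
    print(flag_agreement(a, b))  # {'rate_a': 0.02, 'rate_b': 0.02, 'jaccard': 0.0}
```

The demo shows the failure mode the rate alone cannot reveal: identical 2% rates with zero overlap.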

Classification (Phase 3 — Precipitation Type)

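Model | Target | Metric
Random Forest | Precipitation type | Macro F1 = 0.51
XGBoost | Precipitation type | Macro F1 = 0.70
LightGBM | Precipitation type | Macro F1 = 0.74
1D-CNN | Season | Accuracy = 94.85%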
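An imbalanced precipitation-type classifier scored by macro F1 can be sketched as below. This is an illustration on synthetic data generated in the quoted 88.4% / 11.1% / 0.5% proportions; it uses scikit-learn's `RandomForestClassifier` with `class_weight="balanced"` as a dependency-free stand-in (LightGBM's `LGBMClassifier` also accepts `class_weight="balanced"` and could be swapped in if installed). The split ratio, tree count and the helper `evaluate_macro_f1` are assumptions, not the project's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, f1_score
from sklearn.model_selection import train_test_split


def evaluate_macro_f1(X, y, class_weight="balanced", seed=0):
    """Stratified split, fit a random forest, return (macro F1, per-class report)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    clf = RandomForestClassifier(
        n_estimators=100, class_weight=class_weight, random_state=seed
    )
    clf.fit(X_tr, y_tr)
    y_pred = clf.predict(X_te)
    # Macro F1 averages per-class F1 equally, so the rare class counts as much as rain.
    macro_f1 = f1_score(y_te, y_pred, average="macro", zero_division=0)
    report = classification_report(y_te, y_pred, zero_division=0)
    return macro_f1, report


if __name__ == "__main__":
    # Synthetic 7-feature data in the quoted proportions: rain / snow / none.
    X, y = make_classification(
        n_samples=10_000, n_features=7, n_informative=5, n_classes=3,
        n_clusters_per_class=1, weights=[0.884, 0.111, 0.005], random_state=0,
    )
    for cw in (None, "balanced"):
        macro_f1, report = evaluate_macro_f1(X, y, class_weight=cw)
        print(f"class_weight={cw}: macro F1 = {macro_f1:.2f}")
    print(report)
```

Stratifying the split matters here: with only 0.5% "none" rows, a random split can leave the test set with almost no examples of that class.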

Forecasting (Phase 4)

Prophet on temperature: 16 anomaly days detected in a 180-day test period, where observed values fell outside the forecast's uncertainty interval.
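Counting interval exceedances needs only the forecast frame, not Prophet itself. Prophet's `model.predict(future)` returns `ds`, `yhat_lower` and `yhat_upper` columns, with the interval width set by `interval_width` (default 0.80), so the count depends on that setting. The sketch below assumes a day counts as anomalous if any observation on it falls outside the interval; that rule and the helper `interval_exceedance_days` are hypothetical, and the demo uses a hand-built forecast frame rather than a fitted model.

```python
import numpy as np
import pandas as pd


def interval_exceedance_days(forecast, observed):
    """Return the calendar days on which any observation falls outside the
    forecast interval.

    forecast: DataFrame with Prophet-style columns ds, yhat_lower, yhat_upper.
    observed: DataFrame with columns ds (timestamps) and y (actual values).
    """
    merged = forecast[["ds", "yhat_lower", "yhat_upper"]].merge(
        observed[["ds", "y"]], on="ds", how="inner"
    )
    outside = (merged["y"] < merged["yhat_lower"]) | (merged["y"] > merged["yhat_upper"])
    days = pd.to_datetime(merged.loc[outside, "ds"]).dt.normalize()
    return days.drop_duplicates().sort_values().reset_index(drop=True)


if __name__ == "__main__":
    # Synthetic 3-day hourly example with a fixed [0, 10] interval.
    ds = pd.date_range("2016-01-01", periods=72, freq=pd.offsets.Hour())
    forecast = pd.DataFrame({"ds": ds, "yhat_lower": 0.0, "yhat_upper": 10.0})
    y = np.full(72, 5.0)
    y[[30, 31, 60]] = [12.0, -1.0, 15.0]  # two breaches on Jan 2, one on Jan 3
    observed = pd.DataFrame({"ds": ds, "y": y})
    print(len(interval_exceedance_days(forecast, observed)))  # → 2
```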

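Key Insight

The 0.5% "none" precipitation class distorts the headline metrics. High accuracy (98%+) masks poor recall on the rare dry class, and because macro F1 weights that class equally with rain and snow, a handful of misclassified dry hours can swing the score substantially. Real-world weather AI requires uncertainty quantification beyond accuracy metrics.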
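A small worked example makes the gap between accuracy and macro F1 concrete. It uses a hypothetical 1,000-hour test set in the quoted proportions (884 rain, 111 snow, 5 none) and a hypothetical classifier that is perfect on rain and snow but predicts "rain" for every dry hour; these numbers are illustrative, not the project's results.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, recall_score

# Hypothetical test set: 0 = rain, 1 = snow, 2 = none.
y_true = np.array([0] * 884 + [1] * 111 + [2] * 5)
# Hypothetical classifier: correct on rain and snow, labels every dry hour "rain".
y_pred = y_true.copy()
y_pred[y_true == 2] = 0

accuracy = accuracy_score(y_true, y_pred)
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)
per_class_recall = recall_score(y_true, y_pred, labels=[0, 1, 2], average=None, zero_division=0)

print(f"accuracy = {accuracy:.3f}")  # 0.995
print(f"macro F1 = {macro_f1:.3f}")  # 0.666
print("recall per class (rain, snow, none):", per_class_recall)
```

Accuracy stays at 99.5% even though the model never detects a dry hour; macro F1 drops to about 0.67 because the "none" class contributes an F1 of zero to the three-way average.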