Weather Pattern Detection
9-method pipeline on 96,453 hourly records. K-Means (sil=0.45, K=3), DBSCAN, Isolation Forest (1,930 anomalies), LightGBM macro F1=0.74, 1D-CNN 94.85%, LSTM Autoencoder, Prophet (16 anomaly days).
96,453 hourly weather records — 7 meteorological features
Clustering → anomaly detection → supervised classification → 1D-CNN → Prophet forecasting
Multi-paradigm analysis of 96,453 hourly weather records.
Dataset
- 96,453 hourly records: temperature, apparent temperature, humidity, wind speed, wind bearing, visibility, pressure
- Precipitation type: 88.4% rain, 11.1% snow, 0.5% none (severe imbalance)
Clustering (Phase 1)
| Method | Result |
|---|---|
| K-Means | K=3, Silhouette=0.45, Davies-Bouldin=1.10 |
| DBSCAN | 2 clusters + 1,107 noise points (on a 10K-record sample) |
| GMM | Model selection via BIC/AIC |
Three weather regimes emerge: warm/clear, cold/overcast, and transition.
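
The K-Means row above reports a silhouette score and a Davies-Bouldin index. As a minimal sketch of how those two numbers could be computed with scikit-learn, the snippet below runs on synthetic blobs, not on the weather data. The feature scaling step, `n_init`, and `random_state` are assumptions, since the summary does not state the preprocessing or settings used.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 7 weather features (NOT the real dataset):
# three Gaussian blobs with 200 points each.
rng = np.random.default_rng(0)
centers = np.array([[-6.0] * 7, [0.0] * 7, [6.0] * 7])
X = np.vstack([rng.normal(c, 1.0, size=(200, 7)) for c in centers])

# Scale features so that no single unit dominates the distances (assumed step).
X_scaled = StandardScaler().fit_transform(X)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

sil = silhouette_score(X_scaled, labels)      # range [-1, 1], higher is better
dbi = davies_bouldin_score(X_scaled, labels)  # >= 0, lower is better
print(f"silhouette={sil:.2f}, davies_bouldin={dbi:.2f}")
```

On well-separated toy blobs both scores look far better than the reported 0.45 and 1.10, which is expected: real weather regimes overlap.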
Anomaly Detection (Phase 2)
| Method | Anomalies |
|---|---|
| Isolation Forest | 1,930 (2.0%) |
| LSTM Autoencoder | 90 sequences (threshold: reconstruction error > 0.0061) |
| STL + residuals | Anomaly flags on residual component |
All three methods agree on an anomaly rate of roughly 2%.
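
The LSTM Autoencoder row flags sequences whose reconstruction error exceeds 0.0061. A minimal sketch of that thresholding step follows, assuming the error is the mean squared error per sequence (the summary does not name the metric). The function name `flag_by_reconstruction_error` and the toy arrays are hypothetical, and the autoencoder itself is replaced by hand-made reconstructions.

```python
import numpy as np

def flag_by_reconstruction_error(x, x_hat, threshold=0.0061):
    """Return (errors, flags) for arrays shaped (n_sequences, timesteps, features)."""
    # Mean squared error per sequence, averaged over timesteps and features
    # (MSE is an assumption; the source only gives the threshold value).
    errors = np.mean((x - x_hat) ** 2, axis=(1, 2))
    return errors, errors > threshold

# Toy data: 5 sequences of 24 time steps x 7 features.
rng = np.random.default_rng(1)
x = rng.random((5, 24, 7))
x_hat = x + 0.01        # good reconstructions: MSE is about 1e-4
x_hat[3] = x[3] + 0.2   # one poor reconstruction: MSE is about 0.04

errors, flags = flag_by_reconstruction_error(x, x_hat)
print(flags)
```

Only the fourth sequence crosses the threshold here. In practice the threshold is often derived from the error distribution on training data, but how 0.0061 was chosen is not stated.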
Classification (Phase 3 — Precipitation Type)
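| Model | Target | Score |
|---|---|---|
| Random Forest | Precipitation type | Macro F1 = 0.51 |
| XGBoost | Precipitation type | Macro F1 = 0.70 |
| LightGBM | Precipitation type | Macro F1 = 0.74 |
| 1D-CNN | Season | Accuracy = 94.85% |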
Forecasting (Phase 4)
Prophet on temperature: 16 anomaly days detected in a 180-day test window, flagged where the observed value fell outside the forecast's uncertainty interval.
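
The interval-exceedance rule used for the Prophet anomalies can be sketched without the model itself. The snippet below assumes the forecast already provides lower and upper bounds for each day (in Prophet's output these would be the `yhat_lower` and `yhat_upper` columns). The helper name `interval_exceedance` and all values are illustrative only.

```python
import numpy as np

def interval_exceedance(actual, lower, upper):
    """Boolean mask: True where the observation lies outside [lower, upper]."""
    actual, lower, upper = map(np.asarray, (actual, lower, upper))
    return (actual < lower) | (actual > upper)

# Toy daily temperatures (degrees C) with hypothetical interval bounds.
actual = [10.2, 11.0, 18.5, 9.8, 3.1]
lower = [8.0, 8.5, 9.0, 8.5, 8.0]
upper = [13.0, 13.5, 14.0, 13.5, 13.0]

flags = interval_exceedance(actual, lower, upper)
print(int(flags.sum()), "anomaly days")
```

The number of flagged days depends directly on the interval width chosen for the forecast, so the count of 16 should be read together with that setting, which the summary does not report.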
Key Insight
The 0.5% "none" precipitation class makes macro F1 misleading. High accuracy (98%+) masks poor recall on the rare dry class. Real-world weather AI requires uncertainty quantification beyond accuracy metrics.
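
To make the gap between accuracy and macro F1 concrete, here is a small worked example in plain Python. It uses 1,000 made-up labels in the same 88.4% / 11.1% / 0.5% proportions and a hypothetical classifier that is perfect on rain and snow but never predicts "none". These are not the project's predictions.

```python
def per_class_f1(y_true, y_pred, label):
    """F1 for one class: 2*TP / (2*TP + FP + FN), or 0.0 if undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

classes = ["rain", "snow", "none"]

# Illustrative labels matching the reported class proportions.
y_true = ["rain"] * 884 + ["snow"] * 111 + ["none"] * 5
# Hypothetical classifier: every dry hour is mislabelled as rain.
y_pred = ["rain"] * 884 + ["snow"] * 111 + ["rain"] * 5

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1 = {c: per_class_f1(y_true, y_pred, c) for c in classes}
macro_f1 = sum(f1.values()) / len(classes)

print(f"accuracy={accuracy:.3f}, macro_f1={macro_f1:.3f}")
# accuracy=0.995, macro_f1=0.666
```

Five misclassified samples leave accuracy at 99.5% but pull macro F1 down to about 0.67, because each class contributes one third of the macro average regardless of its size. Reporting per-class recall alongside both metrics shows where the errors sit.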