TL;DR
- Use LightGBM when speed matters most and your dataset is large (>100K rows)
- Use XGBoost when you need reproducibility and battle-tested stability
- Use CatBoost when you have many high-cardinality categorical features
Training Speed Benchmark
Training 1,000 trees on a dataset of 500K rows and 200 features:
| Model | Training time | RAM |
|---|---|---|
| LightGBM | 45s | 2.1GB |
| XGBoost | 210s | 4.8GB |
| CatBoost | 130s | 3.2GB |
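
To check these numbers on your own data, a timing harness along the following lines may help. It is a minimal sketch, not the setup behind the table above: `time_fit` and `make_fitters` are hypothetical helper names, the regression task and default hyperparameters are assumptions, and memory use is not measured.

```python
import time
from typing import Callable, Dict


def time_fit(fit_fn: Callable[[], object], repeats: int = 3) -> Dict[str, float]:
    """Call fit_fn `repeats` times and report wall-clock seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fit_fn()
        durations.append(time.perf_counter() - start)
    return {"best_s": min(durations), "mean_s": sum(durations) / len(durations)}


def make_fitters(X, y, n_trees: int = 1000) -> Dict[str, Callable[[], object]]:
    """Build one zero-argument fit function per library.

    Imports are lazy, so a missing library only fails when its fitter is called.
    Assumption: a regression task with each library's default settings.
    """
    def fit_lightgbm():
        import lightgbm as lgb
        return lgb.LGBMRegressor(n_estimators=n_trees).fit(X, y)

    def fit_xgboost():
        import xgboost as xgb
        return xgb.XGBRegressor(n_estimators=n_trees).fit(X, y)

    def fit_catboost():
        from catboost import CatBoostRegressor
        return CatBoostRegressor(iterations=n_trees, verbose=0).fit(X, y)

    return {"LightGBM": fit_lightgbm, "XGBoost": fit_xgboost, "CatBoost": fit_catboost}
```

A typical use would be `{name: time_fit(fn) for name, fn in make_fitters(X, y).items()}`, run on the same machine with the same thread count for all three libraries so the comparison is fair.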
When XGBoost Wins
- Small datasets, where exact split finding (`tree_method="exact"`) is affordable
- Sparse data, such as text features encoded as TF-IDF, thanks to sparsity-aware split finding
- More stable results across random seeds
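
The points above map onto a handful of XGBoost settings. The sketch below is one way to combine them, under stated assumptions: `xgb_exact_params` and `train_on_tfidf` are hypothetical helper names, the binary classification objective is assumed, and the TF-IDF step uses scikit-learn's `TfidfVectorizer`.

```python
def xgb_exact_params(seed: int = 0) -> dict:
    """Native-API parameters for exact (non-histogram) split finding."""
    return {
        "tree_method": "exact",          # enumerate candidate splits instead of binning
        "objective": "binary:logistic",  # assumption: binary classification
        "seed": seed,                    # fixed seed so repeated runs are comparable
    }


def train_on_tfidf(texts, labels, num_rounds: int = 100, seed: int = 0):
    """Train on TF-IDF features without converting the sparse matrix to dense."""
    # Lazy imports: the parameter helper above works without these installed.
    import xgboost as xgb
    from sklearn.feature_extraction.text import TfidfVectorizer

    X = TfidfVectorizer().fit_transform(texts)  # scipy CSR sparse matrix
    dtrain = xgb.DMatrix(X, label=labels)       # DMatrix accepts scipy sparse input
    return xgb.train(xgb_exact_params(seed), dtrain, num_boost_round=num_rounds)
```

Training the same data with a few different `seed` values and comparing validation scores is a simple way to check the seed-stability claim on your own problem.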
When LightGBM Wins
- Large datasets: histogram-based split finding keeps training fast, and leaf-wise growth tends to reach a given loss with fewer splits
- Native categorical handling, with no one-hot encoding required
- DART boosting for dropout-style regularization (XGBoost also offers a DART booster)
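
As a rough illustration of how these options are switched on in LightGBM's scikit-learn API, consider the sketch below. `lgbm_params` and `fit_with_categoricals` are hypothetical helper names, the classification task is an assumption, and `num_leaves=63` is an arbitrary example value rather than a recommendation.

```python
def lgbm_params(use_dart: bool = False, num_leaves: int = 63, seed: int = 0) -> dict:
    """Keyword arguments for lightgbm.LGBMClassifier."""
    params = {
        "boosting_type": "dart" if use_dart else "gbdt",
        "num_leaves": num_leaves,  # leaf-wise growth is bounded by leaf count
        "random_state": seed,
    }
    if use_dart:
        params["drop_rate"] = 0.1  # fraction of trees dropped per iteration (library default)
    return params


def fit_with_categoricals(df, target: str, cat_cols, **param_kwargs):
    """Fit on a pandas DataFrame, letting LightGBM handle categoricals natively."""
    import lightgbm as lgb  # lazy import: lgbm_params works without it installed

    X = df.drop(columns=[target]).copy()
    for col in cat_cols:
        # Columns with pandas "category" dtype are treated as categorical automatically.
        X[col] = X[col].astype("category")
    return lgb.LGBMClassifier(**lgbm_params(**param_kwargs)).fit(X, df[target])
```

For example, `fit_with_categoricals(df, "label", ["city", "device"], use_dart=True)` would train a DART model with both columns handled as categoricals; the column names here are placeholders.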