Machine Learning April 15, 2025 8 min read

Achieving AUC 0.9648 on IEEE-CIS Fraud Detection with LightGBM Stacking

A complete walkthrough of building a stacking ensemble that achieved AUC 0.9648 on the IEEE-CIS fraud dataset — feature engineering, model selection, and meta-learner design.

The Problem

The IEEE-CIS Fraud Detection challenge provides 590,540 training transactions with 433 features. Only 3.5% of those transactions are fraudulent, so the classes are heavily imbalanced.

Key Feature Engineering

  • Time-based features: Hour of day, day of week, temporal drift
  • Card aggregations: Mean/std/count of TransactionAmt per card1/card2
  • Email domain features: same_email_domain flag, domain-level fraud rates
  • M-column boolean counts: T/F/missing across M1-M9
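The feature groups above can be sketched in pandas roughly as follows. This is a minimal illustration, not the exact pipeline behind the reported scores: the function name add_features and the output column names are hypothetical, and the input columns TransactionDT, P_emaildomain, and R_emaildomain are assumed from the public dataset schema. Domain-level fraud rates and temporal drift are left out because they depend on the target and on the validation scheme.

```python
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a few of the engineered features added."""
    out = df.copy()

    # Time-based features. TransactionDT is assumed to be a seconds offset
    # from an unspecified reference point, so hour and weekday are relative.
    out["hour"] = (out["TransactionDT"] // 3600) % 24
    out["day_of_week"] = (out["TransactionDT"] // 86400) % 7

    # Card aggregations: mean/std/count of TransactionAmt per card1 and card2.
    for key in ("card1", "card2"):
        grouped = out.groupby(key)["TransactionAmt"]
        for stat in ("mean", "std", "count"):
            out[f"amt_{stat}_{key}"] = grouped.transform(stat)

    # Email domain flag: 1 when purchaser and recipient domains match.
    out["same_email_domain"] = (
        out["P_emaildomain"].notna()
        & (out["P_emaildomain"] == out["R_emaildomain"])
    ).astype(int)

    # M-column counts: number of "T", "F", and missing values across M1..M9.
    m_cols = [f"M{i}" for i in range(1, 10)]
    out["m_true_count"] = (out[m_cols] == "T").sum(axis=1)
    out["m_false_count"] = (out[m_cols] == "F").sum(axis=1)
    out["m_missing_count"] = out[m_cols].isna().sum(axis=1)
    return out


# Tiny made-up frame, only to show the call; the values are not real data.
demo = pd.DataFrame({
    "TransactionDT": [86400, 90000, 176400],
    "TransactionAmt": [50.0, 100.0, 30.0],
    "card1": [1000, 1000, 2000],
    "card2": [111.0, 111.0, 222.0],
    "P_emaildomain": ["gmail.com", "yahoo.com", None],
    "R_emaildomain": ["gmail.com", "gmail.com", None],
    **{f"M{i}": ["T", "F", None] for i in range(1, 10)},
})
features = add_features(demo)
```

Computing the aggregations with transform keeps the result aligned with the original rows, so the new columns can be assigned directly without a merge.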

Model Pipeline

Model      OOF AUC
LightGBM   0.9648
XGBoost    0.9631
CatBoost   0.9529
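For readers unfamiliar with out-of-fold (OOF) scoring and stacking, the sketch below shows the general pattern. It rests on several assumptions: the data is synthetic, scikit-learn estimators stand in for LightGBM, XGBoost, and CatBoost so the example runs without those packages, and the logistic-regression meta-learner is only one common choice, not necessarily the design used for the scores above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic, imbalanced stand-in data (a few percent positives); not the real dataset.
X, y = make_classification(
    n_samples=4000, n_features=20, n_informative=8,
    weights=[0.965], random_state=0,
)

# Stand-ins for the three gradient-boosting libraries named in the table.
base_models = {
    "gbm": GradientBoostingClassifier(n_estimators=50, random_state=0),
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "extra_trees": ExtraTreesClassifier(n_estimators=100, random_state=0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Each row's OOF prediction comes from a model that never saw that row.
oof = np.column_stack([
    cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
    for model in base_models.values()
])
oof_auc = {
    name: roc_auc_score(y, oof[:, i]) for i, name in enumerate(base_models)
}

# Meta-learner: logistic regression on the OOF columns, itself scored
# out of fold (with a different shuffle) to limit an optimistic estimate.
meta_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
meta_oof = cross_val_predict(
    LogisticRegression(), oof, y, cv=meta_cv, method="predict_proba"
)[:, 1]
stack_auc = roc_auc_score(y, meta_oof)

for name, auc in oof_auc.items():
    print(f"{name}: OOF AUC = {auc:.4f}")
print(f"stack: OOF AUC = {stack_auc:.4f}")
```

StratifiedKFold is used here for brevity because cross_val_predict needs every row to land in exactly one validation fold. A time-ordered scheme would need a manual loop over folds instead.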

Key Insights

  1. Don't drop V-columns — they carry Vesta's proprietary fraud signals
  2. Time-based CV is more realistic than StratifiedKFold
  3. Card-level aggregations are the highest-impact feature group
  4. LightGBM's native missing-value handling gives it the edge
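The second insight can be illustrated with a small fold generator in which every validation block is strictly later than its training data. This is a sketch of one possible scheme (an expanding window); the function name time_based_folds is hypothetical, and the post does not state which time-based split was actually used.

```python
import numpy as np


def time_based_folds(transaction_dt, n_splits=5):
    """Yield (train_idx, valid_idx) pairs; validation always follows training in time."""
    # Sort row positions by timestamp, then cut them into consecutive blocks.
    order = np.argsort(transaction_dt, kind="stable")
    blocks = np.array_split(order, n_splits + 1)
    for k in range(1, n_splits + 1):
        # Train on every block before block k, validate on block k.
        yield np.concatenate(blocks[:k]), blocks[k]


# Made-up, unsorted timestamps, only to show the call.
dt = np.array([50, 10, 40, 20, 60, 30])
folds = list(time_based_folds(dt, n_splits=2))
for train_idx, valid_idx in folds:
    print("train:", train_idx.tolist(), "valid:", valid_idx.tolist())
```

Unlike a shuffled stratified split, no fold here trains on transactions that occur after the ones it is scored on, which is closer to how a fraud model is used once deployed.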

Tags: LightGBM, Fraud Detection, Feature Engineering, Kaggle, Stacking

Ossama Elhakki

AI Engineer & ML Systems Builder — Morocco