Machine Learning · April 15, 2025 · 8 min read

Achieving AUC 0.9648 on IEEE-CIS Fraud Detection with LightGBM Stacking

A complete walkthrough of building a stacking ensemble that reached AUC 0.9648 on the IEEE-CIS fraud dataset: feature engineering, model selection, and meta-learner design.

The Problem

The IEEE-CIS Fraud Detection challenge provides 590,540 training transactions with 433 features and a fraud rate of only 3.5%, so the positive class is heavily outnumbered.
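To make the imbalance concrete, here is a small back-of-the-envelope calculation using only the two figures quoted above. Because 3.5% is a rounded rate, the counts are approximate rather than official dataset statistics. It also shows why accuracy is a poor yardstick here and a ranking metric such as AUC is the natural choice.

```python
# Figures quoted in the article.
n_transactions = 590_540
fraud_rate = 0.035  # rounded value from the text, so the counts are approximate

approx_fraud = round(n_transactions * fraud_rate)
approx_legit = n_transactions - approx_fraud

# A model that always predicts "not fraud" is right on every legitimate row,
# so its accuracy equals the share of legitimate transactions.
trivial_accuracy = approx_legit / n_transactions

print(f"approx. fraud rows: {approx_fraud:,}")
print(f"approx. legit rows: {approx_legit:,}")
print(f"accuracy of always predicting 'not fraud': {trivial_accuracy:.3f}")
```

A model that never flags anything would look excellent on accuracy while catching no fraud at all.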

Key Feature Engineering

Time-based features: Hour of day, day of week, temporal drift
Card aggregations: Mean/std/count of TransactionAmt per card1/card2
Email domain features: same_email_domain flag, domain-level fraud rates
M-column boolean counts: T/F/missing across M1-M9
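
The feature groups above can be sketched in pandas roughly as follows. This is a minimal illustration, not the exact code behind the reported score. It assumes the competition's column names (TransactionDT, TransactionAmt, card1, card2, P_emaildomain, R_emaildomain, M1 to M9) and treats TransactionDT as a seconds offset, so day_of_week is a relative index rather than a calendar weekday. The add_features helper and the output column names are hypothetical. Domain-level fraud rates are left out because they depend on the label.

```python
import numpy as np
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add illustrative versions of the feature groups to a copy of df."""
    out = df.copy()

    # Time-based: TransactionDT is treated as a seconds offset (assumption).
    out["hour"] = (out["TransactionDT"] // 3600) % 24
    out["day_of_week"] = (out["TransactionDT"] // 86400) % 7

    # Card aggregations: mean/std/count of TransactionAmt per card key.
    # Rows with a missing key get NaN for that key's aggregates.
    for key in ("card1", "card2"):
        stats = out.groupby(key)["TransactionAmt"].agg(["mean", "std", "count"])
        for stat in stats.columns:
            out[f"amt_{stat}_{key}"] = out[key].map(stats[stat])

    # Email domain: flag rows where both domains are present and equal.
    both_present = out["P_emaildomain"].notna() & out["R_emaildomain"].notna()
    out["same_email_domain"] = (
        both_present & (out["P_emaildomain"] == out["R_emaildomain"])
    ).astype(int)

    # M-columns: count "T", "F", and missing values across M1..M9.
    m_cols = [f"M{i}" for i in range(1, 10) if f"M{i}" in out.columns]
    out["m_true_count"] = (out[m_cols] == "T").sum(axis=1)
    out["m_false_count"] = (out[m_cols] == "F").sum(axis=1)
    out["m_missing_count"] = out[m_cols].isna().sum(axis=1)
    return out


# Tiny made-up frame, only to show the call; values are not from the dataset.
toy = pd.DataFrame({
    "TransactionDT": [86400, 90000, 176400],
    "TransactionAmt": [10.0, 30.0, 50.0],
    "card1": [1000, 1000, 2000],
    "card2": [111.0, 111.0, np.nan],
    "P_emaildomain": ["gmail.com", "gmail.com", np.nan],
    "R_emaildomain": ["gmail.com", "yahoo.com", np.nan],
    "M1": ["T", "F", np.nan],
    "M2": ["T", np.nan, np.nan],
})
features = add_features(toy)
print(features.filter(regex="hour|amt_|same_email|m_").T)
```

Computing the aggregates with agg and mapping them back keeps rows with a missing card2 in the frame instead of silently dropping them.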

Model Pipeline

Model	OOF AUC
LightGBM	0.9648
XGBoost	0.9631
CatBoost	0.9529

Key Insights

Don't drop the V-columns: they carry Vesta's proprietary fraud signals
Time-based CV is more realistic than StratifiedKFold, because the test set comes later in time than the training set
Card-level aggregations are the highest-impact feature group
LightGBM's native missing-value handling gives it the edge
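
As an illustration of the time-based CV point, here is a minimal expanding-window splitter in NumPy. It is a sketch under assumptions, not the validation scheme used for the scores above: time_based_folds is a hypothetical helper, the number of folds is arbitrary, and the timestamps are toy values standing in for TransactionDT.

```python
import numpy as np


def time_based_folds(timestamps, n_splits=4):
    """Yield (train_idx, valid_idx) pairs where validation rows come after training rows."""
    order = np.argsort(timestamps, kind="stable")   # row indices sorted by time
    blocks = np.array_split(order, n_splits + 1)    # consecutive time blocks
    for i in range(1, n_splits + 1):
        train_idx = np.concatenate(blocks[:i])      # everything before block i
        valid_idx = blocks[i]                       # the next block in time
        yield train_idx, valid_idx


# Toy timestamps standing in for TransactionDT; deliberately unsorted.
transaction_dt = np.array([500, 100, 900, 300, 700, 200, 800, 400, 600, 1000])

folds = list(time_based_folds(transaction_dt, n_splits=4))
for train_idx, valid_idx in folds:
    # Every training row is earlier than every validation row.
    assert transaction_dt[train_idx].max() < transaction_dt[valid_idx].min()
    print(f"train rows: {len(train_idx)}, valid rows: {len(valid_idx)}")
```

Unlike a shuffled StratifiedKFold, no fold here trains on transactions that happen after the ones it is validated on. scikit-learn's TimeSeriesSplit offers a similar expanding-window scheme for rows already sorted by time.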

Tags: LightGBM, Fraud Detection, Feature Engineering, Kaggle, Stacking

Ossama Elhakki

AI Engineer & ML Systems Builder — Morocco
