Fraud Detection · Featured

Ethereum Blockchain Fraud Detection

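Fraud detection on 9,841 Ethereum addresses using XGBoost, LightGBM, and CatBoost combined in a stacking ensemble, with Optuna hyperparameter optimization (40 trials) and SHAP for interpretation. The ensemble reaches AUC 0.9973; fraud-class F1 is 0.9658 at the tuned decision threshold of 0.85.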

View on Kaggle

Stacking AUC: 0.9973
F1 Score (Fraud): 0.9658
Optimal threshold: 0.85
Optuna best AUC (CV): 0.9992

Dataset

9,841 Ethereum addresses, 51 on-chain behavioral features

Approach

Baseline → SMOTE + Optuna HPO + stacking ensemble + threshold tuning

Tech Stack

Python, XGBoost, LightGBM, CatBoost, Optuna, SHAP, SMOTE (imbalanced-learn)

Keywords

XGBoost, LightGBM, CatBoost, SMOTE, Optuna, SHAP, Blockchain


Deep Dive

Two-stage pipeline detecting fraudulent Ethereum addresses from on-chain behavioral features.

Dataset

▸9,841 addresses: 7,662 legitimate (77.9%) + 2,179 fraud (22.1%)
▸51 features: ERC20 transaction patterns, sent/received amounts, unique addresses, timing
▸829 missing values in ERC20 features → median imputation
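
The median imputation step above can be sketched as follows. This is a minimal illustration, not the project's actual code: the column names and the toy values are hypothetical, and only the fill-with-column-median idea comes from the text.

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical column names; the real dataset's
# ERC20 columns are named differently.
df = pd.DataFrame({
    "erc20_total_sent": [10.0, np.nan, 30.0, 1000.0],
    "erc20_uniq_tokens": [1.0, 2.0, np.nan, 3.0],
    "is_fraud": [0, 0, 1, 1],
})

erc20_cols = [c for c in df.columns if c.startswith("erc20_")]

# Per-column medians; pandas skips NaN when computing them.
medians = df[erc20_cols].median()

# Fill each missing cell with the median of its own column.
imputed = df.copy()
imputed[erc20_cols] = imputed[erc20_cols].fillna(medians)
```

The median is a reasonable default for heavily skewed amount columns because a few very large values (like the 1000.0 above) do not drag it upward the way they would a mean. With a train/test split, the medians would normally be computed on the training rows only and reused for the test rows.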

Feature Engineering

▸Sent/received ratio, transaction frequency, average value per transaction
▸ERC20 activity aggregation (unique tokens, transaction velocity)
▸Log transforms on skewed amount distributions (56 features after engineering)
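
A rough sketch of the ratio and log-transform features listed above. The column names, the `safe_divide` helper, and the choice to map division by zero to 0.0 are assumptions made for illustration; the source does not say how zero denominators were handled.

```python
import numpy as np
import pandas as pd

# Hypothetical column names and toy values, for illustration only.
df = pd.DataFrame({
    "total_ether_sent": [0.0, 5.0, 120.0],
    "total_ether_received": [2.0, 5.0, 30.0],
    "sent_tnx": [0.0, 10.0, 40.0],
    "active_minutes": [60.0, 0.0, 200.0],
})

def safe_divide(num, den):
    """Element-wise num / den, returning 0.0 where den is zero (assumed policy)."""
    return (num / den.replace(0, np.nan)).fillna(0.0)

features = df.copy()
# Sent/received ratio.
features["sent_received_ratio"] = safe_divide(
    df["total_ether_sent"], df["total_ether_received"])
# Transaction frequency: transactions per minute of activity.
features["tnx_per_minute"] = safe_divide(df["sent_tnx"], df["active_minutes"])
# Average value per transaction.
features["avg_value_per_tnx"] = safe_divide(df["total_ether_sent"], df["sent_tnx"])
# log1p compresses the long right tail and is defined at zero.
features["log_total_ether_sent"] = np.log1p(df["total_ether_sent"])
```

`np.log1p` is used rather than `np.log` so that addresses with a zero amount map to 0 instead of negative infinity.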

Stage 1 — Baseline

Model	AUC	Notes
Logistic Regression	0.8419	Weak on behavioral patterns
Random Forest	0.9973	Already excellent

Stage 2 — Advanced Pipeline

▸SMOTE oversampling → 50/50 balance (11,070 training samples)
▸Optuna HPO: XGBoost, 40 trials → Best CV AUC: 0.9992
  ▸Best: n_estimators=395, max_depth=3, lr=0.14, subsample=0.85
▸Train XGBoost + LightGBM + CatBoost
▸Stacking meta-learner (Logistic Regression)
▸Threshold tuning → maximize F1
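
To show what the SMOTE step in the list above does, here is a simplified NumPy sketch of the core idea: each synthetic fraud row is an interpolation between a real minority row and one of its nearest minority neighbours. The project itself uses imbalanced-learn's SMOTE; `smote_sketch`, its parameters, and the toy data are hypothetical stand-ins. At a 50/50 balance, the 11,070 training samples quoted above correspond to 5,535 rows per class.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating between a
    minority row and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise squared distances inside the minority class.
    # O(n^2) memory: fine for a demo, not for large datasets.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = (diff ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)  # a row is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)   # row to start from
    pick = rng.integers(0, k, size=n_new)   # which neighbour to move toward
    nb = neighbours[base, pick]
    gap = rng.random((n_new, 1))            # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])

# Toy imbalanced training set: 80 legitimate rows, 20 fraud rows.
rng = np.random.default_rng(42)
X_major = rng.normal(0.0, 1.0, size=(80, 3))
X_minor = rng.normal(3.0, 1.0, size=(20, 3))

# Add enough synthetic fraud rows to reach a 50/50 balance.
X_new = smote_sketch(X_minor, n_new=len(X_major) - len(X_minor))
X_bal = np.vstack([X_major, X_minor, X_new])
y_bal = np.r_[np.zeros(len(X_major)), np.ones(len(X_minor) + len(X_new))]
```

Oversampling of this kind is normally applied to the training split only, so that evaluation data contains no synthetic rows.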

Final Results

Model	AUC	F1 (Fraud)
XGBoost	0.9971	0.9659
LightGBM	0.9972	0.9569
CatBoost	0.9969	0.9584
Stacking	0.9973	—

Optimal threshold: 0.85 → F1: 0.9658
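
Threshold tuning of this kind can be sketched as a sweep over candidate cut-offs, keeping the one with the highest fraud-class F1. The helper names, the grid, and the toy labels and probabilities below are assumptions for illustration; they are not the project's data and will not reproduce the 0.85 threshold.

```python
import numpy as np

def f1_at(y_true, proba, threshold):
    """F1 for the positive (fraud) class when predicting proba >= threshold."""
    pred = proba >= threshold
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_threshold(y_true, proba, grid=None):
    """Return (threshold, f1) for the grid point with the highest F1."""
    grid = np.linspace(0.05, 0.95, 19) if grid is None else grid
    scores = [f1_at(y_true, proba, t) for t in grid]
    i = int(np.argmax(scores))
    return float(grid[i]), float(scores[i])

# Toy labels and predicted fraud probabilities.
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])
proba = np.array([0.02, 0.11, 0.23, 0.41, 0.47, 0.62, 0.71, 0.83, 0.91, 0.97])

t, f1 = best_threshold(y_true, proba)
```

The source does not say which split the sweep was run on. Choosing the threshold on a validation split, rather than on the test set that produces the reported F1, avoids an optimistic estimate.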

SHAP Top Fraud Indicators

▸ERC20 sent count
▸Unique address diversity
▸Total ether received
▸Timing irregularity
▸ERC20 token diversity
