Fraud Detection (Featured)

Ethereum Blockchain Fraud Detection

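Fraud detection on 9,841 Ethereum addresses using XGBoost, LightGBM, and CatBoost combined in a stacking ensemble, with Optuna hyperparameter optimization (40 trials) and SHAP for interpretation. Stacking AUC is 0.9973; F1 on the fraud class is 0.9658 at the optimal decision threshold of 0.85.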

Stacking AUC: 0.9973
F1 Score (Fraud): 0.9658
Optimal threshold: 0.85
Optuna best AUC: 0.9992

Dataset

9,841 Ethereum addresses, 51 on-chain behavioral features

Approach

Baseline → SMOTE + Optuna HPO + stacking ensemble + threshold tuning

Tech Stack

Python, XGBoost, LightGBM, CatBoost, Optuna, SHAP, SMOTE (imbalanced-learn)

Keywords

XGBoost, LightGBM, CatBoost, SMOTE, Optuna, SHAP, Blockchain

Deep Dive

Two-stage pipeline detecting fraudulent Ethereum addresses from on-chain behavioral features.

Dataset

  • 9,841 addresses: 7,662 legitimate (77.9%) + 2,179 fraud (22.1%)
  • 51 features: ERC20 transaction patterns, sent/received amounts, unique addresses, timing
  • 829 missing values in ERC20 features → median imputation
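
The median imputation step can be sketched with pandas as follows. This is a minimal illustration, not the project's code: the column names (`erc20_total_sent`, `erc20_uniq_tokens`, `is_fraud`) and the toy values are hypothetical. In a real pipeline the medians would normally be computed on the training split only and then reused for validation and test data.

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical column names; the real ERC20 columns may differ.
df = pd.DataFrame({
    "erc20_total_sent": [0.0, np.nan, 12.5, 3.0],
    "erc20_uniq_tokens": [1.0, 4.0, np.nan, 2.0],
    "is_fraud": [0, 1, 0, 1],
})

# Select the ERC20 feature columns by prefix (an assumption for this sketch).
erc20_cols = [c for c in df.columns if c.startswith("erc20_")]

# Column-wise medians; pandas skips NaN values by default.
medians = df[erc20_cols].median()

# Fill each column's gaps with that column's median.
df_imputed = df.copy()
df_imputed[erc20_cols] = df[erc20_cols].fillna(medians)

print(df_imputed)
```

The median is a common choice over the mean here because token amounts tend to be heavily skewed, so a few very large values would otherwise drag the fill value upward.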

Feature Engineering

  • Sent/received ratio, transaction frequency, average value per transaction
  • ERC20 activity aggregation (unique tokens, transaction velocity)
  • Log transforms on skewed amount distributions
  • Result: 56 features after engineering (up from the original 51)
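
The kinds of features listed above could be derived roughly as shown below. This is a sketch under assumptions: every column name is hypothetical, the `+ 1` smoothing in the ratio and the zero-fill for undefined divisions are illustrative choices, and the source does not specify the exact formulas used.

```python
import numpy as np
import pandas as pd

# Toy data with hypothetical column names.
df = pd.DataFrame({
    "sent_tnx": [10, 0, 250],
    "received_tnx": [5, 3, 0],
    "total_ether_sent": [1.2, 0.0, 980.0],
    "total_ether_received": [0.8, 15.0, 0.0],
    "active_minutes": [600.0, 45.0, 12000.0],
})


def safe_ratio(num: pd.Series, den: pd.Series) -> pd.Series:
    """Element-wise num / den, returning 0.0 where the denominator is 0."""
    den = den.astype(float)
    return (num / den.where(den != 0)).fillna(0.0)


out = df.copy()
total_tnx = df["sent_tnx"] + df["received_tnx"]

# Sent/received ratio, smoothed with +1 so addresses that never received stay finite.
out["sent_received_ratio"] = df["sent_tnx"] / (df["received_tnx"] + 1)

# Transaction frequency: transactions per minute of account activity.
out["tnx_per_minute"] = safe_ratio(total_tnx, df["active_minutes"])

# Average ether value per outgoing transaction.
out["avg_sent_value"] = safe_ratio(df["total_ether_sent"], df["sent_tnx"])

# log1p compresses the long right tail of amounts and is defined at zero.
for col in ["total_ether_sent", "total_ether_received"]:
    out[f"log1p_{col}"] = np.log1p(df[col])

print(out.round(4))
```

`np.log1p` is used rather than `np.log` because many addresses have zero amounts, where a plain logarithm is undefined.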

Stage 1 — Baseline

Model                 AUC      Notes
Logistic Regression   0.8419   Weak on behavioral patterns
Random Forest         0.9973   Already excellent

Stage 2 — Advanced Pipeline

  1. SMOTE oversampling → 50/50 balance (11,070 training samples)
  2. Optuna HPO on XGBoost, 40 trials → best CV AUC: 0.9992
    • Best parameters: n_estimators=395, max_depth=3, learning_rate=0.14, subsample=0.85
  3. Train XGBoost + LightGBM + CatBoost
  4. Stacking meta-learner (Logistic Regression)
  5. Threshold tuning → maximize F1

Final Results

Model      AUC      F1 (Fraud)
XGBoost    0.9971   0.9659
LightGBM   0.9972   0.9569
CatBoost   0.9969   0.9584
Stacking   0.9973

Optimal threshold: 0.85 → F1: 0.9658
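
Threshold tuning of this kind amounts to sweeping candidate cut-offs over predicted fraud probabilities and keeping the one with the highest F1. The sketch below assumes a fixed grid from 0.05 to 0.95 in steps of 0.05 and uses toy labels and scores; the function names and the grid are illustrative, and the source does not state how the 0.85 threshold was searched for.

```python
import numpy as np


def f1_at_threshold(y_true: np.ndarray, proba: np.ndarray, threshold: float) -> float:
    """F1 for the positive (fraud) class when predicting 1 if proba >= threshold."""
    pred = proba >= threshold
    positive = y_true == 1
    tp = np.sum(pred & positive)
    fp = np.sum(pred & ~positive)
    fn = np.sum(~pred & positive)
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else float(2 * tp / denom)


def best_f1_threshold(y_true, proba, grid=None):
    """Return (threshold, f1) for the grid point with the highest F1."""
    y_true = np.asarray(y_true)
    proba = np.asarray(proba, dtype=float)
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)  # 0.05, 0.10, ..., 0.95
    scores = [f1_at_threshold(y_true, proba, t) for t in grid]
    best = int(np.argmax(scores))  # first maximum wins on ties
    return float(grid[best]), scores[best]


# Toy validation labels and predicted fraud probabilities.
y_val = [0, 0, 0, 1, 1, 1]
p_val = [0.10, 0.40, 0.72, 0.81, 0.90, 0.95]

best_t, best_f1 = best_f1_threshold(y_val, p_val)
print(f"best threshold={best_t:.2f}, F1={best_f1:.4f}")
```

The threshold is best chosen on a validation split rather than on the final test set, otherwise the reported F1 is optimistically biased.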

SHAP Top Fraud Indicators

  • ERC20 sent count
  • Unique address diversity
  • Total ether received
  • Timing irregularity
  • ERC20 token diversity