Fraud Detection (Featured)
Ethereum Blockchain Fraud Detection
Fraud detection on 9,841 Ethereum addresses using XGBoost, LightGBM, and CatBoost with a stacking ensemble, Optuna hyperparameter optimization (40 trials), and SHAP explanations. AUC 0.9973; F1 0.9658 at the optimal threshold of 0.85.
Stacking AUC: 0.9973
F1 Score (Fraud): 0.9658
Optimal threshold: 0.85
Optuna best AUC: 0.9992
Dataset: 9,841 Ethereum addresses, 51 on-chain behavioral features
Approach: Baseline → SMOTE + Optuna HPO + stacking ensemble + threshold tuning
Tech Stack: Python, XGBoost, LightGBM, CatBoost, Optuna, SHAP, SMOTE (imbalanced-learn)
Keywords: XGBoost, LightGBM, CatBoost, SMOTE, Optuna, SHAP, Blockchain
Visualizations: 6 charts
Deep Dive
Two-stage pipeline detecting fraudulent Ethereum addresses from on-chain behavioral features.
Dataset
- 9,841 addresses: 7,662 legitimate (77.9%) + 2,179 fraud (22.1%)
- 51 features: ERC20 transaction patterns, sent/received amounts, unique addresses, timing
- 829 missing values in ERC20 features → median imputation
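As a rough illustration of the median imputation step, the sketch below fills missing values column by column with NumPy and re-derives the class shares from the counts listed above. The helper name `impute_median` and the toy matrix are hypothetical; the project's actual code and column names are not shown in the source.

```python
import numpy as np

def impute_median(X: np.ndarray) -> np.ndarray:
    """Replace NaNs in each column with that column's median (NaNs ignored)."""
    X = X.astype(float, copy=True)
    medians = np.nanmedian(X, axis=0)        # one median per column
    rows, cols = np.where(np.isnan(X))       # positions of the missing values
    X[rows, cols] = medians[cols]
    return X

# Toy matrix standing in for two ERC20 feature columns (values are made up).
toy = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
    [5.0, 40.0],
])
filled = impute_median(toy)

# Class balance recomputed from the reported counts.
n_legit, n_fraud = 7662, 2179
total = n_legit + n_fraud
fraud_share = n_fraud / total
print(filled)
print(f"total={total}, fraud share={fraud_share:.1%}")
```

In a train/test setting, the medians would normally be computed on the training split only and then reused for the test split, so that no test information leaks into preprocessing.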
Feature Engineering
- Sent/received ratio, transaction frequency, average value per transaction
- ERC20 activity aggregation (unique tokens, transaction velocity)
- Log transforms on skewed amount distributions
- Result: 56 features after engineering (up from 51)
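The engineered features above could look roughly like the following NumPy sketch. The function name, argument names, and exact formulas are assumptions made for illustration; the source does not specify how each feature was computed.

```python
import numpy as np

def engineer(sent, received, n_tx, active_minutes):
    """Derive ratio, frequency, average-value, and log features (illustrative only)."""
    sent = np.asarray(sent, dtype=float)
    received = np.asarray(received, dtype=float)
    n_tx = np.asarray(n_tx, dtype=float)
    active_minutes = np.asarray(active_minutes, dtype=float)
    eps = 1e-9  # guards against division by zero for inactive addresses

    return {
        "sent_received_ratio": sent / (received + eps),
        "avg_value_per_tx": (sent + received) / np.maximum(n_tx, 1.0),
        "tx_per_hour": n_tx / np.maximum(active_minutes / 60.0, eps),
        # log1p compresses heavy-tailed amounts and maps 0 to 0
        "log_sent": np.log1p(sent),
        "log_received": np.log1p(received),
    }

# Two made-up addresses: one active, one with no outgoing activity.
feats = engineer(
    sent=[10.0, 0.0],
    received=[5.0, 2.0],
    n_tx=[3, 0],
    active_minutes=[120.0, 0.0],
)
for name, values in feats.items():
    print(name, values)
```

`log1p` is used rather than a plain logarithm so that zero amounts, which are common for inactive addresses, stay finite.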
Stage 1 — Baseline
| Model | AUC | Notes |
|---|---|---|
| Logistic Regression | 0.8419 | Weak on behavioral patterns |
| Random Forest | 0.9973 | Already excellent |
Stage 2 — Advanced Pipeline
- SMOTE oversampling → 50/50 balance (11,070 training samples)
- Optuna HPO on XGBoost (40 trials) → best CV AUC: 0.9992
- Best parameters: n_estimators=395, max_depth=3, learning_rate=0.14, subsample=0.85
- Train XGBoost + LightGBM + CatBoost
- Stacking meta-learner (Logistic Regression)
- Threshold tuning → maximize F1
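To make the oversampling step concrete, here is a simplified NumPy sketch of the interpolation idea behind SMOTE. It is not imbalanced-learn's implementation, which the project lists in its tech stack; the function `smote_sketch` and the toy data are hypothetical and only show how synthetic minority rows can bring the classes to a 50/50 balance.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)                          # requires at least 2 minority rows
    # Pairwise squared distances within the minority class.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)               # a point is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)      # which minority row to start from
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))               # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Toy training split: 40 majority rows, 10 minority rows (made-up data).
rng = np.random.default_rng(42)
X_major = rng.normal(0.0, 1.0, size=(40, 3))
X_minor = rng.normal(3.0, 1.0, size=(10, 3))

X_new = smote_sketch(X_minor, n_new=len(X_major) - len(X_minor))
X_bal = np.vstack([X_major, X_minor, X_new])
y_bal = np.concatenate([np.zeros(len(X_major), dtype=int),
                        np.ones(len(X_minor) + len(X_new), dtype=int)])
print(X_bal.shape, y_bal.mean())
```

Oversampling of this kind is usually applied to the training split only, so the evaluation data keeps its original class ratio. At a 50/50 balance, the 11,070 training samples reported above correspond to 5,535 rows per class.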
Final Results
| Model | AUC | F1 (Fraud) |
|---|---|---|
| XGBoost | 0.9971 | 0.9659 |
| LightGBM | 0.9972 | 0.9569 |
| CatBoost | 0.9969 | 0.9584 |
| Stacking | 0.9973 | — |
Optimal threshold: 0.85 → F1: 0.9658
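Threshold tuning of this kind can be sketched as a grid search over cut-offs that keeps the one with the highest F1 for the fraud class. The helper names, the grid, and the toy scores below are assumptions for illustration; the source does not state how the 0.85 threshold was searched for.

```python
import numpy as np

def f1_at(y_true, scores, threshold):
    """F1 for the positive (fraud) class when predicting scores >= threshold."""
    pred = scores >= threshold
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

def best_threshold(y_true, scores, grid=None):
    """Return (threshold, F1) for the first grid point that maximizes F1."""
    grid = np.linspace(0.05, 0.95, 19) if grid is None else np.asarray(grid)
    f1s = [f1_at(y_true, scores, t) for t in grid]
    i = int(np.argmax(f1s))
    return float(grid[i]), f1s[i]

# Made-up validation labels and predicted fraud probabilities.
y_val = np.array([0, 0, 0, 0, 0, 1, 1, 1])
p_val = np.array([0.12, 0.22, 0.31, 0.58, 0.72, 0.81, 0.88, 0.97])

default_f1 = f1_at(y_val, p_val, 0.5)
t_best, f1_best = best_threshold(y_val, p_val)
print(f"F1 at 0.5: {default_f1:.3f}; best threshold {t_best:.2f} with F1 {f1_best:.3f}")
```

On this toy data, raising the cut-off above the default 0.5 removes the two false positives, which is the same effect a high threshold such as 0.85 has when a model assigns moderately high scores to some legitimate addresses. The threshold is best chosen on a validation split rather than the test set, so the reported F1 is not optimistically biased.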
SHAP Top Fraud Indicators
- ERC20 sent count
- Unique address diversity
- Total ether received
- Timing irregularity
- ERC20 token diversity