Medical AI
Breast Cancer Classification (Wisconsin)
14-model benchmark on the Wisconsin dataset (569 samples). Voting Ensemble: 99.12% accuracy. CatBoost: AUC-ROC 0.9990. Extra Trees: 98.25% accuracy. RF and SVM tuned via RandomizedSearchCV and GridSearchCV, respectively. SHAP: concave_points_worst is the dominant feature.
Voting Ensemble accuracy: 99.12%
CatBoost AUC-ROC: 0.9990
Extra Trees / Tuned SVM accuracy: 98.25%
Models benchmarked: 14
Dataset
Wisconsin Breast Cancer: 569 samples, 30 features, 2 classes
Approach
14-model benchmark → RandomizedSearchCV/GridSearchCV HPO → SHAP interpretability
Tech Stack
Python, CatBoost, XGBoost, LightGBM, Scikit-learn, SHAP
Keywords
CatBoost, XGBoost, LightGBM, SHAP, SVM, Extra Trees, Healthcare
Visualizations: 6 charts
Deep Dive
Comprehensive ML pipeline for breast cancer binary classification on the Wisconsin Diagnostic dataset.
Dataset
- ▸569 samples: 357 benign (62.7%) + 212 malignant (37.3%)
- ▸30 features: 10 measurements × 3 statistics (mean, SE, worst)
- ▸Measurements: radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, fractal dimension
- ▸No missing values. Stratified 80/20 split (455 train / 114 test)
- ▸6 engineered features: density proxies, shape ratios, worst/mean progression
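The dataset figures above can be reproduced with a short sketch. It assumes the copy of the Wisconsin Diagnostic data bundled with scikit-learn (`load_breast_cancer`) and an arbitrary `random_state=42`; the source does not say how the data was loaded or which seed was used, and the 6 engineered features are not reproduced here because their exact definitions are not given.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's bundled copy of the dataset (0 = malignant, 1 = benign).
data = load_breast_cancer()
X, y = data.data, data.target

# Class balance: counts of label 0 (malignant) and label 1 (benign).
n_malignant, n_benign = np.bincount(y)
print(f"samples={X.shape[0]}, features={X.shape[1]}")
print(f"benign={n_benign} ({n_benign / len(y):.1%}), "
      f"malignant={n_malignant} ({n_malignant / len(y):.1%})")

# Stratified 80/20 split keeps the class ratio similar in both partitions.
# random_state=42 is an arbitrary choice, not taken from the source.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(f"train={X_train.shape[0]}, test={X_test.shape[0]}")
```

Note that scikit-learn names the columns `worst concave points`, `worst perimeter`, and so on, while identifiers such as `concave_points_worst` follow the CSV naming convention.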
Full 14-Model Benchmark
| Model | Accuracy | AUC-ROC |
|---|---|---|
| Naive Bayes | 92.11% | 0.9891 |
| Decision Tree | 92.11% | 0.9448 |
| KNN (k=5) | 95.61% | 0.9823 |
| Gradient Boosting | 95.61% | 0.9970 |
| LDA | 96.49% | 0.9970 |
| Logistic Regression | 96.49% | 0.9960 |
| XGBoost | 96.49% | 0.9954 |
| LightGBM | 96.49% | 0.9970 |
| CatBoost | 96.49% | 0.9990 |
| AdaBoost | 97.37% | 0.9861 |
| SVM (RBF) | 97.37% | 0.9947 |
| Random Forest | 97.37% | 0.9944 |
| Stacking | 97.37% | 0.9950 |
| Extra Trees | 98.25% | 0.9987 |
| Tuned SVM | 98.25% | 0.9960 |
| Voting Ensemble | 99.12% | 0.9950 |
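A benchmark of this kind can be sketched as a loop over named estimators scored on the same held-out split. The version below is a reduced illustration, not the project's actual code: it covers only four scikit-learn models, uses default hyperparameters, scales inputs for the linear and SVM models, and builds a soft-voting ensemble from those four. The ensemble members, the voting mode, and `random_state=42` are all assumptions, so the numbers it prints will not match the table.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Base models (a hypothetical subset of the benchmark).
base = [
    ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("svm_rbf", make_pipeline(StandardScaler(),
                              SVC(kernel="rbf", probability=True, random_state=42))),
    ("random_forest", RandomForestClassifier(n_estimators=200, random_state=42)),
    ("extra_trees", ExtraTreesClassifier(n_estimators=200, random_state=42)),
]

# Soft voting averages predicted class probabilities across the base models.
models = dict(base)
models["voting"] = VotingClassifier(estimators=base, voting="soft")

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    results[name] = (accuracy, auc)
    print(f"{name:15s} accuracy={accuracy:.4f}  AUC-ROC={auc:.4f}")
```

On a 114-sample test set, one misclassified sample moves accuracy by about 0.88 percentage points (1/114), so most gaps between adjacent accuracy levels in the table correspond to a single test sample.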
Hyperparameter Tuning
- ▸RF (RandomizedSearchCV, 40 trials): n_estimators=500, no depth limit, log2 features
- ▸SVM (GridSearchCV): C=10, gamma=0.01, RBF kernel → 98.25% accuracy
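The SVM search can be sketched with `GridSearchCV` over `C` and `gamma`. The grid values, the 5-fold cross-validation, the `StandardScaler` step, and `random_state=42` are assumptions made for illustration; the source reports only the selected values (C=10, gamma=0.01), and a different grid or split may select different ones.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling inside the pipeline so each CV fold is scaled on its own training part.
pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC(kernel="rbf"))])

# Hypothetical grid that includes the reported optimum (C=10, gamma=0.01).
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.001, 0.01, 0.1],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("best params:", search.best_params_)
print(f"CV accuracy={search.best_score_:.4f}, test accuracy={test_accuracy:.4f}")
```

The Random Forest search would follow the same pattern with `RandomizedSearchCV` and `n_iter=40`, sampling from distributions instead of enumerating a grid.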
SHAP Top Malignancy Indicators
- ▸concave_points_worst: dominant discriminator
- ▸perimeter_worst: boundary irregularity
- ▸area_worst: worst-cell size
- ▸radius_worst: largest cell radius
Clinical Focus
Optimized for recall (sensitivity): in clinical screening, a missed malignant diagnosis (false negative) is far more dangerous than a false positive.
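The trade-off behind a recall-first objective can be shown with a toy calculation. The labels and scores below are invented for illustration (1 = malignant) and are not results from this project; the point is only that lowering the decision threshold removes false negatives at the cost of extra false positives.

```python
def confusion_counts(y_true, scores, threshold):
    """Return (tp, fn, fp, tn) when predicting malignant for score >= threshold."""
    tp = fn = fp = tn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1 and predicted == 0:
            fn += 1
        elif label == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def recall(tp, fn):
    """Sensitivity: share of malignant cases that were caught."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical data: 4 malignant cases followed by 6 benign cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.55, 0.35, 0.40, 0.30, 0.20, 0.10, 0.05, 0.02]

for threshold in (0.5, 0.25):
    tp, fn, fp, tn = confusion_counts(y_true, scores, threshold)
    print(f"threshold={threshold}: recall={recall(tp, fn):.2f}, "
          f"false negatives={fn}, false positives={fp}")
```

At the default 0.5 threshold one malignant case is missed; at 0.25 none are missed, but two benign cases are flagged for follow-up.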