Breast Cancer Classification (Wisconsin)

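Benchmark of 14 models on the Wisconsin Diagnostic Breast Cancer dataset (569 samples). Best accuracy: Voting Ensemble, 99.12%. Best AUC-ROC: CatBoost, 0.9990. Extra Trees and the tuned SVM both reach 98.25% accuracy. Random Forest was tuned with RandomizedSearchCV and SVM with GridSearchCV. SHAP ranks concave_points_worst as the dominant feature.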

Voting Ensemble accuracy: 99.12%
CatBoost AUC-ROC: 0.9990
Extra Trees / Tuned SVM accuracy: 98.25%
Models benchmarked: 14

Dataset

Wisconsin Breast Cancer: 569 samples, 30 features, 2 classes

Approach

14-model benchmark → RandomizedSearchCV/GridSearchCV HPO → SHAP interpretability

Tech Stack
Python, CatBoost, XGBoost, LightGBM, Scikit-learn, SHAP
Keywords
CatBoost, XGBoost, LightGBM, SHAP, SVM, Extra Trees, Healthcare
Visualizations: 6 charts
Deep Dive

Comprehensive ML pipeline for breast cancer binary classification on the Wisconsin Diagnostic dataset.

Dataset

  • 569 samples: 357 benign (62.7%) + 212 malignant (37.3%)
  • 30 features: 10 measurements × 3 statistics (mean, standard error (SE), and worst, i.e. the mean of the three largest values)
    • radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, fractal dimension
  • No missing values. Stratified 80/20 split (455 train / 114 test)
  • 6 engineered features: density proxies, shape ratios, worst/mean progression
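
The split described above can be reproduced with the copy of the dataset bundled in scikit-learn. The following is a minimal sketch rather than the project's actual code: the random seed is arbitrary, and the three ratio features are hypothetical stand-ins, since the six engineered features are not defined here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split


def load_and_split(test_size=0.2, seed=42):
    """Load the Wisconsin Diagnostic data and make a stratified 80/20 split."""
    data = load_breast_cancer(as_frame=True)
    X, y = data.data, data.target  # scikit-learn encoding: 0 = malignant, 1 = benign
    return train_test_split(X, y, test_size=test_size, stratify=y, random_state=seed)


def add_ratio_features(X):
    """Append hypothetical engineered features (not the author's exact six)."""
    out = X.copy()
    # Density proxy (assumed definition): area relative to perimeter.
    out["area_per_perimeter"] = X["mean area"] / X["mean perimeter"]
    # Shape ratio (assumed): perimeter^2 / area grows as the outline gets less circular.
    out["perimeter_sq_per_area"] = X["mean perimeter"] ** 2 / X["mean area"]
    # Worst/mean progression (assumed): how far the extreme cells depart from the average.
    out["radius_worst_over_mean"] = X["worst radius"] / X["mean radius"]
    return out


X_train, X_test, y_train, y_test = load_and_split()
print(X_train.shape, X_test.shape)        # (455, 30) (114, 30)
print(add_ratio_features(X_train).shape)  # (455, 33)
```

Note that scikit-learn names the columns "worst radius", "mean area", and so on, rather than radius_worst or area_mean.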

Full 14-Model Benchmark

Model                 Accuracy   AUC-ROC
Naive Bayes           92.11%     0.9891
Decision Tree         92.11%     0.9448
KNN (k=5)             95.61%     0.9823
Gradient Boosting     95.61%     0.9970
LDA                   96.49%     0.9970
Logistic Regression   96.49%     0.9960
XGBoost               96.49%     0.9954
LightGBM              96.49%     0.9970
CatBoost              96.49%     0.9990
AdaBoost              97.37%     0.9861
SVM (RBF)             97.37%     0.9947
Random Forest         97.37%     0.9944
Stacking              97.37%     0.9950
Extra Trees           98.25%     0.9987
Tuned SVM             98.25%     0.9960
Voting Ensemble       99.12%     0.9950
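
A benchmark of this kind can be produced with a single loop that fits each model on the training split and scores it on the 114 held-out samples. The sketch below is illustrative only: it covers four scikit-learn models plus a soft-voting ensemble rather than the full set, and the ensemble members, hyperparameters, scaling step, and seed are all assumptions, so its output will not necessarily match the figures reported here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_models(seed=42):
    """Return a small, hypothetical subset of the benchmark plus a voting ensemble."""
    base = {
        "lr": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "svm": make_pipeline(StandardScaler(), SVC(probability=True, random_state=seed)),
        "rf": RandomForestClassifier(n_estimators=200, random_state=seed),
        "et": ExtraTreesClassifier(n_estimators=200, random_state=seed),
    }
    # Soft voting averages the members' predicted probabilities.
    voting = VotingClassifier(estimators=list(base.items()), voting="soft")
    return {**base, "voting": voting}


def run_benchmark(seed=42):
    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    rows = []
    for name, model in build_models(seed).items():
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        proba = model.predict_proba(X_te)[:, 1]
        rows.append({
            "model": name,
            "accuracy": accuracy_score(y_te, pred),
            "auc": roc_auc_score(y_te, proba),
            "errors": int((pred != y_te).sum()),  # misclassified test samples
        })
    return sorted(rows, key=lambda r: r["accuracy"])


results = run_benchmark()
for row in results:
    print(f"{row['model']:<8} {row['accuracy']:.2%}  AUC {row['auc']:.4f}  ({row['errors']} errors)")
```

With 114 test samples, each misclassified case moves accuracy by 1/114 ≈ 0.88 percentage points, so 99.12% corresponds to one error (113/114), 98.25% to two (112/114), and 97.37% to three (111/114).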

Hyperparameter Tuning

  • Random Forest (RandomizedSearchCV, 40 trials): best configuration n_estimators=500, max_depth=None (no depth limit), max_features=log2
  • SVM (GridSearchCV): best configuration C=10, gamma=0.01, RBF kernel → 98.25% test accuracy
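
The two searches could be set up roughly as follows. This is a sketch under stated assumptions: the search spaces, the 5-fold cross-validation, the accuracy scoring, and the StandardScaler step are not given in the source and were chosen only so that the reported optima (C=10, gamma=0.01, n_estimators=500, log2 features) fall inside the grids.

```python
from scipy.stats import randint
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_rf_search(n_iter=40, seed=42):
    """Randomized search: samples n_iter configurations from a hypothetical space."""
    space = {
        "n_estimators": [100, 200, 300, 500],
        "max_depth": [None, 5, 10, 20],
        "max_features": ["sqrt", "log2"],
        "min_samples_leaf": randint(1, 5),  # integers 1..4
    }
    return RandomizedSearchCV(
        RandomForestClassifier(random_state=seed),
        space, n_iter=n_iter, cv=5, scoring="accuracy",
        random_state=seed, n_jobs=-1,
    )


def make_svm_search():
    """Grid search: tries every (C, gamma) pair in a hypothetical grid."""
    pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC(kernel="rbf"))])
    grid = {"svc__C": [0.1, 1, 10, 100], "svc__gamma": [0.001, 0.01, 0.1, 1]}
    return GridSearchCV(pipe, grid, cv=5, scoring="accuracy")


X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

svm_search = make_svm_search().fit(X_tr, y_tr)
print(svm_search.best_params_, f"test accuracy {svm_search.score(X_te, y_te):.2%}")

rf_search = make_rf_search()  # call rf_search.fit(X_tr, y_tr) to run all 40 trials
```

Putting the scaler inside the pipeline means it is refit on each training fold, which keeps validation-fold statistics out of the preprocessing.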

SHAP Top Malignancy Indicators

  1. concave_points_worst — dominant discriminator
  2. perimeter_worst — boundary irregularity
  3. area_worst — worst-cell size
  4. radius_worst — largest cell radius

Clinical Focus

Optimized for recall (sensitivity): in clinical screening, a missed malignant diagnosis (false negative) is far more dangerous than a false positive.
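
The trade-off can be made concrete by computing sensitivity and specificity at two decision thresholds. The source does not say how recall was prioritized; lowering the threshold is one common approach and is used here purely as an illustration. The labels and probabilities are invented toy values (1 = malignant in this example, which is the opposite of scikit-learn's encoding for this dataset).

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) with `positive` as the malignant label."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1  # missed malignancy
        else:
            if pred == positive:
                fp += 1  # false alarm
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def predict_with_threshold(probabilities, threshold=0.5):
    """Label a case malignant (1) when its predicted probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical data: four malignant and six benign cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p_malignant = [0.95, 0.80, 0.55, 0.35, 0.40, 0.32, 0.20, 0.10, 0.05, 0.02]

for threshold in (0.5, 0.3):
    sens, spec = sensitivity_specificity(y_true, predict_with_threshold(p_malignant, threshold))
    print(f"threshold {threshold}: sensitivity {sens:.2f}, specificity {spec:.2f}")
# threshold 0.5: sensitivity 0.75, specificity 1.00
# threshold 0.3: sensitivity 1.00, specificity 0.67
```

Lowering the threshold from 0.5 to 0.3 recovers the missed malignant case at the cost of two false positives, which is the exchange a recall-focused screening model accepts.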