All Projects
35+ ML projects across every domain — all production-grade, fully documented
42 projects total
Featured: IEEE-CIS Fraud Detection
Full ML pipeline on 590K transactions, 433 features. LightGBM AUC 0.9648 — stacking ensemble LGB+XGB+CatBoost+RF with advanced feature engineering on Vesta behavioral features.
AI Image Generation Platform (Ofoto)
Production deployment of Stable Diffusion (Automatic1111 + ControlNet) with FastAPI backend and Vue.js frontend. Handles 500+ concurrent requests at 99.9% uptime, with 35% lower latency and 40% shorter release time.
WhatsApp AI Sales Agent
Production AI sales agent on WhatsApp Business. Classifies messages (Sales/Support/Off-topic), queries a Supabase product DB, runs Llama 3.1 locally via Ollama, supports bilingual FR/AR conversations, and keeps conversation memory. Cuts manual processing time by 90%.
Featured: Breast Cancer Ultrasound Segmentation
9-architecture segmentation benchmark on 780 BUSI images. DeepLabV3+ tops with Dice 0.7863, IoU 0.6483. FCN → SimpleUNet → SegNet → Attention-UNet → TransUNet → ResNet34-UNet → EfficientNet-UNet → DeepLabV3+ → Swin-UNet.
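For readers unfamiliar with the two segmentation metrics quoted above, here is a minimal sketch of how Dice and IoU are computed for a pair of binary masks. The function name `dice_and_iou` and the nested-list mask format are assumptions made for illustration; the project itself presumably computes these on tensors and may average them differently across images.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equally sized binary masks (nested lists of 0/1)."""
    flat_pred = [p for row in pred for p in row]
    flat_true = [t for row in target for t in row]

    intersection = sum(1 for p, t in zip(flat_pred, flat_true) if p and t)
    pred_area = sum(1 for p in flat_pred if p)
    true_area = sum(1 for t in flat_true if t)
    union = pred_area + true_area - intersection

    if union == 0:
        # Both masks are empty: treated here as a perfect match (a convention, not a rule).
        return 1.0, 1.0

    dice = 2 * intersection / (pred_area + true_area)
    iou = intersection / union
    return dice, iou


# Toy 2x3 masks: 2 overlapping pixels, 3 predicted, 3 in the ground truth.
pred_mask = [[1, 1, 0], [0, 1, 0]]
true_mask = [[1, 0, 0], [0, 1, 1]]
dice, iou = dice_and_iou(pred_mask, true_mask)
```

On the toy masks this gives Dice = 2/3 and IoU = 1/2. For a single mask pair the two are linked by Dice = 2·IoU / (1 + IoU); averages over a test set need not satisfy that identity exactly.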
Featured: Ethereum Blockchain Fraud Detection
Blockchain fraud detection on 9,841 Ethereum addresses. XGBoost+LightGBM+CatBoost+Stacking with Optuna HPO (40 trials) and SHAP. AUC 0.9973, F1 0.9658 at optimal threshold 0.85.
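The "optimal threshold" mentioned above is typically found by sweeping candidate cut-offs over predicted probabilities and keeping the one with the best F1. The sketch below shows that idea on toy data; the function names and the candidate grid are illustrative assumptions, not the project's actual code, and in practice the sweep should run on a validation split rather than the test set.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1 score when predicting the positive class for probabilities >= threshold."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, y_prob, candidates):
    """Return the candidate threshold with the highest F1 (first one wins on ties)."""
    return max(candidates, key=lambda t: f1_at_threshold(y_true, y_prob, t))


# Toy labels and scores, purely for illustration.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.6, 0.7, 0.9, 0.4, 0.8]
chosen = best_threshold(labels, scores, candidates=[0.5, 0.65, 0.85])
```

With these toy scores the middle candidate separates the classes perfectly, so it is the one selected.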
Featured: English → French Neural Machine Translation
Memory-safe NMT on a 6 GB dataset without RAM crashes. Custom Seq2Seq + HuggingFace mBART/Helsinki-NLP fine-tuning. Fixed 5 critical upstream bugs (GradientTape, tokenizer overflow, deprecated API).

Twitter Sentiment Analysis
6-model NLP pipeline on 74K tweets. DistilBERT fine-tuning achieves 96.6% accuracy. LR+TF-IDF baseline at 85%. LSTM/Bi-LSTM/CNN reach 87–88%. 4-class: Positive, Negative, Neutral, Irrelevant.

Fake News Detection
13-model NLP pipeline on 44,898 news articles. Soft Voting Ensemble & Stacking both achieve 99.86% accuracy, AUC=1.0. Only 2 errors on the full test set. DistilBERT matches at 99.87% on 6K subset.

Human Activity Recognition (HAR)
14-model benchmark on 9,299 UCI sensor readings. SVM Linear tops at 96.1%. t-SNE shows clean activity clusters. PCA retains 95% variance at ~95 components. Sitting/Standing confusion is the primary error source.

Telco Customer Churn Prediction
3-phase churn pipeline on 7,043 customers. Optuna-tuned XGBoost: AUC 0.8484, F1 0.5947. Phase 1: 5 baselines → Phase 2: boosting ensembles → Phase 3: 100-trial Optuna + SHAP. Tenure & contract type dominate.

Vehicle Insurance Claim Fraud
16-model fraud pipeline for 15,420 claims (5.99% fraud). AdaBoost maximizes recall (89.2%). RandomizedSearchCV XGBoost: CV AUC 0.9847. SHAP: Fault (37.9%) is the dominant fraud indicator.

Face Recognition Person Search
Zero-shot face recognition with pretrained dlib ResNet-50 (VGGFace2) embeddings. Searches 13,233 LFW images via 128-d Euclidean distance. 18/19 correct matches at tolerance 0.55. No training required.
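The search step described above reduces to a nearest-neighbour lookup with a distance cut-off. A minimal sketch, assuming embeddings are already extracted and stored in a plain dict; `find_matches` and the 3-d toy vectors are hypothetical stand-ins for the real 128-d embeddings.

```python
import math


def find_matches(query, gallery, tolerance=0.55):
    """Return (name, distance) pairs within `tolerance` of the query, closest first.

    `gallery` maps an image name to its embedding (a list of floats).
    """
    matches = []
    for name, embedding in gallery.items():
        distance = math.dist(query, embedding)  # Euclidean distance (Python 3.8+)
        if distance <= tolerance:
            matches.append((name, distance))
    return sorted(matches, key=lambda item: item[1])


# Toy 3-d embeddings standing in for real 128-d face embeddings.
gallery = {
    "img_a": [0.3, 0.4, 0.0],  # distance 0.5 from the query
    "img_b": [1.0, 0.0, 0.0],  # distance 1.0, rejected
    "img_c": [0.1, 0.0, 0.0],  # distance 0.1
}
hits = find_matches([0.0, 0.0, 0.0], gallery)
```

A lower tolerance trades recall for precision: fewer false matches, but more missed images of the same person.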

Facial Emotion Recognition
7-class emotion recognition on RAF-DB (12,271 images). Ensemble of ResNet50+ViT-Small+EfficientNetB3 achieves 86.57%. 2-phase transfer learning. GradCAM confirms focus on mouth/brow/eye regions per emotion.

YOLOv8 Smart Parking Detection
Binary parking occupancy (free vs not-free) with YOLOv8n. Test mAP50=0.942, mAP50-95=0.798. Early stopped at epoch 74. 30 CVAT-annotated images (22/4/4). Inference: 9 free + 21 occupied per lot @ 41.2ms.

Cancer Detection — YOLOv8 (n/s/m)
3-variant YOLOv8 benchmark for cancer localization. YOLOv8m: test mAP50=0.6782, Precision=0.7633, F1=0.6941. 1,968 training images. Exported ONNX (49.8 MB) + TorchScript (99.1 MB).

YOLOv8 Animals Detection
80-class animal detection with YOLOv8n. mAP@0.5=0.668, mAP@0.5:0.95=0.560. Best: Tiger (0.967), Sparrow (0.953). Challenging: Squid (0.009). ONNX (12.3 MB). 29,071 images across 80 species.

Plant Disease Classification
15-class PlantVillage benchmark. MobileNetV2 individual best: 92.86%. Ensemble (MobileNetV2+EfficientNetB3+ResNet50) test: 83.43%. 42.5× class imbalance. Fixed generator-reset bug that caused ensemble collapse.

Butterfly Species Classification
4-phase multi-model pipeline for 75-species classification. Vanilla CNN → pretrained TL → hybrid parallel/sequential → multi-loss auxiliary heads. Grad-CAM confirms wing-pattern focus. t-SNE shows inter-species clustering.

Chest CT Scan Cancer Classification
4-class lung cancer classification on 613 CT images. MobileNetV2 best: 66.03% test accuracy. 16 models: HOG+8 classical + custom CNNs + TL. MC-Dropout uncertainty flags cases for radiologist review.

TACO Trash Detection & Segmentation
5-model benchmark on 1,500 trash images (4,784 annotations, 60 categories). RT-DETR-L best: mAP50=0.2778, Precision=0.4833. Faster R-CNN loss converges 0.76→0.11. YOLOv8n/s/l + RT-DETR + Faster R-CNN.

Sign Language Digits Classification
CNN for sign language digit recognition (0–9) on 2,062 balanced images. 96.13% validation accuracy at epoch 23, train F1=0.98. 3-layer CNN with BatchNorm + Dropout. Exported to H5 for deployment.

Breast Cancer Classification (Wisconsin)
14-model benchmark on Wisconsin dataset (569 samples). Voting Ensemble: 99.12% accuracy. CatBoost: AUC 0.9990. Extra Trees: 98.25%. Tuned RF + SVM via RandomizedSearchCV/GridSearchCV. SHAP: concave_points_worst dominates.

Book Recommender Systems — Full Taxonomy
Complete recommender system taxonomy on BookCrossing (1.1M ratings): User-CF, Item-CF, SVD/NMF/ALS, Content-Based, Hybrid, NCF, AutoRec, GRU4Rec. User-CF RMSE 1.6645, P@10 0.6629, R@10 0.6910.

Hourly Energy Consumption Forecasting
10-model benchmark on 145,366 PJM hourly records (2002–2018). LightGBM best: MAE=210.8 MW, RMSE=285.4 MW, MAPE=0.66%. Prophet fails (MAPE=10.25%). BiLSTM MAPE=2.17%. 26 lag/rolling/cyclical features.

EURUSD Forecasting — 30+ Models (Quantum · GNN · Diffusion · GA)
Broad EURUSD benchmark: 30+ models including Quantum ML (QSVM/QNN/QAE/VQC), Genetic Algorithms (7 variants + Neural Chromosomes), GNN, Neural SDE, Diffusion DDPM, Informer, PatchTST, TFT. Delta-target methodology. NSGA-II multi-objective optimization.

COVID-19 Outbreak Prediction
Leakage-free pipeline on 188 daily records (Jan–Jul 2020). Target = daily new cases (stationary). Walk-forward TimeSeriesSplit CV. SEIR model + ARIMA + XGBoost + LSTM + Transformer. Fixes cumulative-count leakage from v1.
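Walk-forward validation keeps every training index strictly before every test index, which is what prevents future data from leaking into the fit. The sketch below builds expanding-window splits by hand to show the idea; `walk_forward_splits` and its parameters are illustrative assumptions and are not meant to reproduce scikit-learn's `TimeSeriesSplit` exactly.

```python
def walk_forward_splits(n_samples, n_splits, test_size):
    """Build expanding-window (train_indices, test_indices) pairs in time order."""
    splits = []
    for k in range(n_splits):
        # The last fold ends at the final sample; earlier folds end test_size steps sooner.
        test_end = n_samples - (n_splits - 1 - k) * test_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough samples for the requested splits")
        train_idx = list(range(0, test_start))
        test_idx = list(range(test_start, test_end))
        splits.append((train_idx, test_idx))
    return splits


# Ten time-ordered samples, three folds of two test points each.
folds = walk_forward_splits(n_samples=10, n_splits=3, test_size=2)
```

Here the first fold trains on indices 0 to 3 and tests on 4 and 5; each later fold absorbs the previous test window into its training set.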

Weather Pattern Detection
9-method pipeline on 96,453 hourly records. K-Means (sil=0.45, K=3), DBSCAN, Isolation Forest (1,930 anomalies), LightGBM macro F1=0.74, 1D-CNN 94.85%, LSTM Autoencoder, Prophet (16 anomaly days).

DataCo Smart Supply Chain ML
Leakage-free ML on 180,519 orders. LightGBM AUC 0.8563 (late delivery). Gradient Boosting R²=0.9996 (profit regression). Removed post-fulfillment columns that inflate to AUC=1.0 in most published solutions.

LinkedIn Job Postings ML Pipeline
Full ML pipeline on 123,849 LinkedIn postings (2023–2024). Salary prediction, skills demand analysis (213K pairs), NLP on descriptions. 7 CSV files joined. Pay-period normalization (hourly→yearly).
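Pay-period normalization can be sketched as a lookup of periods per year. The period labels and the 40-hour, 52-week working year below are assumptions for illustration; the project may use different labels or conversion factors.

```python
# Assumed conversion factors: 40 hours/week and 52 weeks/year for hourly pay.
PERIODS_PER_YEAR = {
    "HOURLY": 40 * 52,
    "WEEKLY": 52,
    "BIWEEKLY": 26,
    "MONTHLY": 12,
    "YEARLY": 1,
}


def to_yearly(amount, pay_period):
    """Convert a salary quoted per `pay_period` into a yearly figure."""
    key = pay_period.strip().upper()
    if key not in PERIODS_PER_YEAR:
        raise ValueError(f"unknown pay period: {pay_period!r}")
    return amount * PERIODS_PER_YEAR[key]


yearly_from_hourly = to_yearly(25.0, "hourly")    # 25 * 2080
yearly_from_monthly = to_yearly(5000, "MONTHLY")  # 5000 * 12
```

Raising on unknown labels, rather than silently passing values through, keeps mislabeled rows from mixing hourly and yearly figures in the same target column.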

Advanced Game Playing — Deep RL
Double Dueling DQN + PER (SumTree). CartPole-v1 solved ep 300 (MA-100=441.1, best eval 497.2/500). LunarLander-v3 solved ep 207 (MA-100=202). 134,275-param network with LayerNorm.

IoT Network Security Anomaly Detection
Embedded system intrusion detection with extreme imbalance (10% anomalies). BiLSTM+Attention: PR-AUC=0.186, Recall=33.3%. 5× augmentation (Gaussian/MixUp/masking). MC-Dropout uncertainty. Focal loss.

Poetry Generation — BERT / GPT-2 / T5 Fine-tuned
Fine-tuned BERT, GPT-2, and T5 on the Poetry Foundation corpus for creative poem generation. 10 saved checkpoints. Vocabulary diversity analysis per poet. Beam search + temperature sampling. Model dashboard comparing all 3 architectures.

Handwritten Name Recognition
OCR benchmark on 66K images. TrOCR (ViT+GPT-2, 334M) best: CER=0.0481, 80% exact-match. CRNN-ResNet34: CER=0.0502. AMP + torch.compile + gradient accumulation optimizations.
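Character error rate (CER) is the number of character edits needed to turn each prediction into its reference, divided by the total reference length. A minimal sketch using a standard Levenshtein distance; the function names and sample names are illustrative, and the project may aggregate CER per sample rather than over the whole corpus.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                       # delete char_a
                curr[j - 1] + 1,                   # insert char_b
                prev[j - 1] + (char_a != char_b),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(predictions, references):
    """Corpus-level CER: total edits divided by total reference characters."""
    total_edits = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars


# One wrong character across nine reference characters.
score = cer(["MARTIN", "LEA"], ["MARTIN", "LEO"])
```

Exact-match rate is stricter than CER: a single wrong character fails the whole name, which is why a low CER can coexist with a noticeably lower exact-match percentage.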

Food Delivery Time Prediction
16-model regression benchmark. Linear Regression surprisingly wins: RMSE=8.76 min, R²=0.829. Tuned XGBoost: RMSE=9.19. Distance & traffic dominate. Interaction features (distance×traffic) capture non-linearities for linear models.

Household Power Consumption Forecasting
Multi-model time series on 2.9M UCI records (2006–2010). ARIMA, SARIMA, Prophet, LSTM on Global_active_power. STL reveals daily+weekly patterns. Ensemble with inverse-RMSE weighting across all models.
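Inverse-RMSE weighting gives each model a weight proportional to 1/RMSE, normalized so the weights sum to 1. The sketch below uses made-up model names and error values purely for illustration; it is not the project's code.

```python
def inverse_rmse_weights(rmse_by_model):
    """Map each model name to a weight proportional to 1 / RMSE, summing to 1."""
    inverse = {name: 1.0 / rmse for name, rmse in rmse_by_model.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def ensemble_forecast(forecasts, weights):
    """Weighted average of per-model forecasts, step by step over the horizon."""
    horizon = len(next(iter(forecasts.values())))
    return [
        sum(weights[name] * forecasts[name][t] for name in forecasts)
        for t in range(horizon)
    ]


# Hypothetical validation RMSEs: the second model is twice as accurate.
weights = inverse_rmse_weights({"arima": 0.5, "lstm": 0.25})
blended = ensemble_forecast({"arima": [3.0, 6.0], "lstm": [6.0, 3.0]}, weights)
```

In this toy case the weights come out as 1/3 and 2/3, so the blend leans toward the more accurate model. The RMSEs used for weighting should come from a validation window, not from the period being forecast.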

Historical Product Demand Forecasting
19-model benchmark: classical TS → ML → DL → ensemble. CatBoost R²=0.7125 (best). ML beats classical TS on R² (negative for the TS models), even though SMAPE reads 115–130% for ML vs 35–40% for TS. Walk-forward CV with Optuna.

Synthetic Speech Commands Classification
30-class audio CNN achieves 100% test accuracy on 41,849 samples. Mel-spectrogram (64 bins) + SpecAugment. 1.25M-param 4-block CNN. Val accuracy reaches 100% at epoch 8. Label smoothing 0.1.

Line Detection (Computer Vision)
Classical CV benchmark: Standard Hough (2.53ms, 22 lines), Probabilistic Hough (4.29ms, 47 segments), LSD (23.98ms, 422 segments). Hough 6–10× faster. Udacity dashcam + synthetic images. HSV+ROI pipeline.
Anime Face Generation (DCGAN)
DCGAN trained 100 epochs on Tesla T4 on 43K anime images. ConvTranspose2d stack (100→512→256→128→64→3). β₁=0.5, label smoothing, StepLR. Slerp latent interpolation for smooth transitions.
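Slerp (spherical linear interpolation) moves between two latent vectors along the arc between them instead of a straight line. A minimal pure-Python sketch follows; the function name and the fallback to linear interpolation for near-parallel or near-opposite vectors are illustrative choices, and the project presumably works on tensors.

```python
import math


def slerp(z0, z1, t):
    """Spherical interpolation between latent vectors z0 and z1 for t in [0, 1]."""
    norm0 = math.sqrt(sum(x * x for x in z0))
    norm1 = math.sqrt(sum(x * x for x in z1))
    cos_omega = sum(a * b for a, b in zip(z0, z1)) / (norm0 * norm1)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # guard against rounding error
    omega = math.acos(cos_omega)                # angle between the two vectors
    sin_omega = math.sin(omega)

    if abs(sin_omega) < 1e-6:
        # Nearly parallel or opposite vectors: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(z0, z1)]

    w0 = math.sin((1 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]


# Halfway between two orthogonal unit vectors.
midpoint = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
```

The midpoint of two orthogonal unit vectors keeps unit length, whereas plain linear interpolation would shrink it to about 0.71.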
E-commerce Recommendation Engine (n8n)
Production recommendation backend: n8n + PostgreSQL, 4 modes (trending/co-purchase/personalized/repurchase), 74 nodes, webhook API, daily scheduler. No custom server required.
RAG Multi-Agent System (n8n + Pinecone)
109-node n8n: Google Drive PDF → Pinecone vector store → Cohere embeddings → Ollama AI Agent → Airtop browser scraping → Apify actors. 5 sub-workflows. Full RAG + conversation memory.
Microservices Architecture (Spring Boot)
Production microservices: Spring Boot, Apache Kafka event streaming, OAuth2/Keycloak auth, gRPC inter-service calls, API gateway, Docker. Event-driven design with per-service PostgreSQL isolation.
Need an AI engineer or data scientist?
I build custom ML models, AI agents, computer vision, and automation — from idea to production.