All Projects

35+ ML projects across every domain — all production-grade, fully documented

42 projects total

Featured
Fraud Detection

IEEE-CIS Fraud Detection

Full ML pipeline on 590K transactions with 433 features. LightGBM reaches AUC 0.9648. Also includes a stacking ensemble (LightGBM + XGBoost + CatBoost + Random Forest) with advanced feature engineering on Vesta behavioral features.

AUC: 0.9648
LightGBM · XGBoost · CatBoost · Stacking · Feature Engineering
Generative AI · Deployment · Featured

AI Image Generation Platform (Ofoto)

Production deployment of Stable Diffusion (Automatic1111 + ControlNet) with a FastAPI backend and Vue.js frontend. Handles 500+ concurrent requests at 99.9% uptime, with 35% lower latency and 40% shorter release time.

99.9% uptime, -35% latency
Stable Diffusion · ControlNet · FastAPI · Vue.js · Docker
AI Agents · Featured

WhatsApp AI Sales Agent

Production AI sales agent on WhatsApp Business. Classifies messages (Sales/Support/Off-topic), queries Supabase product DB, uses Ollama/Llama3.1 locally, bilingual FR/AR, conversation memory. -90% manual processing time.

-90% processing time
n8n · LLM · WhatsApp · Ollama · Llama3.1
Featured
Medical AI · Computer Vision

Breast Cancer Ultrasound Segmentation

9-architecture segmentation benchmark on 780 BUSI images. DeepLabV3+ tops with Dice 0.7863, IoU 0.6483. FCN → SimpleUNet → SegNet → Attention-UNet → TransUNet → ResNet34-UNet → EfficientNet-UNet → DeepLabV3+ → Swin-UNet.

Dice: 0.7863 | IoU: 0.6483
U-Net · DeepLabV3+ · ASPP · Segmentation · PyTorch
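
For readers unfamiliar with the two segmentation metrics reported above, here is a minimal sketch of how Dice and IoU are computed for one pair of binary masks. The function name and the toy masks are hypothetical, and the project's own evaluation code may average over images differently.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equally sized binary masks (nested lists of 0/1)."""
    p = [v for row in pred for v in row]
    t = [v for row in target for v in row]
    inter = sum(1 for a, b in zip(p, t) if a and b)
    p_sum = sum(1 for a in p if a)
    t_sum = sum(1 for b in t if b)
    union = p_sum + t_sum - inter
    if union == 0:  # both masks empty: treat as a perfect match
        return 1.0, 1.0
    dice = 2 * inter / (p_sum + t_sum)
    iou = inter / union
    return dice, iou


# Toy 2x3 masks (illustrative only, not BUSI data)
pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
dice, iou = dice_and_iou(pred, target)
```

For a single mask pair the two are linked by IoU = Dice / (2 - Dice), which is why a Dice score is always at least as high as the matching IoU.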
Featured
Fraud Detection

Ethereum Blockchain Fraud Detection

Blockchain fraud detection on 9,841 Ethereum addresses. XGBoost+LightGBM+CatBoost+Stacking with Optuna HPO (40 trials) and SHAP. AUC 0.9973, F1 0.9658 at optimal threshold 0.85.

AUC: 0.9973 | F1: 0.9658
XGBoost · LightGBM · CatBoost · SMOTE · Optuna
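
The "optimal threshold" mentioned above is typically found by sweeping candidate cut-offs and keeping the one with the best F1. The sketch below shows that idea in plain Python. The labels, scores, and grid are made up for illustration, so the selected value here will not match the project's 0.85.

```python
def f1_at(y_true, scores, thr):
    """F1 of the positive class when predicting 1 for scores >= thr."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < thr and y == 1)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def best_threshold(y_true, scores, grid):
    """Return the first threshold in grid that maximizes F1."""
    return max(grid, key=lambda t: f1_at(y_true, scores, t))


# Hypothetical validation labels and predicted fraud probabilities
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.10, 0.30, 0.60, 0.70, 0.80, 0.88, 0.90, 0.95]
grid = [i / 100 for i in range(5, 100, 5)]
thr = best_threshold(y_true, scores, grid)
```

The threshold should be chosen on validation data and then applied unchanged to the test set. Tuning it on the test set would overstate F1.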
Featured
NLP

English → French Neural Machine Translation

Memory-safe NMT on a 6 GB dataset without RAM crashes. Custom Seq2Seq + HuggingFace mBART/Helsinki-NLP fine-tuning. Fixed 5 critical upstream bugs (GradientTape, tokenizer overflow, deprecated API).

Seq2Seq · mBART · MarianMT · HuggingFace · NMT
NLP

Twitter Sentiment Analysis

6-model NLP pipeline on 74K tweets. DistilBERT fine-tuning achieves 96.6% accuracy. LR+TF-IDF baseline at 85%. LSTM/Bi-LSTM/CNN reach 87–88%. 4-class: Positive, Negative, Neutral, Irrelevant.

96.6% accuracy (DistilBERT)
DistilBERT · LSTM · Bi-LSTM · TF-IDF · Sentiment
NLP

Fake News Detection

13-model NLP pipeline on 44,898 news articles. Soft Voting Ensemble & Stacking both achieve 99.86% accuracy, AUC=1.0. Only 2 errors on the full test set. DistilBERT matches at 99.87% on 6K subset.

99.86% accuracy | AUC: 1.0000
LinearSVC · TF-IDF · XGBoost · LightGBM · DistilBERT
Computer Vision

Human Activity Recognition (HAR)

14-model benchmark on 9,299 UCI sensor readings. SVM Linear tops at 96.1%. t-SNE shows clean activity clusters. PCA retains 95% variance at ~95 components. Sitting/Standing confusion is the primary error source.

96.1% accuracy (SVM Linear)
SVM · XGBoost · LightGBM · PCA · t-SNE
Fraud Detection

Telco Customer Churn Prediction

3-phase churn pipeline on 7,043 customers. Optuna-tuned XGBoost: AUC 0.8484, F1 0.5947. Phase 1: 5 baselines → Phase 2: boosting ensembles → Phase 3: 100-trial Optuna + SHAP. Tenure & contract type dominate.

AUC: 0.8484 (Optuna XGBoost)
XGBoost · LightGBM · CatBoost · Optuna · SHAP
Fraud Detection

Vehicle Insurance Claim Fraud

16-model fraud pipeline for 15,420 claims (5.99% fraud). AdaBoost maximizes recall (89.2%). RandomizedSearchCV XGBoost: CV AUC 0.9847. SHAP: Fault (37.9%) is the dominant fraud indicator.

AUC: 0.9847 (CV) | Recall: 89.2% (AdaBoost)
XGBoost · SMOTE · SHAP · Insurance · RandomizedSearchCV
Computer Vision

Face Recognition Person Search

Zero-shot face recognition with pretrained dlib ResNet-50 (VGGFace2) embeddings. Searches 13,233 LFW images via 128-d Euclidean distance. 18/19 correct matches at tolerance 0.55. No training required.

18/19 matches — 94.7% recall at threshold 0.55
Face Recognition · dlib · ResNet-50 · VGGFace2 · LFW
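
The search step described above reduces to a nearest-neighbour lookup with a distance cut-off. Here is a minimal sketch of that logic, assuming embeddings are already computed. The gallery names and 3-d vectors are hypothetical stand-ins for the real 128-d embeddings, and `search` is an illustrative helper rather than a dlib function.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(query, gallery, tolerance=0.55):
    """Return (name, distance) pairs within tolerance, nearest first.

    gallery maps an image or person name to its embedding.
    """
    hits = [(name, euclidean(query, emb)) for name, emb in gallery.items()]
    return sorted((h for h in hits if h[1] <= tolerance), key=lambda h: h[1])


# Toy 3-d embeddings (real face embeddings would be 128-d)
gallery = {
    "person_a": [0.0, 0.0, 0.0],
    "person_b": [0.3, 0.4, 0.0],
    "person_c": [1.0, 1.0, 1.0],
}
matches = search([0.0, 0.0, 0.1], gallery)
```

Lowering the tolerance trades recall for fewer false matches. The 0.55 default here simply mirrors the value quoted in the description.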
Computer Vision

Facial Emotion Recognition

7-class emotion recognition on RAF-DB (12,271 images). Ensemble of ResNet50+ViT-Small+EfficientNetB3 achieves 86.57%. 2-phase transfer learning. GradCAM confirms focus on mouth/brow/eye regions per emotion.

86.57% accuracy (ensemble)
ResNet50 · ViT-Small · EfficientNetB3 · GradCAM · RAF-DB
Computer Vision

YOLOv8 Smart Parking Detection

Binary parking occupancy (free vs not-free) with YOLOv8n. Test mAP50=0.942, mAP50-95=0.798. Early stopped at epoch 74. 30 CVAT-annotated images (22/4/4). Inference: 9 free + 21 occupied per lot @ 41.2ms.

Test mAP50: 0.942 | Val mAP50: 0.994
YOLOv8 · Object Detection · CVAT · Parking · Real-time
Medical AI · Computer Vision

Cancer Detection — YOLOv8 (n/s/m)

3-variant YOLOv8 benchmark for cancer localization. YOLOv8m: test mAP50=0.6782, Precision=0.7633, F1=0.6941. 1,968 training images. Exported ONNX (49.8 MB) + TorchScript (99.1 MB).

mAP50: 0.6782 (YOLOv8m test)
YOLOv8 · Object Detection · Medical Imaging · ONNX · Cancer
Computer Vision

YOLOv8 Animals Detection

80-class animal detection with YOLOv8n. mAP@0.5=0.668, mAP@0.5:0.95=0.560. Best: Tiger (0.967), Sparrow (0.953). Challenging: Squid (0.009). ONNX (12.3 MB). 29,071 images across 80 species.

mAP@0.5: 0.668 | Tiger: 0.967
YOLOv8 · Object Detection · 80-class · Wildlife · ONNX
Computer Vision

Plant Disease Classification

15-class PlantVillage benchmark. MobileNetV2 individual best: 92.86%. Ensemble (MobileNetV2+EfficientNetB3+ResNet50) test: 83.43%. 42.5× class imbalance. Fixed generator-reset bug that caused ensemble collapse.

92.86% (MobileNetV2) | 83.43% (ensemble test)
MobileNetV2 · EfficientNetB3 · ResNet50 · Ensemble · Agriculture
Computer Vision

Butterfly Species Classification

4-phase multi-model pipeline for 75-species classification. Vanilla CNN → pretrained TL → hybrid parallel/sequential → multi-loss auxiliary heads. Grad-CAM confirms wing-pattern focus. t-SNE shows inter-species clustering.

CNN · Transfer Learning · Multi-loss · Grad-CAM · t-SNE
Medical AI

Chest CT Scan Cancer Classification

4-class lung cancer classification on 613 CT images. MobileNetV2 best: 66.03% test accuracy. 16 models: HOG+8 classical + custom CNNs + TL. MC-Dropout uncertainty flags cases for radiologist review.

66.03% test accuracy (MobileNetV2)
MobileNetV2 · CT Scan · Cancer · MC-Dropout · HOG
Computer Vision

TACO Trash Detection & Segmentation

5-model benchmark on 1,500 trash images (4,784 annotations, 60 categories). RT-DETR-L best: mAP50=0.2778, Precision=0.4833. Faster R-CNN loss converges 0.76→0.11. YOLOv8n/s/l + RT-DETR + Faster R-CNN.

RT-DETR-L mAP50: 0.2778
RT-DETR · YOLOv8 · Faster R-CNN · TACO · Environmental AI
Computer Vision

Sign Language Digits Classification

CNN for sign language digit recognition (0–9) on 2,062 balanced images. 96.13% validation accuracy at epoch 23, train F1=0.98. 3-layer CNN with BatchNorm + Dropout. Exported to H5 for deployment.

96.13% validation accuracy
CNN · Sign Language · Accessibility · BatchNorm · Keras
Medical AI

Breast Cancer Classification (Wisconsin)

14-model benchmark on Wisconsin dataset (569 samples). Voting Ensemble: 99.12% accuracy. CatBoost: AUC 0.9990. Extra Trees: 98.25%. Tuned RF + SVM via RandomizedSearchCV/GridSearchCV. SHAP: concave_points_worst dominates.

99.12% (Voting) | AUC: 0.9990 (CatBoost)
CatBoost · XGBoost · LightGBM · SHAP · SVM
NLP

Book Recommender Systems — Full Taxonomy

Complete recommender system taxonomy on BookCrossing (1.1M ratings): User-CF, Item-CF, SVD/NMF/ALS, Content-Based, Hybrid, NCF, AutoRec, GRU4Rec. User-CF RMSE 1.6645, P@10 0.6629, R@10 0.6910.

Collaborative Filtering · SVD · NCF · GRU4Rec · Matrix Factorization
Time Series

Hourly Energy Consumption Forecasting

10-model benchmark on 145,366 PJM hourly records (2002–2018). LightGBM best: MAE=210.8 MW, RMSE=285.4 MW, MAPE=0.66%. Prophet fails (MAPE=10.25%). BiLSTM MAPE=2.17%. 26 lag/rolling/cyclical features.

LightGBM RMSE: 285.4 MW | MAPE: 0.66%
LightGBM · XGBoost · BiLSTM · Prophet · Lag Features
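
To illustrate the three feature families named above (lag, rolling, cyclical), here is a small stdlib-only sketch. The lag choices, window size, and feature names are assumptions for the example and are not the project's actual 26 features.

```python
import math


def make_features(load, hours, lags=(1, 24), window=3):
    """Build one feature row per time step from past values only.

    load  : list of hourly consumption values
    hours : hour of day (0-23) for each value
    """
    rows = []
    start = max(max(lags), window)  # first index with full history available
    for t in range(start, len(load)):
        row = {f"lag_{k}": load[t - k] for k in lags}
        # Rolling mean over the previous `window` hours, excluding hour t itself
        row[f"roll_mean_{window}"] = sum(load[t - window:t]) / window
        # Cyclical encoding so hour 23 and hour 0 end up close together
        angle = 2 * math.pi * hours[t] / 24
        row["hour_sin"] = math.sin(angle)
        row["hour_cos"] = math.cos(angle)
        row["target"] = load[t]
        rows.append(row)
    return rows


# Toy series: 30 consecutive hours with a steadily rising load
load = list(range(100, 130))
hours = [i % 24 for i in range(30)]
rows = make_features(load, hours)
```

Every feature at step t uses values from before t only, which keeps the target out of its own inputs.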
Time Series

EURUSD Forecasting — 30+ Models (Quantum · GNN · Diffusion · GA)

EURUSD benchmark of 30+ models, including Quantum ML (QSVM/QNN/QAE/VQC), Genetic Algorithms (7 variants + Neural Chromosomes), GNN, Neural SDE, Diffusion DDPM, Informer, PatchTST, TFT. Delta-target methodology. NSGA-2 multi-objective optimization.

30+ models | Quantum · GA · GNN · Diffusion
Genetic Algorithms · Quantum ML · GNN · Diffusion DDPM · Neural SDE
Time Series

COVID-19 Outbreak Prediction

Leakage-free pipeline on 188 daily records (Jan–Jul 2020). Target = daily new cases (stationary). Walk-forward TimeSeriesSplit CV. SEIR model + ARIMA + XGBoost + LSTM + Transformer. Fixes cumulative-count leakage from v1.

SEIR · LSTM · Transformer · Epidemiology · Walk-forward
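
Walk-forward validation means every test fold lies strictly after its training data. The sketch below generates expanding-window folds in plain Python to show the idea. It is not scikit-learn's `TimeSeriesSplit` implementation, and the fold count and test size are assumed values.

```python
def walk_forward_splits(n_samples, n_splits, test_size):
    """Yield (train_idx, test_idx) with an expanding training window.

    Each test block immediately follows its training block in time.
    """
    first_test = n_samples - n_splits * test_size
    if first_test <= 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_splits):
        start = first_test + k * test_size
        yield list(range(start)), list(range(start, start + test_size))


# 188 daily records, 5 folds of 14 days each (fold settings are assumptions)
splits = list(walk_forward_splits(188, 5, 14))
```

Because training indices always precede test indices, a model never sees future case counts while fitting.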
Time Series

Weather Pattern Detection

9-method pipeline on 96,453 hourly records. K-Means (sil=0.45, K=3), DBSCAN, Isolation Forest (1,930 anomalies), LightGBM macro F1=0.74, 1D-CNN 94.85%, LSTM Autoencoder, Prophet (16 anomaly days).

LightGBM macro F1: 0.74 | 1D-CNN: 94.85% | IF: 1,930 anomalies
K-Means · DBSCAN · Isolation Forest · LightGBM · 1D-CNN
Time Series

DataCo Smart Supply Chain ML

Leakage-free ML on 180,519 orders. LightGBM AUC 0.8563 (late delivery). Gradient Boosting R²=0.9996 (profit regression). Removed the post-fulfillment columns that inflate AUC to 1.0 in most published solutions.

Classification AUC: 0.8563 | Regression R²: 0.9996
XGBoost · LightGBM · Supply Chain · Leakage-Free · Classification
NLP

LinkedIn Job Postings ML Pipeline

Full ML pipeline on 123,849 LinkedIn postings (2023–2024). Salary prediction, skills demand analysis (213K pairs), NLP on descriptions. 7 CSV files joined. Pay-period normalization (hourly→yearly).

NLP · Salary Prediction · XGBoost · LightGBM · Labor Market
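
Pay-period normalization can be sketched as a lookup of how many pay periods fit in a year. The period labels and the 2,080-hour working year below are assumptions for illustration. The actual dataset labels and conversion factors used in the project may differ.

```python
# Assumption: a full-time year is 40 hours x 52 weeks = 2,080 hours
HOURS_PER_YEAR = 40 * 52

# Hypothetical pay-period labels mapped to periods per year
PERIODS_PER_YEAR = {
    "HOURLY": HOURS_PER_YEAR,
    "WEEKLY": 52,
    "BIWEEKLY": 26,
    "MONTHLY": 12,
    "YEARLY": 1,
}


def to_yearly(amount, pay_period):
    """Convert a salary quoted per pay period into a yearly figure."""
    try:
        factor = PERIODS_PER_YEAR[pay_period.upper()]
    except KeyError:
        raise ValueError(f"unknown pay period: {pay_period!r}") from None
    return amount * factor


hourly_as_yearly = to_yearly(25, "hourly")
monthly_as_yearly = to_yearly(5000, "MONTHLY")
```

Putting all salaries on one scale before training keeps a regression model from treating a 25/hour posting and a 52,000/year posting as wildly different targets.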
Reinforcement Learning

Advanced Game Playing — Deep RL

Double Dueling DQN + PER (SumTree). CartPole-v1 solved ep 300 (MA-100=441.1, best eval 497.2/500). LunarLander-v3 solved ep 207 (MA-100=202). 134,275-param network with LayerNorm.

CartPole solved ep 300 | LunarLander solved ep 207
Double DQN · Dueling DQN · PER · SumTree · CartPole
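
A SumTree lets prioritized experience replay sample a transition in O(log n) with probability proportional to its priority. Below is a minimal array-based sketch, assuming a power-of-two capacity. It is illustrative only and is not the project's buffer implementation.

```python
class SumTree:
    """Binary tree whose internal nodes store the sum of their children.

    Leaves hold priorities; the root (index 1) holds the total priority.
    Assumes capacity is a power of two so leaves map to slots in order.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity)  # leaves at capacity .. 2*capacity-1

    def update(self, idx, priority):
        """Set the priority of slot idx and refresh the sums above it."""
        pos = idx + self.capacity
        self.tree[pos] = priority
        pos //= 2
        while pos >= 1:
            self.tree[pos] = self.tree[2 * pos] + self.tree[2 * pos + 1]
            pos //= 2

    def total(self):
        return self.tree[1]

    def find(self, value):
        """Return the slot whose cumulative priority range contains value."""
        pos = 1
        while pos < self.capacity:
            left = 2 * pos
            if value < self.tree[left]:
                pos = left
            else:
                value -= self.tree[left]
                pos = left + 1
        return pos - self.capacity


tree = SumTree(4)
for slot, priority in enumerate([1.0, 2.0, 3.0, 4.0]):
    tree.update(slot, priority)
```

To sample, draw a value uniformly from `[0, tree.total())` and pass it to `find`. Slots with larger priority cover a wider range and are returned more often.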
Fraud Detection

IoT Network Security Anomaly Detection

Embedded system intrusion detection with extreme imbalance (10% anomalies). BiLSTM+Attention: PR-AUC=0.186, Recall=33.3%. 5× augmentation (Gaussian/MixUp/masking). MC-Dropout uncertainty. Focal loss.

BiLSTM PR-AUC: 0.186 | Recall: 33.3%
BiLSTM · Anomaly Detection · IoT · Focal Loss · MC-Dropout
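
Focal loss down-weights easy examples so that rare anomalies contribute more to training. Here is a per-sample sketch of the binary form. The gamma and alpha defaults are the commonly cited values from the original focal loss paper and are an assumption here, not necessarily the project's settings.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one sample.

    p : predicted probability of the positive (anomaly) class
    y : true label, 0 or 1
    """
    p = min(max(p, 1e-7), 1 - 1e-7)  # clamp to avoid log(0)
    p_t = p if y == 1 else 1 - p  # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1 - alpha
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.9, 1)  # confident and correct
hard = binary_focal_loss(0.1, 1)  # confident and wrong
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy.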
NLP · Generative AI

Poetry Generation — BERT / GPT-2 / T5 Fine-tuned

Fine-tuned BERT, GPT-2, and T5 on the Poetry Foundation corpus for creative poem generation. 10 saved checkpoints. Vocabulary diversity analysis per poet. Beam search + temperature sampling. Model dashboard comparing all 3 architectures.

GPT-2 · BERT · T5 · Fine-tuning · Poetry
Computer Vision · NLP

Handwritten Name Recognition

OCR benchmark on 66K images. TrOCR (ViT+GPT-2, 334M) best: CER=0.0481, 80% exact-match. CRNN-ResNet34: CER=0.0502. AMP + torch.compile + gradient accumulation optimizations.

TrOCR CER=0.0481 | 80% exact-match
TrOCR · CRNN · CTC Loss · BiLSTM · ResNet34
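
Character error rate (CER) is the total edit distance between predictions and references divided by the total number of reference characters. The sketch below computes CER and exact-match rate on made-up names. It uses one common corpus-level definition, which may differ from the averaging used in the benchmark.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,  # deletion
                cur[j - 1] + 1,  # insertion
                prev[j - 1] + (ca != cb),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def cer(predictions, references):
    """Corpus-level CER: total edits / total reference characters."""
    edits = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    chars = sum(len(r) for r in references)
    return edits / chars


def exact_match(predictions, references):
    """Fraction of predictions identical to their reference."""
    return sum(p == r for p, r in zip(predictions, references)) / len(references)


# Hypothetical OCR outputs against ground-truth names
refs = ["MARTIN", "LEA", "DUPONT"]
preds = ["MARTIN", "LEO", "DUPOND"]
error_rate = cer(preds, refs)
match_rate = exact_match(preds, refs)
```

The two metrics can diverge: a single wrong character barely moves CER but turns an exact match into a miss.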
Time Series

Food Delivery Time Prediction

16-model regression benchmark. Linear Regression surprisingly wins: RMSE=8.76 min, R²=0.829. Tuned XGBoost: RMSE=9.19. Distance & traffic dominate. Interaction features (distance×traffic) capture non-linearities for linear models.

Linear Regression RMSE: 8.76 min | R²: 0.829
Regression · XGBoost · LightGBM · Feature Engineering · Food Delivery
Time Series

Household Power Consumption Forecasting

Multi-model time series on 2.9M UCI records (2006–2010). ARIMA, SARIMA, Prophet, LSTM on Global_active_power. STL reveals daily+weekly patterns. Ensemble with inverse-RMSE weighting across all models.

LSTM · ARIMA · SARIMA · Prophet · STL Decomposition
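
Inverse-RMSE weighting gives each model a weight proportional to 1/RMSE, normalized so the weights sum to 1. Here is a minimal sketch with hypothetical validation errors and forecasts. The numbers are illustrative, not results from the project.

```python
def inverse_rmse_weights(rmse_by_model):
    """Map model name -> weight proportional to 1 / RMSE, summing to 1."""
    inv = {name: 1.0 / rmse for name, rmse in rmse_by_model.items()}
    total = sum(inv.values())
    return {name: value / total for name, value in inv.items()}


def blend(forecasts, weights):
    """Weighted average of per-model forecasts, step by step."""
    horizon = len(next(iter(forecasts.values())))
    return [
        sum(weights[m] * forecasts[m][t] for m in forecasts)
        for t in range(horizon)
    ]


# Hypothetical validation RMSEs and two-step forecasts
rmse = {"arima": 0.5, "lstm": 0.25}
forecasts = {"arima": [1.0, 2.0], "lstm": [4.0, 5.0]}
weights = inverse_rmse_weights(rmse)
ensemble = blend(forecasts, weights)
```

Here the LSTM has half the error of ARIMA, so it receives twice the weight. The RMSEs used for weighting should come from a validation split rather than the final test period.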
Time Series

Historical Product Demand Forecasting

19-model benchmark: classical TS → ML → DL → ensemble. CatBoost R²=0.7125 (best). ML beats classical TS on R² (negative for the classical models), although SMAPE is 115–130% for ML vs 35–40% for classical TS. Walk-forward CV with Optuna.

CatBoost R²=0.7125 | Quantile Reg MAE=8,511
CatBoost · XGBoost · LightGBM · LSTM · TFT
NLP

Synthetic Speech Commands Classification

30-class audio CNN achieves 100% test accuracy on 41,849 samples. Mel-spectrogram (64 bins) + SpecAugment. 1.25M-param 4-block CNN. Val accuracy reaches 100% at epoch 8. Label smoothing 0.1.

100% test accuracy | F1=1.00 all 30 classes
Audio CNN · Mel-Spectrogram · SpecAugment · Speech Recognition · 30-class
Computer Vision

Line Detection (Computer Vision)

Classical CV benchmark: Standard Hough (2.53ms, 22 lines), Probabilistic Hough (4.29ms, 47 segments), LSD (23.98ms, 422 segments). Both Hough variants run roughly 6–10× faster than LSD. Udacity dashcam + synthetic images. HSV+ROI pipeline.

Standard Hough: 2.53ms/frame (270+ FPS)
Hough Transform · LSD · Canny Edge · Lane Detection · OpenCV
Generative AI

Anime Face Generation (DCGAN)

DCGAN trained 100 epochs on Tesla T4 on 43K anime images. ConvTranspose2d stack (100→512→256→128→64→3). β₁=0.5, label smoothing, StepLR. Slerp latent interpolation for smooth transitions.

Stable generation after 100 epochs on 43K images
DCGAN · GAN · PyTorch · Generative AI · Slerp
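
Slerp (spherical linear interpolation) moves between two latent vectors along an arc rather than a straight line, which tends to keep intermediate points at a plausible norm for a Gaussian latent space. Below is a stdlib-only sketch on 2-d toy vectors. A DCGAN latent would be 100-d as in the generator stack above, and the function is illustrative rather than the project's code.

```python
import math


def slerp(z0, z1, t):
    """Spherical interpolation between vectors z0 and z1 for t in [0, 1]."""
    norm0 = math.sqrt(sum(a * a for a in z0))
    norm1 = math.sqrt(sum(b * b for b in z1))
    dot = sum(a * b for a, b in zip(z0, z1))
    cos_omega = max(-1.0, min(1.0, dot / (norm0 * norm1)))
    omega = math.acos(cos_omega)  # angle between the two vectors
    if omega < 1e-6:
        # Nearly parallel vectors: fall back to plain linear interpolation
        return [(1 - t) * a + t * b for a, b in zip(z0, z1)]
    s = math.sin(omega)
    w0 = math.sin((1 - t) * omega) / s
    w1 = math.sin(t * omega) / s
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]


# Toy 2-d latents at a right angle to each other
z_start, z_end = [1.0, 0.0], [0.0, 1.0]
midpoint = slerp(z_start, z_end, 0.5)
```

Feeding a sequence of `slerp(z_start, z_end, t)` values for evenly spaced `t` into the generator is what produces a smooth transition between two generated faces.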
AI Agents · Backend

E-commerce Recommendation Engine (n8n)

Production recommendation backend: n8n + PostgreSQL, 4 modes (trending/co-purchase/personalized/repurchase), 74 nodes, webhook API, daily scheduler. No custom server required.

n8n · PostgreSQL · Market Basket · Recommendation · Webhooks
AI Agents

RAG Multi-Agent System (n8n + Pinecone)

109-node n8n: Google Drive PDF → Pinecone vector store → Cohere embeddings → Ollama AI Agent → Airtop browser scraping → Apify actors. 5 sub-workflows. Full RAG + conversation memory.

RAG · Pinecone · n8n · Cohere · Ollama
Backend · Deployment

Microservices Architecture (Spring Boot)

Production microservices: Spring Boot, Apache Kafka event streaming, OAuth2/Keycloak auth, gRPC inter-service calls, API gateway, Docker. Event-driven design with per-service PostgreSQL isolation.

Spring Boot · Kafka · Keycloak · gRPC · Docker

Need an AI engineer or data scientist?

I build custom ML models, AI agents, computer vision, and automation — from idea to production.