Achieving AUC 0.9648 on IEEE-CIS Fraud Detection with LightGBM Stacking
A complete walkthrough of building a stacking ensemble that achieved AUC 0.9648 on the IEEE-CIS fraud dataset — feature engineering, model selection, and meta-learner design.
Deep-dive articles on machine learning, AI engineering, and production ML systems
The 15 feature engineering techniques I use in every Kaggle tabular competition — from target encoding to frequency encoding, lag features, and interaction terms.
A practical, benchmark-driven comparison of XGBoost and LightGBM across speed, accuracy, and memory — with concrete recommendations for tabular ML in production.
How CatBoost handles categorical features without data leakage using ordered target encoding — and why this gives it an edge on datasets with many categoricals.
After 20+ imbalanced classification projects — fraud, medical, churn — here is what actually moves the needle: SMOTE, class weights, threshold tuning, and cost-sensitive learning.
How to use Optuna for hyperparameter optimization beyond random search — pruning, multi-objective optimization, and persistent study databases.
A practical guide to SHAP values — global importance, local explanations, waterfall plots, and how to turn model explanations into business insights.
K-Fold, Stratified, GroupKFold, TimeSeriesSplit — a practical guide to choosing the right CV strategy based on your data structure.
The exact workflow I follow in every Kaggle competition — EDA, baseline, feature engineering sprints, ensemble building, and the final push before deadline.
When classical time series methods work and when ML wins — feature engineering for time series, backtesting frameworks, and handling seasonality in production.
A complete from-scratch DQN implementation in PyTorch — environment, replay buffer, epsilon-greedy exploration, and the training loop that actually converges.
How NEAT evolves both the weights and topology of neural networks — speciation, crossover, innovation numbers, and implementing it for game AI.
A clear explanation of MCTS (selection, expansion, simulation, backpropagation), with a Python implementation for 2048 and game tree visualization.
Using genetic algorithms for feature selection, hyperparameter tuning, and scheduling — encoding strategies, selection methods, and convergence analysis.
Why you should wrap everything in an sklearn Pipeline — preventing data leakage, proper cross-validation, easy serialization, and custom transformers.
Using autoencoders for unsupervised anomaly detection — reconstruction error thresholding, LSTM autoencoders for time series, and production deployment.
A structured approach to ML system design interviews — problem framing, data strategy, modeling choices, serving infrastructure, and monitoring.
Matrix factorization, implicit feedback, and neural collaborative filtering — practical implementation and evaluation with RecSys metrics.