Blog & Insights

Deep-dive articles on machine learning, AI engineering, and production ML systems

All Articles

Machine Learning · 6 min read

XGBoost vs LightGBM: When to Use Each in Production

A practical, benchmark-driven comparison of XGBoost and LightGBM across speed, accuracy, and memory — with concrete recommendations for tabular ML in production.

XGBoost, LightGBM, Gradient Boosting, Benchmarks
March 20, 2025

Machine Learning · 5 min read

CatBoost's Secret Weapon: Ordered Target Encoding Explained

How CatBoost handles categorical features without data leakage using ordered target encoding — and why this gives it an edge on datasets with many categoricals.

CatBoost, Categorical Features, Target Encoding, Gradient Boosting
February 18, 2025

Machine Learning · 7 min read

Class Imbalance in Production: What Actually Works

After 20+ imbalanced classification projects — fraud, medical, churn — here is what actually moves the needle: SMOTE, class weights, threshold tuning, and cost-sensitive learning.

Class Imbalance, SMOTE, Fraud Detection, Classification
February 1, 2025

Machine Learning · 6 min read

Optuna in Production: Smarter Hyperparameter Tuning

How to use Optuna for hyperparameter optimization beyond random search — pruning, multi-objective optimization, and persistent study databases.

Optuna, Hyperparameter Tuning, Bayesian Optimization, LightGBM
January 22, 2025

Machine Learning · 7 min read

SHAP for Production ML: Explaining Models to Non-Technical Stakeholders

A practical guide to SHAP values — global importance, local explanations, waterfall plots, and how to turn model explanations into business insights.

SHAP, Explainability, XAI, Feature Importance
January 10, 2025

Machine Learning · 5 min read

Cross-Validation Strategies: Which One to Use and When

K-Fold, Stratified, GroupKFold, TimeSeriesSplit — a practical guide to choosing the right CV strategy based on your data structure.

Cross-Validation, Model Evaluation, Time Series, Kaggle
December 15, 2024

Machine Learning · 9 min read

My Kaggle Competition Strategy: From Bronze to Gold

The exact workflow I follow in every Kaggle competition — EDA, baseline, feature engineering sprints, ensemble building, and the final push before deadline.

Kaggle, Competition, Strategy, Ensemble
February 20, 2025

Machine Learning · 9 min read

Time Series Forecasting at Scale: From ARIMA to LightGBM

When classical time series methods work and when ML wins — feature engineering for time series, backtesting frameworks, and handling seasonality in production.

Time Series, Forecasting, LightGBM, Prophet
January 5, 2025

Machine Learning · 13 min read

DQN from Scratch: Teaching an Agent to Play Snake

A complete from-scratch DQN implementation in PyTorch — environment, replay buffer, epsilon-greedy exploration, and the training loop that actually converges.

Reinforcement Learning, DQN, PyTorch, Game AI
January 18, 2025

Machine Learning · 9 min read

NEAT Algorithm: Evolving Neural Networks Without Backprop

How NEAT evolves both the weights and topology of neural networks — speciation, crossover, innovation numbers, and implementing it for game AI.

NEAT, Neuroevolution, Genetic Algorithm, Game AI
November 28, 2024

Machine Learning · 10 min read

Monte Carlo Tree Search: The Algorithm Behind AlphaGo

A clear explanation of MCTS (selection, expansion, simulation, backpropagation) with a Python implementation for 2048 and game tree visualization.

MCTS, Game AI, AlphaGo, Tree Search
November 10, 2024

Machine Learning · 8 min read

Genetic Algorithms for Real-World Optimization Problems

Using genetic algorithms for feature selection, hyperparameter tuning, and scheduling — encoding strategies, selection methods, and convergence analysis.

Genetic Algorithm, Optimization, Feature Selection, Evolutionary Computing
October 25, 2024

Machine Learning · 6 min read

Scikit-learn Pipelines: The Right Way to Build ML Workflows

Why you should wrap everything in an sklearn Pipeline — preventing data leakage, proper cross-validation, easy serialization, and custom transformers.

Scikit-learn, Pipeline, Data Leakage, Best Practices
November 5, 2024

Machine Learning · 8 min read

Anomaly Detection with Autoencoders: Better Than Rules, Cheaper Than Labels

Using autoencoders for unsupervised anomaly detection — reconstruction error thresholding, LSTM autoencoders for time series, and production deployment.

Anomaly Detection, Autoencoder, Unsupervised, PyTorch
December 12, 2024

Machine Learning · 10 min read

ML System Design Interview: A Framework That Works

A structured approach to ML system design interviews — problem framing, data strategy, modeling choices, serving infrastructure, and monitoring.

System Design, ML Interview, Architecture, Production
February 5, 2025

Machine Learning · 10 min read

Building a Recommendation System: From Collaborative Filtering to Neural CF

Matrix factorization, implicit feedback, and neural collaborative filtering — practical implementation and evaluation with RecSys metrics.

Recommendation System, Collaborative Filtering, Matrix Factorization, PyTorch
October 10, 2024

Need an AI engineer or data scientist?

I build custom ML models, AI agents, computer vision systems, and automation, from idea to production.