ML Learning Hub
Visual explanations of machine learning concepts — hand-crafted diagram series
Setup & Tooling
Your Python ML environment
Python ML Stack: NumPy, Pandas & Matplotlib
Master the tools every ML engineer uses daily — NumPy vectorized operations, Pandas DataFrames for real-world data, and Matplotlib/Seaborn for exploratory visualization. The foundation everything else builds on.
Math Foundations
Linear algebra, calculus, probability & information theory
Linear Algebra for ML
Vectors, dot products, matrix multiplication, eigendecomposition and SVD — with visual intuition for how matrices transform space. The language every neural network is written in.
Calculus & Optimization
Derivatives, partial derivatives, the chain rule (the basis of backpropagation), and gradient descent, followed by momentum, Adam, and learning rate scheduling: the full story of how neural networks actually learn.
Probability & Statistics
Probability distributions (Normal, Binomial, Poisson), MLE, Bayes' theorem, hypothesis testing, and the Central Limit Theorem — the language of uncertainty that underlies every loss function and evaluation metric.
Information Theory
Entropy, cross-entropy loss, KL divergence, and mutual information — the mathematical backbone behind why cross-entropy works as a loss function, how VAEs work, and why transformers use attention.
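As a rough illustration of how these quantities relate, the sketch below computes entropy, cross-entropy, and KL divergence in bits and checks the identity H(p, q) = H(p) + KL(p || q). The two distributions are made-up values chosen for easy arithmetic, and the function names are illustrative.

```python
import math

def entropy(p):
    """Shannon entropy H(p) in bits; terms with p_i = 0 contribute 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross-entropy H(p, q) in bits; assumes q_i > 0 wherever p_i > 0."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL(p || q) in bits; same support assumption as cross_entropy."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]   # "true" distribution (illustrative values)
q = [0.25, 0.25, 0.5]   # model's predicted distribution (illustrative values)

h_p = entropy(p)            # 1.5 bits
h_pq = cross_entropy(p, q)  # 1.75 bits
kl = kl_divergence(p, q)    # 0.25 bits

# Cross-entropy decomposes into the irreducible entropy plus the KL penalty.
print(h_p, h_pq, kl, abs(h_pq - (h_p + kl)) < 1e-12)
```

Because H(p) does not depend on the model, minimizing cross-entropy over q is equivalent to minimizing KL(p || q).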
Classic ML
Supervised, unsupervised, ensemble — the full sklearn toolkit
Linear & Logistic Regression
Visual deep dive from OLS to gradient descent, R², residuals, and multicollinearity, then logistic regression: sigmoid, log loss, L1/L2 regularization, and decision boundaries.
Model Evaluation & Metrics
Complete guide: accuracy, precision, recall, F1, ROC-AUC, confusion matrix, PR curves, cross-validation (StratifiedKFold, TimeSeriesSplit), and choosing the right metric for your task.
Bias-Variance Tradeoff & Error Analysis
Visual intuition for underfitting vs overfitting, bias-variance decomposition, learning curves, and systematic error analysis — how to diagnose what's wrong with your model.
Feature Engineering & Pipelines
The full preprocessing pipeline: imputation (Simple, MICE), categorical encoding (OHE, Target, Ordinal), scaling (Standard, MinMax, Robust), feature creation (polynomial, interactions, log transforms), and sklearn Pipelines for leakage-free evaluation.
Naïve Bayes Classifiers
Bayes' theorem, the conditional independence assumption, Gaussian/Multinomial/Complement/Bernoulli variants, Laplace smoothing, text classification with TF-IDF, and probability calibration, with an interactive posterior probability demo.
Decision Trees & Random Forest
How decision trees split data (Gini, entropy, information gain), pruning, then Random Forest as a bagged ensemble: variance reduction, feature importance, OOB evaluation.
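To make the split criteria concrete, here is a small sketch that scores one candidate split with both Gini impurity and entropy. The labels and the split itself are invented for illustration, and `impurity_decrease` is a hypothetical helper name rather than a library function.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy of the class proportions, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def impurity_decrease(parent, left, right, impurity):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted

# Toy node with 4 "yes" and 4 "no" samples, and one candidate split of it.
parent = ["yes"] * 4 + ["no"] * 4
left = ["yes", "yes", "yes", "no"]
right = ["yes", "no", "no", "no"]

gini_gain = impurity_decrease(parent, left, right, gini)      # 0.125
info_gain = impurity_decrease(parent, left, right, entropy)   # about 0.189 bits
print(gini_gain, info_gain)
```

A tree learner would evaluate this quantity for every candidate threshold and keep the split with the largest decrease; information gain is the same calculation with entropy as the impurity measure.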
SVM, SVR & KNN
Support Vector Machines: maximum margin hyperplane, kernel trick (RBF, polynomial), SVR for regression with ε-tube. KNN: distance metrics, k choice, curse of dimensionality.
Clustering: K-Means & DBSCAN
Unsupervised grouping of unlabeled data — K-Means (Lloyd's algorithm, inertia, K selection via elbow/silhouette), DBSCAN (core/border/noise points, ε-neighborhood, arbitrary shapes), and hierarchical clustering.
PCA & Dimensionality Reduction
Principal Component Analysis from scratch: eigendecomposition of the covariance matrix, variance explained, choosing k components, whitening, t-SNE/UMAP contrast, and practical applications in visualization and preprocessing.
Anomaly & Outlier Detection
Statistical (Z-Score, IQR fences) and algorithmic (Isolation Forest, LOF, One-Class SVM) approaches to finding rare abnormal observations — fraud detection, manufacturing defects, network intrusion.
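The two statistical rules can be sketched with the standard library alone. The readings are invented, the thresholds (3 standard deviations, 1.5 × IQR) are common defaults rather than values taken from this series, and quartile conventions differ between tools, so the "inclusive" method is named explicitly.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Flag values more than `threshold` population std devs from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) / std > threshold]

def iqr_outliers(values, k=1.5):
    """Flag values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Illustrative sensor readings with one obvious spike.
readings = [10, 12, 11, 13, 12, 11, 10, 12, 11, 13, 12, 11, 10, 12, 11, 95]

print(zscore_outliers(readings))  # [95]
print(iqr_outliers(readings))     # [95]
```

The Z-score rule depends on a mean and standard deviation that the outlier itself inflates, so in very small samples it can fail to flag anything; the IQR fences are less sensitive to that effect.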
Gradient Boosting: XGBoost, LightGBM, CatBoost
From vanilla Gradient Boosting to XGBoost (tree scores), then LightGBM (histogram-based, leaf-wise growth), and CatBoost (ordered boosting for categoricals). Optuna HPO patterns.
Bagging, Boosting & Stacking
Visual explanation of the three main ensemble paradigms: how bagging reduces variance (Random Forest), boosting reduces bias (AdaBoost.R2, SAMME), and stacking combines predictions via a meta-learner.
OvA vs OvO Multi-class Classification
One-vs-All and One-vs-One strategies for extending binary classifiers to multi-class — decision boundaries, scalability, SVM applications, and when to use Softmax instead.
Hyperparameter Tuning
Grid Search (exhaustive), Random Search (surprisingly effective), Bayesian Optimisation (TPE/GP-based sequential search), Successive Halving, and Optuna, with an interactive accuracy heatmap showing the C × max_depth search space.
Feature Importance & Selection
Permutation importance vs impurity (Gini) importance, SHAP unified attribution, drop-column importance, and how correlated features split scores unfairly, with an interactive bar chart toggling between methods.
Partial Dependence & ICE Plots
PDPs marginalize over all other features to show the average effect of one variable. ICE curves expose per-sample heterogeneity. Centered ICE anchors every curve at a common starting point so differences in shape are easier to compare. ALE plots fix the PDP extrapolation problem for correlated features.
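A minimal from-scratch sketch of the relationship between ICE curves and the PDP, assuming a hypothetical `toy_model` with an interaction term and rows stored as plain dicts. In practice a library routine would replace this.

```python
def ice_curves(predict, rows, feature, grid):
    """One curve per row: predictions as `feature` sweeps over `grid`."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = dict(row)        # copy so the original row is untouched
            modified[feature] = value   # overwrite only the feature of interest
            curve.append(predict(modified))
        curves.append(curve)
    return curves

def partial_dependence(curves):
    """The PDP is the pointwise mean of the ICE curves."""
    n = len(curves)
    return [sum(column) / n for column in zip(*curves)]

def center(curves):
    """Centered ICE: subtract each curve's value at the first grid point."""
    return [[y - curve[0] for y in curve] for curve in curves]

def toy_model(row):
    """Hypothetical model in which the effect of x1 depends on x2."""
    return row["x1"] * row["x2"] + 1.0

rows = [{"x1": 0.0, "x2": -1.0}, {"x1": 0.0, "x2": 1.0}, {"x1": 0.0, "x2": 3.0}]
grid = [0.0, 1.0, 2.0]

ice = ice_curves(toy_model, rows, "x1", grid)
pdp = partial_dependence(ice)

print(ice)          # [[1.0, 0.0, -1.0], [1.0, 2.0, 3.0], [1.0, 4.0, 7.0]]
print(pdp)          # [1.0, 2.0, 3.0]
print(center(ice))  # [[0.0, -1.0, -2.0], [0.0, 1.0, 2.0], [0.0, 3.0, 6.0]]
```

The averaged curve rises steadily, while the individual curves show that the effect of x1 is negative for one sample and steeply positive for another, which is the heterogeneity a PDP alone hides.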
Time Series Forecasting
Trend-seasonal-residual decomposition, lag features, rolling statistics, Fourier seasonality, TimeSeriesSplit cross-validation, ARIMA intuition, and gradient boosting for tabular forecasting, with an animated decomposition and a 3-step forecast.
Deep Learning Core
Neural networks, CNNs, RNNs and training techniques
Neural Networks — Forward & Backpropagation
From single perceptron to multi-layer networks: forward pass, activation functions (ReLU/sigmoid/tanh), backpropagation derivation, vanishing gradients, and weight initialization strategies.
Deep Learning Optimization
SGD vs Momentum vs Adam vs AdamW, learning rate warmup and cosine scheduling, batch normalization, dropout, gradient clipping, and mixed-precision training — the recipe that makes modern deep networks train stably.
CNN Architectures: Classic → ResNet → ViT
Convolutional networks from scratch: convolution op, pooling, receptive field, then classic (LeNet/VGG), Inception, ResNet (skip connections), and Vision Transformer (ViT, patch embeddings).
RNN, LSTM & GRU — Sequence Modeling
Recurrent networks for sequences: vanilla RNN (BPTT, exploding/vanishing gradients), LSTM (forget/input/output gates, cell state), GRU (simplified gating), Bi-LSTM for bidirectional context.
Computer Vision
Detection, segmentation and visual understanding
Object Detection: YOLO & Faster-RCNN
From sliding windows to single-shot detectors: IoU, anchor boxes, NMS, mAP, and the two-stage vs one-stage architecture trade-off. How YOLO detects 80 object categories in real time at 30 FPS.
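IoU and greedy NMS are short enough to sketch directly. The version below assumes boxes given as (x1, y1, x2, y2) corner coordinates and uses made-up boxes and scores; real detectors typically run a vectorized, per-class variant.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score, keep a box only if it
    does not overlap an already-kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

# Two heavily overlapping detections and one separate detection (toy values).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]

print(iou(boxes[0], boxes[1]))  # 81 / 119, about 0.68
print(nms(boxes, scores))       # [0, 2]
```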
Image Segmentation: UNet & DeepLab
Pixel-level classification — semantic vs instance vs panoptic segmentation, skip connections in UNet, dilated convolutions in DeepLab, Dice loss for class imbalance, and applications in medical imaging and autonomous driving.
NLP & Transformers
Text, attention mechanisms and large language models
NLP: Text Classification Pipeline
The full classical NLP pipeline: tokenization, TF-IDF vectorization, Naïve Bayes/Logistic/SVM classification, evaluation (macro-F1), word embeddings vs TF-IDF, and sentence-transformers for semantic search.
Transformers & Self-Attention
Deep dive into attention mechanisms: scaled dot-product, multi-head attention, positional encoding, feed-forward sublayer, temperature/top-k/top-p sampling, BERT encoder vs GPT decoder.
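A minimal single-head sketch of scaled dot-product attention, softmax(QKᵀ / √d_k) V, written with plain Python lists so each step stays visible. The function names and toy matrices are illustrative assumptions; masking and batching are omitted.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Return (outputs, weights) for queries Q, keys K, and values V,
    each given as a list of row vectors."""
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(wi * v[c] for wi, v in zip(w, V)) for c in range(len(V[0]))])
    return outputs, weights

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]

outputs, weights = scaled_dot_product_attention(Q, K, V)
print(weights)  # each row sums to 1; each query weights its matching key most
print(outputs)
```

Multi-head attention runs several such heads on learned projections of the same inputs and concatenates their outputs.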
Audio & Speech
Spectrograms, ASR and audio classification
Audio & Speech ML
From raw waveforms to MFCC features — STFT spectrograms, Mel filterbanks, audio classification CNNs, CTC loss for speech recognition, SpecAugment, and OpenAI Whisper for production-grade ASR.
Generative AI
VAEs, GANs and diffusion-style generation
Generative Models: VAE & GAN
Autoencoder → Variational Autoencoder (ELBO, reparameterization trick, latent space interpolation) → GAN (generator/discriminator adversarial training, DCGAN, mode collapse solutions).
Reinforcement Learning
MDPs, Q-learning, policy gradients and PPO
Reinforcement Learning
MDP formalism, Bellman equations, Q-learning, Deep Q-Networks (DQN), policy gradients (REINFORCE), and PPO, with an interactive grid-world visualization showing Q-values converging over 200 episodes.
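A small tabular Q-learning sketch, assuming a hypothetical five-state corridor in place of the grid world, with a reward of 1 for reaching the rightmost state. The hyperparameters (alpha, gamma, epsilon) and the seed are illustrative choices, not values taken from the visualization.

```python
import random

N_STATES, GOAL = 5, 4                 # corridor: states 0..4, state 4 is terminal
ACTIONS = {"left": -1, "right": +1}

def step(state, action):
    """Deterministic toy environment: reward 1 only on reaching GOAL."""
    next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def choose_action(Q, state, epsilon, rng):
    """Epsilon-greedy with random tie-breaking among equally good actions."""
    if rng.random() < epsilon:
        return rng.choice(list(ACTIONS))
    best = max(Q[state].values())
    return rng.choice([a for a in ACTIONS if Q[state][a] == best])

def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [{a: 0.0 for a in ACTIONS} for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(Q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best next-state value.
            best_next = 0.0 if done else max(Q[next_state].values())
            Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
            state = next_state
    return Q

Q = train()
policy = [max(Q[s], key=Q[s].get) for s in range(GOAL)]
print(policy)           # greedy action in each non-terminal state
print(Q[0]["right"])    # approaches gamma ** 3 = 0.729
```

With a deterministic environment the learned values settle at the Bellman optimum, so the value of moving right from state s approaches gamma raised to the number of remaining steps minus one.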
More diagrams coming soon
Diffusion Models · Graph Neural Networks · MLOps · LLM Fine-tuning · Causal ML