Nobody → ML Engineer · 33 topics

ML Learning Hub

Visual explanations of machine learning concepts — hand-crafted diagram series

🛠️ Phase 0
35 min · 1 topic

Setup & Tooling

Your Python ML environment

0.1
📐
6 diagrams · 35 min
Foundations · beginner

Python ML Stack: NumPy, Pandas & Matplotlib

Master the tools every ML engineer uses daily — NumPy vectorized operations, Pandas DataFrames for real-world data, and Matplotlib/Seaborn for exploratory visualization. The foundation everything else builds on.

NumPy Arrays · Broadcasting · Pandas DataFrame · EDA · +3 more
No prereqs
📐 Phase 1
2h 55min · 4 topics

Math Foundations

Linear algebra, calculus, probability & information theory

1.1
📐
10 diagrams · 45 min
Foundations · beginner

Linear Algebra for ML

Vectors, dot products, matrix multiplication, eigendecomposition and SVD — with visual intuition for how matrices transform space. The language every neural network is written in.

Dot Product · Matrix Multiply · Eigenvalues · Eigenvectors · +4 more
1 prereq
1.2
📐
8 diagrams · 45 min
Foundations · beginner

Calculus & Optimization

Derivatives, partial derivatives, the chain rule (the mechanism behind backpropagation), and gradient descent. Then momentum, Adam, and learning rate scheduling: the full story of how neural networks actually learn.

Derivatives · Chain Rule · Gradient · Gradient Descent · +4 more
1 prereq
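
A taste of what this topic covers, as a minimal sketch rather than material from the course itself: gradient descent on a one-parameter squared-error loss, with the gradient obtained by the chain rule. The data point `(x, y)`, the learning rate, and the step count are assumptions chosen for illustration.

```python
# Loss for a single (hypothetical) data point: L(w) = (w * x - y) ** 2
x, y = 2.0, 6.0

def loss_grad(w):
    # Chain rule: dL/dw = dL/dpred * dpred/dw = 2 * (pred - y) * x
    pred = w * x
    return 2.0 * (pred - y) * x

def gradient_descent(w0, lr=0.1, steps=50):
    # Repeatedly step against the gradient.
    w = w0
    for _ in range(steps):
        w -= lr * loss_grad(w)
    return w

w_fit = gradient_descent(0.0)
print(w_fit)  # approaches y / x = 3.0
```

With these values each step shrinks the distance to the minimum by a constant factor, which is why a fixed learning rate is enough here; the optimizers named above address cases where it is not.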
1.3
📐
9 diagrams · 50 min
Foundations · beginner

Probability & Statistics

Probability distributions (Normal, Binomial, Poisson), MLE, Bayes' theorem, hypothesis testing, and the Central Limit Theorem — the language of uncertainty that underlies every loss function and evaluation metric.

Normal Distribution · MLE · Bayes Theorem · p-values · +3 more
1 prereq
1.4
📐
7 diagrams · 35 min
Foundations · intermediate

Information Theory

Entropy, cross-entropy loss, KL divergence, and mutual information — the mathematical backbone behind why cross-entropy works as a loss function, how VAEs work, and why transformers use attention.

Entropy · Cross-Entropy · KL Divergence · Mutual Information · +3 more
1 prereq
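
As a small illustration of how these quantities relate (a sketch with made-up distributions `p` and `q`, not an excerpt from the course), cross-entropy decomposes as H(p, q) = H(p) + KL(p ‖ q). Since H(p) does not depend on the model, minimizing cross-entropy is the same as minimizing the KL divergence from the data distribution.

```python
import math

def entropy(p):
    # H(p) = -sum p_i log p_i, in nats
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    # H(p, q) = -sum p_i log q_i
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    # KL(p || q) = sum p_i log(p_i / q_i)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.2, 0.1]  # "true" distribution (illustrative values)
q = [0.5, 0.3, 0.2]  # model distribution (illustrative values)

gap = cross_entropy(p, q) - (entropy(p) + kl_divergence(p, q))
print(abs(gap) < 1e-12)
```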
📈 Phase 2
11h 30min · 17 topics

Classic ML

Supervised, unsupervised, ensemble — the full sklearn toolkit

2.1
📈
15 diagrams · 45 min
Regression · beginner

Linear & Logistic Regression

Visual deep-dive from OLS to gradient descent, R², residuals, multicollinearity, then logistic: sigmoid, log loss, L1/L2 regularization, and decision boundaries.

Least Squares · Gradient Descent · Sigmoid · +4 more
2 prereqs
2.2
📊
8 diagrams · 35 min
Evaluation · beginner

Model Evaluation & Metrics

Complete guide: accuracy, precision, recall, F1, ROC-AUC, confusion matrix, PR curves, cross-validation (StratifiedKFold, TimeSeriesSplit), and choosing the right metric for your task.

ROC-AUC · F1 Score · Confusion Matrix · Cross-validation · +2 more
1 prereq
2.3
📊
5 diagrams · 25 min
Evaluation · beginner

Bias-Variance Tradeoff & Error Analysis

Visual intuition for underfitting vs overfitting, bias-variance decomposition, learning curves, and systematic error analysis — how to diagnose what's wrong with your model.

Bias · Variance · Underfitting · Overfitting · +2 more
1 prereq
2.4
⚙️
14 diagrams · 45 min
Applied ML · beginner

Feature Engineering & Pipelines

The full preprocessing pipeline: imputation (Simple, MICE), categorical encoding (OHE, Target, Ordinal), scaling (Standard, MinMax, Robust), feature creation (polynomial, interactions, log transforms), and sklearn Pipelines for leakage-free evaluation.

Imputation · OneHotEncoder · StandardScaler · RobustScaler · +3 more
2 prereqs
2.5
⚙️
7 diagrams · 30 min
Applied ML · beginner

Naïve Bayes Classifiers

Bayes' theorem, the conditional independence assumption, Gaussian/Multinomial/Complement/Bernoulli variants, Laplace smoothing, text classification with TF-IDF, and probability calibration, with an interactive posterior probability demo.

Bayes Theorem · MAP Decision · Laplace Smoothing · GaussianNB · +3 more
1 prereq
2.6
📈
14 diagrams · 40 min
Regression · beginner

Decision Trees & Random Forest

How decision trees split data (Gini, entropy, information gain), pruning, then Random Forest as a bagged ensemble: variance reduction, feature importance, OOB evaluation.

Gini Impurity · Entropy · Information Gain · Pruning · +3 more
2 prereqs
2.7
📈
14 diagrams · 50 min
Regression · intermediate

SVM, SVR & KNN

Support Vector Machines: maximum margin hyperplane, kernel trick (RBF, polynomial), SVR for regression with ε-tube. KNN: distance metrics, k choice, curse of dimensionality.

Maximum Margin · Kernel Trick · RBF Kernel · ε-tube · +3 more
2 prereqs
2.8
🔮
12 diagrams · 40 min
Unsupervised · intermediate

Clustering: K-Means & DBSCAN

Unsupervised grouping of unlabeled data — K-Means (Lloyd's algorithm, inertia, K selection via elbow/silhouette), DBSCAN (core/border/noise points, ε-neighborhood, arbitrary shapes), and hierarchical clustering.

K-Means · DBSCAN · Silhouette Score · Inertia · +3 more
2 prereqs
2.9
🔮
10 diagrams · 45 min
Unsupervised · intermediate

PCA & Dimensionality Reduction

Principal Component Analysis from scratch: eigendecomposition of the covariance matrix, variance explained, choosing k components, whitening, t-SNE/UMAP contrast, and practical applications in visualization and preprocessing.

Eigendecomposition · Variance Explained · Covariance Matrix · Whitening · +3 more
2 prereqs
2.10
🔮
8 diagrams · 35 min
Unsupervised · intermediate

Anomaly & Outlier Detection

Statistical (Z-Score, IQR fences) and algorithmic (Isolation Forest, LOF, One-Class SVM) approaches to finding rare abnormal observations — fraud detection, manufacturing defects, network intrusion.

Z-Score · IQR · Isolation Forest · LOF · +3 more
2 prereqs
2.11
🌲
18 diagrams · 60 min
Ensembles · intermediate

Gradient Boosting: XGBoost, LightGBM, CatBoost

From vanilla Gradient Boosting to XGBoost (tree scores), then LightGBM (histogram-based, leaf-wise growth), and CatBoost (ordered boosting for categoricals). Optuna HPO patterns.

Residuals · Tree Score (SSR+λT) · Histogram Binning · Leaf-wise Growth · +2 more
2 prereqs
2.12
🌲
11 diagrams · 45 min
Ensembles · intermediate

Bagging, Boosting & Stacking

Visual explanation of the main ensemble paradigms: how bagging reduces variance (Random Forest), boosting reduces bias (AdaBoost.R2, SAMME), and stacking combines predictions via a meta-learner.

Variance Reduction · Bias Reduction · AdaBoost · SAMME · +2 more
2 prereqs
2.13
🏷️
6 diagrams · 30 min
Classification · intermediate

OvA vs OvO Multi-class Classification

One-vs-All and One-vs-One strategies for extending binary classifiers to multi-class — decision boundaries, scalability, SVM applications, and when to use Softmax instead.

Multi-class · Decision Boundaries · OvA · OvO · +2 more
2 prereqs
2.14
⚙️
9 diagrams · 40 min
Applied ML · intermediate

Hyperparameter Tuning

Grid Search (exhaustive), Random Search (surprisingly effective), Bayesian Optimisation (TPE/GP-based sequential search), Successive Halving, and Optuna, with an interactive accuracy heatmap of the C × max_depth search space.

GridSearchCV · RandomizedSearchCV · Bayesian Optimisation · Optuna · +3 more
2 prereqs
2.15
⚙️
6 diagrams · 35 min
Applied ML · intermediate

Feature Importance & Selection

Permutation importance vs impurity (Gini) importance, SHAP unified attribution, drop-column importance, and how correlated features split importance scores unfairly, with an interactive bar chart toggling between methods.

Permutation Importance · Gini Importance · SHAP · Drop-Column · +2 more
2 prereqs
2.16
⚙️
8 diagrams · 40 min
Applied ML · advanced

Partial Dependence & ICE Plots

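PDPs marginalize over all other features to show the average effect of one variable. ICE curves expose per-sample heterogeneity. Centered ICE (c-ICE) anchors every curve at a common starting point, removing per-sample intercept differences so that curve shapes can be compared. ALE plots fix the PDP's extrapolation problem for correlated features.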

PDP · ICE Curves · c-ICE · ALE Plots · +3 more
1 prereq
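
The relationship between ICE, PDP, and centered ICE described in this card can be sketched in a few lines. Everything here is an assumption for illustration: `model` is a hypothetical fitted function with an interaction term, and `X` and `grid` are toy values, not data from the course.

```python
def model(x1, x2):
    # Hypothetical fitted model: the slope in x1 depends on x2 (an interaction).
    return x1 * x2 + 0.5 * x2

# Toy dataset of (x1, x2) rows and a grid of x1 values to sweep over.
X = [(0.0, -1.0), (1.0, 0.0), (2.0, 1.0), (3.0, 2.0)]
grid = [0.0, 1.0, 2.0]

# ICE: one curve per sample, sweeping x1 over the grid with x2 held at its observed value.
ice = [[model(g, x2) for g in grid] for (_, x2) in X]

# PDP: the pointwise average of the ICE curves.
pdp = [sum(curve[j] for curve in ice) / len(ice) for j in range(len(grid))]

# Centered ICE: subtract each curve's value at the first grid point.
c_ice = [[v - curve[0] for v in curve] for curve in ice]

print(pdp)    # [0.25, 0.75, 1.25]
print(c_ice)  # slopes range from -1 to +2 across samples
```

The PDP alone suggests a mild positive effect of `x1`, while the centered ICE curves show that the effect is negative for some samples and strongly positive for others, which is the heterogeneity an average hides.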
2.17
⚙️
11 diagrams · 50 min
Applied ML · intermediate

Time Series Forecasting

Trend-seasonal-residual decomposition, lag features, rolling statistics, Fourier seasonality, TimeSeriesSplit cross-validation, ARIMA intuition, and gradient boosting for tabular forecasting, with an animated decomposition and a 3-step forecast.

Decomposition · Lag Features · Rolling Statistics · TimeSeriesSplit · +3 more
2 prereqs
🧠 Phase 3
3h 45min · 4 topics

Deep Learning Core

Neural networks, CNNs, RNNs and training techniques

3.1
🧠
10 diagrams · 60 min
Deep Learning · intermediate

Neural Networks — Forward & Backpropagation

From single perceptron to multi-layer networks: forward pass, activation functions (ReLU/sigmoid/tanh), backpropagation derivation, vanishing gradients, and weight initialization strategies.

Perceptron · Backpropagation · ReLU · Vanishing Gradient · +2 more
3 prereqs
3.2
🧠
10 diagrams · 40 min
Deep Learning · intermediate

Deep Learning Optimization

SGD vs Momentum vs Adam vs AdamW, learning rate warmup and cosine scheduling, batch normalization, dropout, gradient clipping, and mixed-precision training — the recipe that makes modern deep networks train stably.

Adam · AdamW · Momentum · Batch Normalization · +4 more
1 prereq
3.3
🧠
14 diagrams · 55 min
Deep Learning · intermediate

CNN Architectures: Classic → ResNet → ViT

Convolutional networks from scratch: convolution op, pooling, receptive field, then classic (LeNet/VGG), Inception, ResNet (skip connections), and Vision Transformer (ViT, patch embeddings).

Convolution · Pooling · Skip Connections · Inception · +3 more
2 prereqs
3.4
🧠
8 diagrams · 70 min
Deep Learning · advanced

RNN, LSTM & GRU — Sequence Modeling

Recurrent networks for sequences: vanilla RNN (BPTT, exploding/vanishing gradients), LSTM (forget/input/output gates, cell state), GRU (simplified gating), Bi-LSTM for bidirectional context.

BPTT · Vanishing Gradient · LSTM Gates · Cell State · +3 more
2 prereqs
👁️ Phase 4
1h 25min · 2 topics

Computer Vision

Detection, segmentation and visual understanding

4.1
👁️
11 diagrams · 45 min
Computer Vision · intermediate

Object Detection: YOLO & Faster-RCNN

From sliding windows to single-shot detectors: IoU, anchor boxes, NMS, mAP, and the two-stage vs one-stage architecture trade-off. How YOLO detects the 80 COCO object categories in real time (around 30 FPS, depending on model variant and hardware).

IoU · Anchor Boxes · NMS · mAP · +4 more
1 prereq
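
Two of the building blocks named in this card, IoU and NMS, fit in a short sketch. This is a plain greedy NMS written for illustration, assuming boxes in `(x1, y1, x2, y2)` corner format; the boxes, scores, and the 0.5 threshold are made-up values, and real detectors use vectorized, per-class variants.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns indices of the boxes to keep."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any higher-scoring kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 2, 2), (0.1, 0, 2, 2), (5, 5, 6, 6)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # [0, 2]: the second box is suppressed as a near-duplicate of the first
```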
4.2
👁️
9 diagrams · 40 min
Computer Vision · advanced

Image Segmentation: UNet & DeepLab

Pixel-level classification — semantic vs instance vs panoptic segmentation, skip connections in UNet, dilated convolutions in DeepLab, Dice loss for class imbalance, and applications in medical imaging and autonomous driving.

Semantic Segmentation · Instance Segmentation · UNet · Skip Connections · +3 more
1 prereq
💬 Phase 5
2h 15min · 2 topics

NLP & Transformers

Text, attention mechanisms and large language models

5.1
⚙️
10 diagrams · 45 min
Applied ML · intermediate

NLP: Text Classification Pipeline

The full classical NLP pipeline: tokenization, TF-IDF vectorization, Naïve Bayes/Logistic Regression/SVM classification, evaluation (macro-F1), word embeddings vs TF-IDF, and sentence-transformers for semantic search.

Tokenization · TF-IDF · Bag of Words · N-grams · +3 more
2 prereqs
5.2
🧠
9 diagrams · 90 min
Deep Learning · advanced

Transformers & Self-Attention

Deep dive into attention mechanisms: scaled dot-product, multi-head attention, positional encoding, feed-forward sublayer, temperature/top-k/top-p sampling, BERT encoder vs GPT decoder.

Scaled Dot-Product · Multi-head Attention · Positional Encoding · FFN · +3 more
2 prereqs
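
For orientation, scaled dot-product attention computes softmax(QKᵀ / √d_k) · V. The sketch below is a single-head, unmasked version in pure Python with toy matrices; the function names and the example values are assumptions for illustration, not code from the course.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Single-head scaled dot-product attention on lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
Q = [[0.0, 0.0]]  # a zero query scores every key equally

result = attention(Q, K, V)
print(result)  # [[2.0, 3.0]]: uniform weights give the mean of the value rows
```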
🎵 Phase 6
40 min · 1 topic

Audio & Speech

Spectrograms, ASR and audio classification

6.1
🎵
8 diagrams · 40 min
Audio & Speech · intermediate

Audio & Speech ML

From raw waveforms to MFCC features — STFT spectrograms, Mel filterbanks, audio classification CNNs, CTC loss for speech recognition, SpecAugment, and OpenAI Whisper for production-grade ASR.

STFT · Mel Spectrogram · MFCC · Audio CNN · +4 more
2 prereqs
🎨 Phase 7
1h 20min · 1 topic

Generative AI

VAEs, GANs and diffusion-style generation

7.1
🧠
10 diagrams · 80 min
Deep Learning · advanced

Generative Models: VAE & GAN

Autoencoder → Variational Autoencoder (ELBO, reparameterization trick, latent space interpolation) → GAN (generator/discriminator adversarial training, DCGAN, mode collapse solutions).

Autoencoder · ELBO · Reparameterization · Latent Space · +3 more
3 prereqs
🎮 Phase 8
55 min · 1 topic

Reinforcement Learning

MDPs, Q-learning, policy gradients and PPO

8.1
🎮
12 diagrams · 55 min
Reinforcement Learning · advanced

Reinforcement Learning

MDP formalism, Bellman equations, Q-learning, Deep Q-Networks (DQN), policy gradients (REINFORCE), and PPO, with an interactive grid-world visualization showing Q-values converging over 200 episodes.

MDP · Bellman Equation · Q-Learning · DQN · +4 more
3 prereqs
🚀

More diagrams coming soon

Diffusion Models · Graph Neural Networks · MLOps · LLM Fine-tuning · Causal ML
