ML Learning Hub
Regression · Intermediate

SVM, SVR & KNN

Maximum margin is the answer — the wider the street, the more confident the classifier

Support Vector Machines: maximum margin hyperplane, kernel trick (RBF, polynomial), SVR for regression with ε-tube. KNN: distance metrics, k choice, curse of dimensionality.

50 min
14 diagrams
7 Concepts Covered

Prerequisites

Linear Algebra
Model Evaluation

Concepts Covered

Maximum Margin
Kernel Trick
RBF Kernel
ε-tube
KNN
Distance Metrics
Curse of Dimensionality

Key Formulas

SVM Objective

Maximize margin 2/||w|| subject to correct classification

Dual (Kernel)

Kernel trick: replace every dot product xᵢ·xⱼ with K(xᵢ, xⱼ) to obtain non-linear boundaries
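Written out under the usual conventions, the two formulas above look roughly as follows. The symbols are assumptions added here for illustration: labels yᵢ ∈ {−1, +1}, training points xᵢ, bias b, and dual multipliers αᵢ. This is the hard-margin case; the soft-margin variant additionally caps each αᵢ at C.

```latex
% Hard-margin primal: maximizing 2/||w|| is equivalent to minimizing ||w||^2 / 2
\min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
\quad\text{s.t.}\quad y_i\,(w\cdot x_i + b)\ \ge\ 1,\qquad i = 1,\dots,n

% Dual with a kernel: the data enter only through K(x_i, x_j)
\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i
  - \tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\,\alpha_j\,y_i\,y_j\,K(x_i, x_j)
\quad\text{s.t.}\quad \alpha_i \ge 0,\qquad \sum_{i=1}^{n}\alpha_i\,y_i = 0
```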

RBF Kernel

Radial Basis Function: K(x, x') = exp(−γ‖x − x'‖²), a Gaussian kernel whose implicit feature map is infinite-dimensional
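As a quick sanity check, the RBF kernel can be computed by hand and compared against scikit-learn's `rbf_kernel`. This is a minimal sketch: the helper `rbf_kernel_manual`, the random data, and the value of `gamma` are all illustrative assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel


def rbf_kernel_manual(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2), computed explicitly."""
    # Pairwise squared Euclidean distances via broadcasting
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # 5 illustrative points in 3 dimensions
gamma = 0.5                  # illustrative value, not a recommendation

K_manual = rbf_kernel_manual(X, X, gamma)
K_sklearn = rbf_kernel(X, X, gamma=gamma)

print("matches sklearn:", np.allclose(K_manual, K_sklearn))
print("K(x, x) = 1 on the diagonal:", np.allclose(np.diag(K_manual), 1.0))
```

Larger `gamma` makes the kernel decay faster with distance, so each training point influences a smaller neighborhood.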

🎯

The Key Idea: Maximum Margin

motivation

Consider binary classification with linearly separable classes. Infinitely many hyperplanes can separate them, but which one generalizes best? SVMs answer: the one with maximum margin, the widest possible 'street' between the two classes. The points lying on the margin boundary are the support vectors. Only these points determine the boundary; every other point can be removed without changing it.
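The claim that only the support vectors matter can be checked empirically. The sketch below assumes a small separable toy dataset from `make_blobs` and uses a large `C` as a stand-in for a hard margin; both choices are illustrative.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters (illustrative toy data)
X, y = make_blobs(n_samples=40, centers=2, random_state=6)

# A large C approximates the hard-margin SVM
clf = SVC(kernel="linear", C=1000).fit(X, y)
sv_idx = clf.support_                            # indices of the support vectors
margin_width = 2.0 / np.linalg.norm(clf.coef_)   # the 2/||w|| from the objective

# Refit using ONLY the support vectors and compare predictions on all points
refit = SVC(kernel="linear", C=1000).fit(X[sv_idx], y[sv_idx])
same_predictions = bool((clf.predict(X) == refit.predict(X)).all())

print(f"{len(sv_idx)} of {len(X)} points are support vectors")
print(f"margin width 2/||w|| = {margin_width:.3f}")
print(f"same predictions after dropping the other points: {same_predictions}")
```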

💡

The Kernel Trick: Infinite Dimensions for Free

intuition

Many datasets are not linearly separable in their original space. The kernel trick implicitly maps the data to a higher-dimensional space where a linear separator is more likely to exist, without ever computing the mapping explicitly. The kernel K(x,x') = φ(x)·φ(x') returns the dot product in that space directly from the original inputs. The RBF kernel corresponds to an infinite-dimensional Hilbert space, which lets an SVM fit highly non-linear boundaries while only ever evaluating kernels between pairs of points.
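The identity K(x,x') = φ(x)·φ(x') is easiest to see for a kernel whose feature map is small enough to write down. The sketch below uses the homogeneous degree-2 polynomial kernel on 2-D inputs; the functions `phi` and `poly2_kernel` are illustrative helpers, not library calls.

```python
import numpy as np


def phi(v):
    """Explicit degree-2 feature map for a 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = v
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])


def poly2_kernel(u, v):
    """Homogeneous degree-2 polynomial kernel K(u, v) = (u . v)^2."""
    return float(np.dot(u, v)) ** 2


x = np.array([1.0, 2.0])
x_prime = np.array([3.0, -1.0])

explicit = float(np.dot(phi(x), phi(x_prime)))  # dot product in feature space
implicit = poly2_kernel(x, x_prime)             # same value, no mapping computed

print(explicit, implicit)
```

For the RBF kernel no such finite `phi` exists, which is why the kernel form is the only practical way to compute with it.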

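SVMs are not the only method that works in an infinite-dimensional feature space (a reproducing kernel Hilbert space, RKHS): kernel ridge regression, Gaussian processes, and kernel PCA rely on the same trick. What makes this tractable is the representer theorem, which guarantees that the solution can be written as a finite combination of kernel evaluations at the training points.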

Soft Margin: Handling Noise with Slack

math

On noisy or overlapping data, the hard-margin SVM, which requires perfect separation, has no feasible solution. The soft-margin SVM introduces slack variables ξᵢ ≥ 0 that let individual points violate the margin or be misclassified, with the total slack penalized by the hyperparameter C. Large C = narrow margin, low tolerance for violations (may overfit). Small C = wide margin, high tolerance (may underfit).
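The effect of C on the margin can be observed directly. This is a rough sketch on overlapping synthetic blobs; the dataset, the three C values, and the use of a linear kernel (so that 2/||w|| is easy to read off) are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping clusters, so perfect separation is impossible
X, y = make_blobs(n_samples=200, centers=[[0, 0], [2, 2]],
                  cluster_std=1.0, random_state=0)

results = {}
for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_)   # margin width 2/||w||
    n_sv = int(clf.n_support_.sum())           # points on or inside the margin
    results[C] = (margin, n_sv)
    print(f"C={C:<6} margin={margin:.3f} support vectors={n_sv}")
```

Smaller C should show a wider margin and more support vectors, because more points are allowed to sit inside the street.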

Soft-margin SVM primal objective
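In its standard textbook form, assuming labels yᵢ ∈ {−1, +1} and the slack variables ξᵢ and penalty C described above, the primal problem can be written as:

```latex
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad y_i\,(w\cdot x_i + b)\ \ge\ 1 - \xi_i,\qquad \xi_i \ge 0,\qquad i = 1,\dots,n
```

Setting every ξᵢ to zero recovers the hard-margin problem.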

SVM in Production

code
```python
from sklearn.svm import SVC, SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.datasets import make_classification

# ── Sample data ────────────────────────────────────────────────────────
X, y = make_classification(n_samples=300, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# CRITICAL: SVM requires feature scaling
svm_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', SVC(kernel='rbf', probability=True))
])

# Tune C and gamma
param_grid = {
    'svm__C': [0.01, 0.1, 1, 10, 100],
    'svm__gamma': ['scale', 'auto', 0.001, 0.01, 0.1]
}
grid_search = GridSearchCV(
    svm_pipeline, param_grid,
    cv=5, scoring='roc_auc', n_jobs=-1
)
grid_search.fit(X_train, y_train)
print(f"Best AUC: {grid_search.best_score_:.4f}")
print(f"Best params: {grid_search.best_params_}")

# SVR for regression
svr = Pipeline([
    ('scaler', StandardScaler()),
    ('svr', SVR(kernel='rbf', C=100, epsilon=0.1))
])
```
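The SVR pipeline above is defined but never fitted. A minimal sketch of how it might be used, and of what the ε-tube means in practice, follows. The noisy sine data is an illustrative assumption, and the names `svr_demo`, `abs_resid`, and `is_sv` are hypothetical.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Illustrative 1-D regression problem: a noisy sine wave
rng = np.random.default_rng(42)
X = np.sort(rng.uniform(0.0, 6.0, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

epsilon = 0.1  # half-width of the tube, same value as in the pipeline above
svr_demo = Pipeline([
    ("scaler", StandardScaler()),
    ("svr", SVR(kernel="rbf", C=100, epsilon=epsilon)),
])
svr_demo.fit(X, y)

# Points strictly inside the tube incur no loss and do not become support vectors
abs_resid = np.abs(y - svr_demo.predict(X))
is_sv = np.zeros(len(X), dtype=bool)
is_sv[svr_demo.named_steps["svr"].support_] = True

print(f"support vectors: {is_sv.sum()} of {len(X)}")
print(f"mean |residual| of support vectors:     {abs_resid[is_sv].mean():.3f}")
print(f"mean |residual| of non-support vectors: {abs_resid[~is_sv].mean():.3f}")
```

Widening `epsilon` puts more points inside the tube, which typically leaves fewer support vectors and a smoother fit.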
⚠️

SVM Pitfalls

pitfall
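1. No scaling = poor results. Kernel SVMs are highly sensitive to feature scale because the RBF kernel depends on Euclidean distances, so features with large ranges dominate. Treat StandardScaler (or an equivalent scaler) as mandatory, and fit it inside the pipeline so cross-validation does not leak held-out statistics.

2. Slow on large data: kernel SVM training scales between O(n²) and O(n³) in the number of samples. For n > 50K, use a linear model such as SGDClassifier (hinge loss) or LinearSVC; if a non-linear boundary is still needed, pair it with a kernel approximation such as Nystroem.

3. C tuning: too large C = overfitting; too small = underfitting. Always cross-validate over a log-scale grid.

4. KNN curse of dimensionality: in high dimensions the nearest and farthest neighbors end up at almost the same distance, so distance-based votes carry little signal. Reduce dimensionality first (e.g. PCA or feature selection). Ball trees and KD-trees only speed up the neighbor search, and lose that advantage in high dimensions; they do not fix the problem. k = √n is a rule-of-thumb starting point; choose k by cross-validation.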
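The KNN pitfall can be made concrete in two steps: first measure how distances concentrate as dimensionality grows, then tune k and the distance metric on PCA-reduced data. This is a sketch under stated assumptions: the helper `relative_contrast`, the synthetic data, the choice of 10 PCA components, and the grid values are all illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)


def relative_contrast(dim, n_points=1000):
    """(farthest - nearest) / nearest distance from one query to random points."""
    points = rng.uniform(size=(n_points, dim))
    query = rng.uniform(size=dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()


# Part 1: distances concentrate as the dimension grows
contrast = {dim: relative_contrast(dim) for dim in (2, 10, 100, 1000)}
for dim, value in contrast.items():
    print(f"d={dim:>4}: relative contrast = {value:.2f}")

# Part 2: scale, reduce dimensionality, then pick k and the metric by CV
X, y = make_classification(n_samples=300, n_features=50,
                           n_informative=5, random_state=42)
knn = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=10)),
    ("knn", KNeighborsClassifier()),
])
param_grid = {
    "knn__n_neighbors": [1, 3, 5, 9, 17],       # odd values avoid ties in binary votes
    "knn__metric": ["euclidean", "manhattan"],
}
search = GridSearchCV(knn, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print("best params:", search.best_params_)
print(f"best CV accuracy: {search.best_score_:.3f}")
```

A relative contrast near zero means the nearest neighbor is barely closer than the farthest one, which is the sense in which distances lose meaning.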
