Evaluation · beginner

Bias-Variance Tradeoff & Error Analysis

“Every model error is either a wrong assumption or sensitivity to noise — diagnosing which changes everything”

Visual intuition for underfitting vs overfitting, bias-variance decomposition, learning curves, and systematic error analysis — how to diagnose what's wrong with your model.

25 min

5 diagrams

6 Concepts Covered

Prerequisites

→Model Evaluation

Concepts Covered

Bias, Variance, Underfitting, Overfitting, Learning Curves, Model Complexity

Previous: Model Evaluation & Metrics
Next: Feature Engineering & Pipelines

∑ Key Formulas

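Error Decomposition

$$\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \mathrm{Bias}\big[\hat{f}(x)\big]^2 + \mathrm{Var}\big[\hat{f}(x)\big] + \sigma^2$$

Total expected error cannot go below the irreducible noise σ²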

Bias

$$\mathrm{Bias}\big[\hat{f}(x)\big] = \mathbb{E}\big[\hat{f}(x)\big] - f(x)$$

How far the average prediction (over training sets) is from the truth

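Variance

$$\mathrm{Var}\big[\hat{f}(x)\big] = \mathbb{E}\Big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\Big]$$

How much predictions fluctuate across different training sets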
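Why the three terms add up: a short derivation under squared-error loss, assuming the data are generated as y = f(x) + ε with E[ε] = 0, Var(ε) = σ², and ε independent of the training set used to fit f̂. Expectations are taken over both the training set and the noise, with Bias[f̂(x)] = E[f̂(x)] − f(x) and Var[f̂(x)] = E[(f̂(x) − E[f̂(x)])²].

```latex
$$
\begin{aligned}
\mathbb{E}\big[(y - \hat f(x))^2\big]
  &= \mathbb{E}\big[(f(x) - \hat f(x) + \varepsilon)^2\big] \\
  &= \mathbb{E}\big[(f(x) - \hat f(x))^2\big] + \sigma^2
     && \text{cross term vanishes: } \mathbb{E}[\varepsilon] = 0,\ \varepsilon \perp \hat f \\
  &= \big(\mathbb{E}[\hat f(x)] - f(x)\big)^2
     + \mathbb{E}\Big[\big(\hat f(x) - \mathbb{E}[\hat f(x)]\big)^2\Big] + \sigma^2
     && \text{add and subtract } \mathbb{E}[\hat f(x)] \\
  &= \mathrm{Bias}\big[\hat f(x)\big]^2 + \mathrm{Var}\big[\hat f(x)\big] + \sigma^2
\end{aligned}
$$
```

The σ² term survives every step unchanged, which is why no model can push expected error below it.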


🎯

Diagnosing What's Wrong with Your Model

motivation

When a model performs poorly, and neither irreducible noise nor a train/test distribution shift is to blame, there are two fundamental causes: it makes wrong structural assumptions (bias/underfitting) or it is too sensitive to the specific training data (variance/overfitting). These call for opposite fixes: more data helps variance but not bias, while more capacity helps bias but tends to increase variance. Diagnosing which problem you have before applying fixes is the most important skill in applied ML.

💡

The Dartboard Analogy

intuition

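Imagine training the same model on 100 different training sets and throwing one dart per trained model at a target whose bullseye is the true function. High bias = the darts cluster far from the bullseye (the model's assumptions are wrong). High variance = the darts scatter widely (predictions change with the training set). Low bias + low variance = the darts cluster tightly on the bullseye. On curved data, a degree-1 polynomial has high bias (it cannot fit the non-linearity), while a degree-20 polynomial has high variance (it fits the training noise). Something in between, such as degree 3 to 5, is often the sweet spot.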
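To see both failure modes on a single training set, the sketch below fits polynomials of degree 1, 4, and 20 to 30 noisy samples of an assumed true function, sin(2πx), with assumed noise σ = 0.3. The function, noise level, degrees, and sample sizes are illustrative choices, not taken from the lesson.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
sigma = 0.3  # assumed noise standard deviation (irreducible error sigma^2 = 0.09)

def true_f(x):
    # Assumed "true function" for this sketch: one period of a sine wave
    return np.sin(2 * np.pi * x)

# One small noisy training set and a large test set from the same process
x_train = rng.uniform(0, 1, 30)
y_train = true_f(x_train) + rng.normal(0, sigma, x_train.size)
x_test = rng.uniform(0, 1, 2000)
y_test = true_f(x_test) + rng.normal(0, sigma, x_test.size)

results = {}
for degree in (1, 4, 20):
    # Polynomial.fit rescales x to [-1, 1] internally, which keeps high-degree fits stable
    p = Polynomial.fit(x_train, y_train, degree)
    train_mse = np.mean((p(x_train) - y_train) ** 2)
    test_mse = np.mean((p(x_test) - y_test) ** 2)
    results[degree] = (train_mse, test_mse)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}  test MSE {test_mse:.3f}")
```

Training MSE can only fall as the degree grows, because each polynomial family contains the smaller ones. Test MSE behaves differently: degree 1 underfits and degree 20 chases noise, so the middle degree should come out best.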

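Irreducible error (σ²) is the noise inherent in the data, such as measurement error or the effect of unobserved variables. No model can beat it: even the true function f(x) has expected squared error σ². Estimating σ² therefore tells you the best performance any model can reach.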

∑

The Mathematical Decomposition

math

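Under squared-error loss, the expected test error of any estimator at a point x decomposes into three additive terms: squared bias, variance, and irreducible error. The irreducible error σ² is a property of the data-generating process, not the model. The tradeoff: as model complexity increases, bias typically decreases while variance increases, so the optimal complexity minimizes their sum. Regularization (L1/L2) makes this tradeoff explicit: its penalty term shrinks the parameters, deliberately adding some bias in exchange for lower variance.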
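The decomposition can be checked numerically. The sketch below is a Monte Carlo simulation under assumed settings (true function sin(2πx), noise σ = 0.3, 30 training points, 500 resampled training sets; all illustrative, not from the lesson). For each polynomial degree it refits on a fresh training set every repeat, measures squared bias and variance of the predictions at fixed evaluation points, and compares bias² + variance + σ² with the directly measured MSE. Degree 12 stands in for a high-capacity model.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)
sigma = 0.3                          # assumed noise std; irreducible error sigma^2 = 0.09
n_train, n_repeats = 30, 500         # training-set size, number of resampled training sets
x_eval = np.linspace(0.1, 0.9, 50)   # fixed evaluation points, kept away from the edges

def true_f(x):
    # Assumed true function for this sketch
    return np.sin(2 * np.pi * x)

decomp = {}
for degree in (1, 4, 12):
    preds = np.empty((n_repeats, x_eval.size))
    for r in range(n_repeats):
        # Fresh training set each repeat: one "dart" per trained model
        x = rng.uniform(0, 1, n_train)
        y = true_f(x) + rng.normal(0, sigma, n_train)
        preds[r] = Polynomial.fit(x, y, degree)(x_eval)

    mean_pred = preds.mean(axis=0)                          # estimate of E[f_hat(x)] at each x
    bias_sq = np.mean((mean_pred - true_f(x_eval)) ** 2)    # squared bias, averaged over x_eval
    variance = np.mean(preds.var(axis=0))                   # spread across training sets
    # Directly measured expected error against fresh noisy targets
    y_new = true_f(x_eval) + rng.normal(0, sigma, preds.shape)
    mse = np.mean((preds - y_new) ** 2)

    decomp[degree] = (bias_sq, variance, mse)
    print(f"degree {degree:2d}: bias^2={bias_sq:.4f}  var={variance:.4f}  "
          f"bias^2+var+sigma^2={bias_sq + variance + sigma**2:.4f}  MSE={mse:.4f}")
```

The two totals should agree up to Monte Carlo noise, degree 1 should carry the largest bias², and degree 12 the largest variance.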

🔬

Learning Curves: Reading the Diagnosis

deepdive

Plot training error and validation error vs. training set size. High bias signature: both curves plateau at high error — adding more data won't help; use a more complex model. High variance signature: large gap between train error (low) and val error (high) — adding more data will help (curves converge); also try dropout/regularization. Both curves nearly touching at acceptable error = good generalization.

⚙️

Systematic Error Analysis Protocol

algorithm

Establish a baseline: a naive model (majority class, mean prediction) sets the floor

Human-level performance: a proxy for the irreducible (Bayes) error, and therefore an approximate upper bound on achievable accuracy

Avoidable bias = train_error - human_error: fix with a larger model or more features

Variance = val_error - train_error (when train and val share a distribution): fix with more data, dropout, or regularization

Data mismatch (val distribution ≠ train distribution): measure it as val_error - train_dev_error, where train-dev is held out from the training distribution; fix with domain adaptation

Error analysis: hand-inspect 100 validation errors, tag each by category, and fix the biggest category first
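The protocol's arithmetic can be sketched as follows. The helper `diagnose` and the `FIXES` table are hypothetical names for illustration, and all error rates and tags are made up. The sketch assumes a train-dev split (data held out from the training distribution), which lets variance (train-dev minus train) be separated from data mismatch (val minus train-dev).

```python
from collections import Counter

# Hypothetical lookup: suggested fix for each error component (from the protocol)
FIXES = {
    "avoidable bias": "larger model, more features",
    "variance": "more data, dropout, regularization",
    "data mismatch": "domain adaptation",
}

def diagnose(human_error, train_error, train_dev_error, val_error):
    """Split the error budget into components; all inputs are error rates in [0, 1].

    train_dev_error comes from an assumed extra split held out from the
    *training* distribution, so variance and data mismatch can be separated.
    """
    gaps = {
        "avoidable bias": train_error - human_error,
        "variance": train_dev_error - train_error,
        "data mismatch": val_error - train_dev_error,
    }
    largest = max(gaps, key=gaps.get)  # attack the biggest gap first
    return gaps, largest, FIXES[largest]

# Hypothetical error rates, for illustration only
gaps, largest, fix = diagnose(human_error=0.01, train_error=0.08,
                              train_dev_error=0.10, val_error=0.11)
print(largest, "->", fix)  # avoidable bias -> larger model, more features

# Error analysis: tally hand-assigned tags on inspected validation errors
# (made-up tags; in practice you would tag ~100 misclassified examples)
tags = ["blurry image", "mislabeled", "blurry image",
        "small object", "blurry image", "mislabeled"]
counts = Counter(tags)
print(counts.most_common(1))  # [('blurry image', 3)]
```

Comparing the gaps side by side matters more than any single number: here the 7-point avoidable-bias gap dwarfs the others, so regularization or more data would be wasted effort until the model fits the training set better.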

</>

Learning Curve Diagnostic

code


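from sklearn.model_selection import learning_curve, train_test_split
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
import matplotlib.pyplot as plt
import numpy as np

# ── Sample data + model ────────────────────────────────────────────────
X, y = make_classification(n_samples=800, n_features=10, random_state=42)
# Keep a final test set aside; the learning curve only uses the training split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
model = GradientBoostingClassifier(n_estimators=100, random_state=42)

# ── Learning curve: 10 training sizes, 5-fold CV at each size ──────────
train_sizes, train_scores, val_scores = learning_curve(
    estimator=model,
    X=X_train, y=y_train,
    train_sizes=np.linspace(0.1, 1.0, 10),
    cv=5, scoring='roc_auc',
    n_jobs=-1, shuffle=True, random_state=42  # random_state makes the shuffle reproducible
)

# Scores have shape (n_sizes, n_folds); average across the folds
train_mean = train_scores.mean(axis=1)
train_std = train_scores.std(axis=1)
val_mean = val_scores.mean(axis=1)
val_std = val_scores.std(axis=1)

# ── Plot mean ± 1 std across folds (AUC: higher is better) ─────────────
plt.plot(train_sizes, train_mean, 'o-', label='Training AUC')
plt.fill_between(train_sizes, train_mean - train_std, train_mean + train_std, alpha=0.2)
plt.plot(train_sizes, val_mean, 'o-', label='Validation AUC')
plt.fill_between(train_sizes, val_mean - val_std, val_mean + val_std, alpha=0.2)
plt.xlabel('Training set size')
plt.ylabel('ROC AUC')
plt.legend()
plt.show()

# ── Diagnosis at the largest training size ─────────────────────────────
# The 0.7 and 0.1 thresholds are illustrative; tune them to your problem.
gap = train_mean[-1] - val_mean[-1]

if train_mean[-1] < 0.7:
    # Even the training score is low: the model cannot capture the signal
    print("HIGH BIAS: increase model complexity or add features")
elif gap > 0.1:
    # Training score is high but validation lags far behind
    print("HIGH VARIANCE: add more data, regularization, or reduce complexity")
else:
    print("Good generalization — optimize hyperparameters")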


