Deep Learning · advanced

Generative Models: VAE & GAN

“Learning the shape of data — then sampling new reality from the learned distribution”

Autoencoder → Variational Autoencoder (ELBO, reparameterization trick, latent space interpolation) → GAN (generator/discriminator adversarial training, DCGAN, mode collapse solutions).

80 min

10 diagrams

7 Concepts Covered

Prerequisites

→Neural Networks

→CNN Architectures

→Information Theory

Concepts Covered

Autoencoder · ELBO · Reparameterization · Latent Space · GAN · DCGAN · Mode Collapse


∑Key Formulas

ELBO (VAE)

ELBO(x) = E_{q(z|x)}[log p(x|z)] − KL(q(z|x) ‖ p(z)) ≤ log p(x)

Reconstruction term − KL divergence (regularizes latent space)

Reparameterization

z = μ + σ⊙ε, with ε ~ N(0,I)

Allows gradients to flow through the sampling operation

GAN Objective

min_G max_D E_{x~p_data}[log D(x)] + E_{z~N(0,I)}[log(1 − D(G(z)))]

Generator fools Discriminator; Discriminator detects fakes


🎯

From Discrimination to Generation

motivation

Every model covered so far is discriminative: it learns P(y|x), predicting an output given an input. Generative models instead learn P(x), the distribution of the data itself. Once that distribution is learned, you can sample new data points, interpolate between examples, detect anomalies (points with low probability under the model), and generate conditionally. The same idea, learning a distribution and then sampling from it, is at the core of Stable Diffusion, GPT, and DALL-E.

💡

VAE: The Probabilistic Compression

intuition

Autoencoders compress data to a latent code and then reconstruct it. But nothing constrains how those codes are arranged: the latent space has gaps, and nearby codes need not decode to similar images, so a randomly chosen point often decodes to nothing meaningful. VAEs fix this by having the encoder output a distribution (mean μ and standard deviation σ) instead of a single point, and by penalizing that distribution's deviation from the prior N(0,I) with a KL divergence term. The result is a smooth, continuous latent space where interpolation and sampling make semantic sense.
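To make latent interpolation concrete, here is a minimal sketch. The decoder is a hypothetical, untrained stand-in (its layer sizes and the 28×28 output are assumptions for illustration); with a trained VAE decoder, the frames would morph smoothly from one sample to the other.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
latent_dim = 16  # assumed latent size, for illustration only

# Hypothetical decoder standing in for a trained VAE decoder.
decoder = nn.Sequential(
    nn.Linear(latent_dim, 64), nn.ReLU(),
    nn.Linear(64, 28 * 28), nn.Sigmoid(),
)

# Two latent codes, e.g. the encoder means of two different images.
z_a = torch.randn(1, latent_dim)
z_b = torch.randn(1, latent_dim)

# Walk in a straight line from z_a to z_b in 8 steps.
t = torch.linspace(0, 1, steps=8).unsqueeze(1)   # shape (8, 1)
z_path = (1 - t) * z_a + t * z_b                 # shape (8, latent_dim)

# Decode every point on the path into an image-shaped tensor.
with torch.no_grad():
    frames = decoder(z_path).view(-1, 1, 28, 28)

print(frames.shape)  # torch.Size([8, 1, 28, 28])
```

Linear interpolation is the simplest choice; it relies on the KL regularization having filled the space between codes with decodable points.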

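The reparameterization trick, z = μ + σ⊙ε with ε ~ N(0,I), is the key insight that makes VAE training possible. Sampling z directly from N(μ, σ²) is a stochastic step that gradients cannot pass through. Rewriting the sample as a deterministic function of μ and σ plus external noise ε lets gradients flow back to the encoder.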
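A small sketch of the trick in PyTorch follows. It assumes the encoder outputs the log-variance rather than σ directly (a common convention, not something stated above), and the `mu` and `log_var` tensors are stand-ins for real encoder outputs.

```python
import torch

torch.manual_seed(0)

def reparameterize(mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
    """Draw z ~ N(mu, sigma^2) as a differentiable function of mu and log_var."""
    std = torch.exp(0.5 * log_var)   # sigma = exp(log(sigma^2) / 2)
    eps = torch.randn_like(std)      # eps ~ N(0, I) carries all the randomness
    return mu + std * eps            # z = mu + sigma * eps (elementwise)

# Stand-ins for encoder outputs: a batch of 4 codes in a 2-D latent space.
mu = torch.zeros(4, 2, requires_grad=True)
log_var = torch.zeros(4, 2, requires_grad=True)

z = reparameterize(mu, log_var)
z.sum().backward()

# dz/dmu is 1 everywhere, so the gradient reaches the encoder outputs.
print(mu.grad)
# log_var also receives a gradient (0.5 * sigma * eps per element).
print(log_var.grad is not None)
```

Because `eps` is sampled outside the computation that depends on the parameters, autograd treats it as a constant and differentiates only through `mu` and `std`.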

∑

The ELBO: Evidence Lower Bound

math

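We want to maximize log p(x), the likelihood of our data under the model. Computing it directly is intractable because it requires integrating over all z. Instead, we maximize the ELBO, a lower bound on log p(x): the expected reconstruction log-likelihood (how well we decode) minus the KL divergence between the encoder distribution q(z|x) and the prior N(0,I) (how much the encoder deviates from a standard Gaussian). β-VAE multiplies the KL term by a weight β; choosing β > 1 encourages disentangled representations, usually at some cost in reconstruction quality.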
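As a sketch, the negative ELBO can be written as a loss function. This version assumes inputs scaled to [0, 1] with a Bernoulli decoder (hence binary cross-entropy) and a diagonal Gaussian encoder parameterized by `mu` and `log_var`; the function name and the `beta` argument are illustrative choices.

```python
import torch
import torch.nn.functional as F

def negative_elbo(x_recon, x, mu, log_var, beta=1.0):
    """Return (loss, reconstruction term, KL term), each averaged over the batch.

    Assumes x and x_recon lie in [0, 1] and the encoder is a diagonal Gaussian.
    """
    batch = x.size(0)
    # Reconstruction term: -E_q[log p(x|z)] under a Bernoulli decoder.
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum") / batch
    # Closed-form KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions.
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp()) / batch
    return recon + beta * kl, recon, kl

# Toy check: 3 samples, 5 "pixels", 2 latent dimensions.
x = torch.rand(3, 5)
x_recon = torch.full((3, 5), 0.5)

# Encoder equal to the prior: the KL term vanishes.
_, _, kl_prior = negative_elbo(x_recon, x, torch.zeros(3, 2), torch.zeros(3, 2))

# Encoder mean shifted to 1 in both dimensions: KL = 0.5 * (1^2 + 1^2) = 1.
loss_b1, recon, kl_shift = negative_elbo(x_recon, x, torch.ones(3, 2), torch.zeros(3, 2))
loss_b4, _, _ = negative_elbo(x_recon, x, torch.ones(3, 2), torch.zeros(3, 2), beta=4.0)

print(kl_prior.item(), kl_shift.item())  # 0.0 and 1.0
```

Minimizing this loss maximizes the ELBO. With `beta=4.0` the same KL term is weighted four times as heavily, which is the β-VAE modification.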

🔬

GAN Training: The Adversarial Game

deepdive

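Generator G takes noise z ~ N(0,I) and produces fake samples G(z). Discriminator D outputs the probability that its input is real and tries to distinguish real samples from fakes. They play a minimax game: D maximizes log D(x) + log(1 − D(G(z))) for real samples x, while G minimizes log(1 − D(G(z))). In practice G is usually trained to maximize log D(G(z)) instead. This non-saturating loss is not the same objective, but it pushes G in the same direction and gives stronger gradients early in training, when D rejects fakes easily. At Nash equilibrium, G produces samples indistinguishable from real data and D outputs 1/2 everywhere.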
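One alternating update of this game might look like the sketch below. The tiny MLPs, the 2-D toy data, and the optimizer settings (Adam with lr 2e-4 and β₁ = 0.5, as commonly used for DCGAN) are assumptions for illustration. D returns logits, so the sigmoid is folded into `BCEWithLogitsLoss`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
latent_dim, data_dim = 8, 2   # hypothetical toy sizes

G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1))  # logits

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCEWithLogitsLoss()

def train_step(real):
    n = real.size(0)
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)

    # Discriminator step: maximize log D(x) + log(1 - D(G(z))).
    fake = G(torch.randn(n, latent_dim)).detach()   # do not update G here
    loss_d = bce(D(real), ones) + bce(D(fake), zeros)
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator step (non-saturating): maximize log D(G(z)).
    loss_g = bce(D(G(torch.randn(n, latent_dim))), ones)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()

# Toy "real" data: a Gaussian blob centered at (2, 2).
real_batch = torch.randn(64, data_dim) * 0.5 + 2.0
loss_d, loss_g = train_step(real_batch)
print(loss_d, loss_g)
```

Detaching the fake batch in the discriminator step keeps that update from touching G; the generator step then backpropagates through D but only `opt_g` applies the result.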

Mode collapse: the generator finds one point (or a few) that reliably fools the discriminator and stops covering the rest of the data distribution. Common mitigations: Wasserstein GAN with gradient penalty (WGAN-GP), spectral normalization, or minibatch discrimination.
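Of these, spectral normalization is the easiest to try, since it only wraps existing layers. The sketch below uses PyTorch's `torch.nn.utils.spectral_norm`; the critic architecture and the 32×32 input size are hypothetical choices for illustration.

```python
import torch
import torch.nn as nn

def sn_conv(in_c, out_c):
    # Stride-2 convolution whose weight is rescaled by its largest singular value.
    return nn.utils.spectral_norm(nn.Conv2d(in_c, out_c, 4, 2, 1))

# Hypothetical critic for 32 x 32 RGB images.
critic = nn.Sequential(
    sn_conv(3, 64), nn.LeakyReLU(0.2),      # 32 -> 16
    sn_conv(64, 128), nn.LeakyReLU(0.2),    # 16 -> 8
    sn_conv(128, 256), nn.LeakyReLU(0.2),   # 8 -> 4
    nn.Flatten(),
    nn.utils.spectral_norm(nn.Linear(256 * 4 * 4, 1)),
)

scores = critic(torch.randn(5, 3, 32, 32))
print(scores.shape)  # torch.Size([5, 1])
```

Constraining each layer's spectral norm limits how sharply the discriminator can change its output, which tends to keep its gradients useful to the generator.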

</>

DCGAN Implementation

code


import torch
import torch.nn as nn

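class Generator(nn.Module):
    """DCGAN-style generator: noise (N, latent_dim) -> images (N, img_channels, 128, 128)."""
    def __init__(self, latent_dim=100, img_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            # Project noise to a 512 x 4 x 4 feature map
            nn.Linear(latent_dim, 512 * 4 * 4),
            nn.Unflatten(1, (512, 4, 4)),
            # Four upsample blocks, each doubling resolution: 4 -> 8 -> 16 -> 32 -> 64
            *self._block(512, 256), *self._block(256, 128),
            *self._block(128, 64),  *self._block(64, 32),
            # Final upsample to 128 x 128; Tanh maps pixel values to [-1, 1]
            nn.ConvTranspose2d(32, img_channels, 4, 2, 1),
            nn.Tanh()
        )

    def _block(self, in_c, out_c):
        # Kernel 4, stride 2, padding 1 doubles height and width
        return [nn.ConvTranspose2d(in_c, out_c, 4, 2, 1, bias=False),
                nn.BatchNorm2d(out_c), nn.ReLU(True)]

    def forward(self, z):
        return self.net(z)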

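# WGAN-GP gradient penalty (training is typically more stable than with the vanilla GAN loss)
def gradient_penalty(D, real, fake, device):
    """Penalize the critic's gradient norm for deviating from 1 on random real/fake mixes."""
    real, fake = real.detach(), fake.detach()
    # One mixing coefficient per sample, broadcast over (C, H, W)
    alpha = torch.rand(real.size(0), 1, 1, 1, device=device)
    interpolated = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    d_interp = D(interpolated)
    gradients = torch.autograd.grad(d_interp, interpolated,
                grad_outputs=torch.ones_like(d_interp),
                create_graph=True)[0]
    # Flatten so the L2 norm covers every pixel of a sample, not just the channel axis
    gradients = gradients.flatten(1)
    return ((gradients.norm(2, dim=1) - 1) ** 2).mean()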
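The gradient penalty slots into a critic update roughly as follows. This sketch restates the penalty compactly so it runs on its own, with the gradient norm taken per sample over all pixels. The tiny critic, the 8×8 image size, the penalty weight of 10, and the Adam settings are assumptions for illustration, and the fake batch is random noise standing in for detached generator output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lambda_gp = 10.0  # penalty weight; 10 is the value commonly used with WGAN-GP

# Hypothetical critic for 3 x 8 x 8 images; outputs an unbounded score, not a probability.
critic = nn.Sequential(
    nn.Flatten(), nn.Linear(3 * 8 * 8, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1)
)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-4, betas=(0.0, 0.9))

def penalty(critic, real, fake):
    # Random point on the line between each real/fake pair.
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    scores = critic(x_hat)
    grads = torch.autograd.grad(scores, x_hat,
                                grad_outputs=torch.ones_like(scores),
                                create_graph=True)[0]
    # Per-sample L2 norm over all pixels, pushed toward 1.
    return ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

real = torch.randn(16, 3, 8, 8)
fake = torch.randn(16, 3, 8, 8)   # stand-in for G(z).detach()

# Critic loss: score fakes low, reals high, and keep gradient norms near 1.
gp = penalty(critic, real, fake)
loss_c = critic(fake).mean() - critic(real).mean() + lambda_gp * gp

opt_c.zero_grad()
loss_c.backward()
opt_c.step()
print(loss_c.item())
```

`create_graph=True` matters here: the penalty is itself a function of gradients, so backpropagating the loss needs second-order derivatives of the critic.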
