Generative Models: VAE & GAN
“Learning the shape of data — then sampling new reality from the learned distribution”
Autoencoder → Variational Autoencoder (ELBO, reparameterization trick, latent space interpolation) → GAN (generator/discriminator adversarial training, DCGAN, mode collapse solutions).
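Key Formulas
ELBO (VAE)
$\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - D_{KL}(q_\phi(z|x) \,\|\, p(z))$
Reconstruction term − KL divergence (regularizes latent space)
Reparameterization
$z = \mu + \sigma \odot \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, I)$
Allows gradients to flow through the sampling operation
GAN Objective
$\min_G \max_D \; \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim \mathcal{N}(0, I)}[\log(1 - D(G(z)))]$
Generator fools Discriminator; Discriminator detects fakes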
From Discrimination to Generation
All previous models are discriminative: they model P(y|x), predicting an output given an input. Generative models learn P(x), the distribution of the data itself. With a learned distribution you can sample new data points, interpolate between examples, flag anomalies (points the model assigns low probability), and, by modeling P(x|c) for a condition c, generate conditionally. Stable Diffusion, GPT, and DALL-E use different architectures from the ones in this lesson, but all rest on the same idea: learn the data distribution, then sample from it.
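To make "learn P(x), then use it" concrete, here is a deliberately tiny sketch that uses only the standard library. It assumes the data is one-dimensional and roughly Gaussian, so "learning the distribution" reduces to estimating a mean and a standard deviation. The dataset, the 3-sigma cutoff, and the helper names `log_prob` and `is_anomaly` are all made up for illustration.

```python
import math
import random
import statistics

random.seed(0)

# Toy "dataset": 1-D measurements clustered around 5.0 (made-up data).
data = [random.gauss(5.0, 1.0) for _ in range(1000)]

# "Learning P(x)" here just means estimating the Gaussian's parameters.
mu = statistics.fmean(data)
sigma = statistics.stdev(data)

def log_prob(x):
    """Log-density of x under the fitted Gaussian."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

# 1) Sampling: draw new points from the learned distribution.
new_samples = [random.gauss(mu, sigma) for _ in range(5)]

# 2) Anomaly detection: flag points whose log-density falls below a cutoff.
threshold = log_prob(mu + 3 * sigma)  # hypothetical cutoff: 3 standard deviations

def is_anomaly(x):
    return log_prob(x) < threshold
```

Real generative models replace the two-parameter Gaussian with a neural network, but the workflow is the same: fit, then sample or score.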
VAE: The Probabilistic Compression
Autoencoders compress data to a latent code, then reconstruct it. But nothing constrains how those codes are arranged: the latent space can have gaps and irregular regions, so a randomly chosen point often decodes to nothing meaningful. VAEs fix this by encoding each input as a distribution (mean μ and standard deviation σ) instead of a single point, and penalizing that distribution's deviation from N(0, I) via KL divergence. This pushes the latent space to be smooth and continuous, so interpolation and sampling make semantic sense.
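A minimal sketch of what "encoding distributions instead of points" can look like in PyTorch. The class name `GaussianEncoder`, the layer sizes, and the choice to predict log σ² instead of σ are assumptions made for illustration; predicting the log-variance is a common convention because it keeps σ positive without any explicit constraint.

```python
import torch
import torch.nn as nn

class GaussianEncoder(nn.Module):
    """Maps an input to the parameters of a diagonal Gaussian q(z|x)."""

    def __init__(self, in_dim=784, hidden_dim=256, latent_dim=16):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.mu_head = nn.Linear(hidden_dim, latent_dim)
        # Predicting log(sigma^2) keeps sigma positive without a constraint.
        self.log_var_head = nn.Linear(hidden_dim, latent_dim)

    def forward(self, x):
        h = self.body(x)
        return self.mu_head(h), self.log_var_head(h)

encoder = GaussianEncoder()
x = torch.randn(8, 784)            # a batch of 8 flattened dummy inputs
mu, log_var = encoder(x)
sigma = torch.exp(0.5 * log_var)   # standard deviation, always > 0
```

A plain autoencoder would have a single output head; the second head is what turns each code into a region of latent space.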
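The reparameterization trick, z = μ + σ⊙ε with ε ~ N(0, I), is the key insight that makes VAE training possible. Sampling z directly from N(μ, σ²) is a stochastic step that backpropagation cannot pass through, so no gradient would reach the encoder. Rewriting the sample as a deterministic function of μ, σ, and external noise ε moves the randomness out of the gradient path.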
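The trick is short enough to check directly. The sketch below assumes the encoder outputs a log-variance (a convention, not something the text above fixes) and uses a hypothetical helper named `reparameterize`. After the backward pass, both μ and the log-variance have gradients, which is the whole point.

```python
import torch

def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    sigma = torch.exp(0.5 * log_var)
    eps = torch.randn_like(sigma)   # noise is drawn independently of mu and log_var
    return mu + sigma * eps

mu = torch.zeros(4, 2, requires_grad=True)
log_var = torch.zeros(4, 2, requires_grad=True)

z = reparameterize(mu, log_var)
z.sum().backward()

# dz/dmu is 1 for every element, so mu.grad is all ones.
# dz/dlog_var is 0.5 * sigma * eps, so log_var.grad depends on the drawn noise.
```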
The ELBO: Evidence Lower Bound
We want to maximize log p(x), the likelihood of our data under the model. Computing it directly is intractable because it requires integrating over all z. Instead, we maximize the ELBO, a lower bound on log p(x): reconstruction quality (how well the decoder reproduces x from z) minus the KL divergence from the prior (how far the encoder's q(z|x) deviates from the standard Gaussian). β-VAE weights the KL term by β; choosing β > 1 encourages disentangled representations, usually at some cost in reconstruction quality.
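In code, the ELBO is usually written as a loss to minimize, i.e. its negative. The sketch below makes several assumptions that the text does not specify: pixel values lie in [0, 1] so a Bernoulli (binary cross-entropy) reconstruction term applies, the encoder outputs a log-variance, and the function name `negative_elbo` is made up. The KL term uses the closed form for a diagonal Gaussian against N(0, I).

```python
import torch
import torch.nn.functional as F

def negative_elbo(x_recon, x, mu, log_var, beta=1.0):
    """Negative ELBO averaged over the batch (a loss to minimize)."""
    # Reconstruction term: Bernoulli log-likelihood, assuming values in [0, 1].
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # KL(N(mu, sigma^2) || N(0, I)) in closed form, summed over latent dims.
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    # beta = 1 gives the standard VAE; beta > 1 gives the beta-VAE weighting.
    return (recon + beta * kl) / x.size(0)
```

When the encoder outputs exactly μ = 0 and log σ² = 0, the KL term vanishes and only the reconstruction term remains.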
GAN Training: The Adversarial Game
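Generator G takes noise z ~ N(0, I) and produces fake samples G(z). Discriminator D tries to distinguish real samples from fakes, outputting the probability that its input is real. They play a minimax game: D maximizes log D(x) + log(1 − D(G(z))) over real samples x, while G minimizes log(1 − D(G(z))). In practice G is usually trained to maximize log D(G(z)) instead: this non-saturating loss is not the same objective, but it pushes G in the same direction and gives much stronger gradients early in training, when D rejects fakes easily. At the Nash equilibrium, G's samples are indistinguishable from real data and D outputs 1/2 everywhere.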
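The two generator losses can be compared with a few lines of arithmetic. Assume D ends in a sigmoid, so D(G(z)) = sigmoid(a) for some logit a. The derivatives with respect to a are then −D for log(1 − D) and −(1 − D) for −log D. The helper names and the example logit of −6 below are illustrative choices only.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def saturating_grad(logit):
    """d/d(logit) of log(1 - D), the original minimax generator loss."""
    return -sigmoid(logit)

def non_saturating_grad(logit):
    """d/d(logit) of -log(D), the non-saturating generator loss."""
    return -(1.0 - sigmoid(logit))

# Early in training the discriminator confidently rejects fakes: D(G(z)) ~ 0.
logit = -6.0                      # D(G(z)) = sigmoid(-6), roughly 0.0025
g_sat = saturating_grad(logit)    # tiny magnitude: learning signal vanishes
g_ns = non_saturating_grad(logit) # magnitude close to 1

# At the equilibrium D = 1/2 everywhere, the value function equals -log 4.
value_at_equilibrium = math.log(0.5) + math.log(1 - 0.5)
```

Both gradients have the same sign, so both losses move the generator toward higher D(G(z)); they differ in how much signal survives when the discriminator is winning.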
Mode collapse: the generator covers only one or a few modes of the data, producing near-identical samples that still fool the discriminator. Common mitigations: Wasserstein GAN with gradient penalty (WGAN-GP), spectral normalization, or minibatch discrimination.
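As one illustration of the stabilizers listed above, here is a sketch of a small critic whose layers are wrapped in PyTorch's `torch.nn.utils.spectral_norm`. The layer widths, the 32×32 input size, and the helper name `sn_conv` are assumptions chosen to keep the example short; it shows how the wrapper is applied, not a tuned architecture.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

def sn_conv(in_c, out_c):
    """4x4 stride-2 convolution whose weight is spectrally normalized."""
    return spectral_norm(nn.Conv2d(in_c, out_c, 4, 2, 1))

critic = nn.Sequential(
    sn_conv(3, 64), nn.LeakyReLU(0.2),      # 32x32 -> 16x16
    sn_conv(64, 128), nn.LeakyReLU(0.2),    # 16x16 -> 8x8
    sn_conv(128, 256), nn.LeakyReLU(0.2),   # 8x8   -> 4x4
    nn.Flatten(),
    spectral_norm(nn.Linear(256 * 4 * 4, 1)),  # one unbounded score per image
)

scores = critic(torch.randn(5, 3, 32, 32))
```

Spectral normalization rescales each weight matrix by an estimate of its largest singular value, which limits how sharply the critic's output can change with its input.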
DCGAN Implementation
```python
import torch
import torch.nn as nn


class Generator(nn.Module):
    def __init__(self, latent_dim=100, img_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            # Project and reshape noise: (B, latent_dim) -> (B, 512, 4, 4)
            nn.Linear(latent_dim, 512 * 4 * 4),
            nn.Unflatten(1, (512, 4, 4)),
            # Upsample blocks; each doubles the spatial size (4 -> 64)
            *self._block(512, 256),
            *self._block(256, 128),
            *self._block(128, 64),
            *self._block(64, 32),
            # Final upsample to 128x128; Tanh maps pixels to [-1, 1]
            nn.ConvTranspose2d(32, img_channels, 4, 2, 1),
            nn.Tanh(),
        )

    def _block(self, in_c, out_c):
        return [
            nn.ConvTranspose2d(in_c, out_c, 4, 2, 1, bias=False),
            nn.BatchNorm2d(out_c),
            nn.ReLU(True),
        ]

    def forward(self, z):
        return self.net(z)


# WGAN-GP training (more stable than vanilla GAN)
def gradient_penalty(D, real, fake, device):
    # Random per-sample mixing coefficient, broadcast over (C, H, W)
    alpha = torch.rand(real.size(0), 1, 1, 1, device=device)
    interpolated = alpha * real + (1 - alpha) * fake
    interpolated.requires_grad_(True)
    d_interp = D(interpolated)
    gradients = torch.autograd.grad(
        d_interp, interpolated,
        grad_outputs=torch.ones_like(d_interp),
        create_graph=True,
    )[0]
    # Flatten so the L2 norm is taken over all of (C, H, W) per sample,
    # not over the channel dimension alone
    gradients = gradients.reshape(gradients.size(0), -1)
    return ((gradients.norm(2, dim=1) - 1) ** 2).mean()
```
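To show where the gradient penalty fits, here is a sketch of a single WGAN-GP critic update. Everything in it is an assumption made so the block runs on its own: `G` and `D` are tiny stand-in networks (not the DCGAN generator above), the data is random noise, the penalty function is restated in compact form, and the penalty weight of 10 and the Adam settings follow the defaults reported in the WGAN-GP paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
latent_dim, lambda_gp = 16, 10.0   # lambda_gp = 10: default from the WGAN-GP paper

# Tiny stand-in networks so the step runs quickly (illustrative only).
G = nn.Sequential(nn.Linear(latent_dim, 3 * 8 * 8), nn.Tanh(),
                  nn.Unflatten(1, (3, 8, 8)))
D = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 64),
                  nn.LeakyReLU(0.2), nn.Linear(64, 1))
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4, betas=(0.0, 0.9))

def gradient_penalty(D, real, fake):
    """Penalize the critic when its gradient norm on mixed samples is not 1."""
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    d_interp = D(interp)
    grads = torch.autograd.grad(d_interp, interp,
                                grad_outputs=torch.ones_like(d_interp),
                                create_graph=True)[0]
    return ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

real = torch.rand(4, 3, 8, 8) * 2 - 1          # dummy "real" batch in [-1, 1]
fake = G(torch.randn(4, latent_dim)).detach()  # detach: only the critic updates here

# Critic loss: raise scores on real, lower them on fake, plus the penalty.
gp = gradient_penalty(D, real, fake)
loss_d = D(fake).mean() - D(real).mean() + lambda_gp * gp
opt_d.zero_grad()
loss_d.backward()
opt_d.step()
```

In a full training loop the critic is typically updated several times per generator step, and the generator then minimizes −D(G(z)).mean().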