Computer Vision October 1, 2024 8 min read

Data Augmentation Strategies When You Have < 1000 Samples

Mixup, CutMix, AugMix, GAN-generated synthetic data, and test-time augmentation: what to use when your dataset is tiny and performance is critical.

Augmentation Hierarchy (Most to Least Impactful)

Tier 1: Always Do This

  • Random horizontal flip (add vertical flip only when orientation carries no label information, as in aerial or microscopy images)
  • Random rotation (±15°)
  • Random crop and resize
  • Color jitter (brightness, contrast, saturation)
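A minimal NumPy sketch of three of the Tier 1 transforms, assuming each image is a float array of shape (H, W, C) with values in [0, 1]. The function name `augment_basic` and the `pad` and `jitter` defaults are illustrative choices, not values from this post. The crop here is a pad-and-crop variant that keeps the original size, so no interpolation is needed; rotation is left out because it needs an interpolation routine.

```python
import numpy as np

def augment_basic(img, rng, pad=4, jitter=0.2):
    """Flip, pad-and-crop, and brightness/contrast jitter for one HxWxC image in [0, 1]."""
    h, w, _ = img.shape
    # Random horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        img = img[:, ::-1, :]
    # Random crop: reflect-pad, then cut a window of the original size.
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    img = padded[top:top + h, left:left + w, :]
    # Contrast and brightness, each scaled by a factor in [1 - jitter, 1 + jitter].
    contrast = rng.uniform(1 - jitter, 1 + jitter)
    brightness = rng.uniform(1 - jitter, 1 + jitter)
    mean = img.mean()
    img = ((img - mean) * contrast + mean) * brightness
    return np.clip(img, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))  # stand-in for a real image
augmented = augment_basic(image, rng)
```

In a PyTorch project the same tier is usually built from `torchvision.transforms` classes such as `RandomHorizontalFlip`, `RandomRotation`, `RandomResizedCrop`, and `ColorJitter`.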

Tier 2: Usually Helps

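  • Mixup: blend two images and their labels (labels given as one-hot or probability vectors)
import numpy as np
# Beta(0.2, 0.2) is U-shaped, so most blends stay close to one of the two source images
lam = np.random.beta(0.2, 0.2)
x_mix = lam * x1 + (1 - lam) * x2
y_mix = lam * y1 + (1 - lam) * y2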
  • CutMix: paste a rectangular patch from one image onto another and mix the labels in proportion to the patch area
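CutMix can be sketched in the same style as the Mixup snippet. This is one possible implementation, assuming (H, W, C) images and one-hot labels; the function name `cutmix` and the `alpha=1.0` default are assumptions for illustration. The label weight is recomputed from the box that was actually pasted, since clipping at the image border can shrink it.

```python
import numpy as np

def cutmix(x1, y1, x2, y2, rng, alpha=1.0):
    """Paste a random box from x2 into x1 and mix labels by the surviving area."""
    h, w = x1.shape[:2]
    lam = rng.beta(alpha, alpha)
    # Box side lengths chosen so the box covers roughly (1 - lam) of the image.
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))
    cy = rng.integers(0, h)
    cx = rng.integers(0, w)
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    x_mix = x1.copy()
    x_mix[top:bottom, left:right] = x2[top:bottom, left:right]
    # Recompute lam from the clipped box so the labels match the actual pixels.
    lam = 1.0 - (bottom - top) * (right - left) / (h * w)
    y_mix = lam * y1 + (1.0 - lam) * y2
    return x_mix, y_mix

rng = np.random.default_rng(0)
x1, y1 = np.zeros((32, 32, 3)), np.array([1.0, 0.0])  # toy "class 0" image
x2, y2 = np.ones((32, 32, 3)), np.array([0.0, 1.0])   # toy "class 1" image
x_mix, y_mix = cutmix(x1, y1, x2, y2, rng)
```

With these toy inputs, the fraction of pixels taken from `x2` equals the weight assigned to its label, which is a quick sanity check for any CutMix implementation.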

Tier 3: For Very Small Datasets (<200 samples)

  • Elastic transformations (for medical images)
  • Grid distortion
  • Test-time augmentation (TTA): average the model's predictions over 8 augmented versions of each input at inference
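One way to get exactly 8 versions is to combine the four 90° rotations with a horizontal flip. The sketch below assumes square (H, W, C) inputs and a task where orientation does not change the label; `predict_fn` and `toy_predict` are hypothetical stand-ins for a trained model's probability output.

```python
import numpy as np

def dihedral_views(img):
    """Return the 8 flip/rotation variants of one HxWxC image."""
    views = []
    for k in range(4):
        rotated = np.rot90(img, k, axes=(0, 1))
        views.append(rotated)
        views.append(rotated[:, ::-1])  # horizontally flipped copy
    return views

def predict_tta(predict_fn, img):
    """Average class probabilities over the 8 views."""
    probs = [predict_fn(view) for view in dihedral_views(img)]
    return np.mean(probs, axis=0)

# Hypothetical stand-in for a trained model: returns two "class probabilities".
def toy_predict(img):
    p = float(img[: img.shape[0] // 2].mean())  # mean of the top half
    return np.array([p, 1.0 - p])

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))  # stand-in for a real image
tta_probs = predict_tta(toy_predict, image)
```

The toy model depends on which half of the image is on top, so a single prediction is orientation-sensitive; averaging over the 8 views removes that dependence, which is the effect TTA aims for.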

Tier 4: Synthetic Data

  • Train a GAN or use Stable Diffusion to generate additional training samples
  • Works well for domain-specific rare classes
Data Augmentation, Small Datasets, Mixup, CutMix, Computer Vision

Ossama Elhakki

AI Engineer & ML Systems Builder — Morocco