Comparison of techniques
| Method | Size reduction | Accuracy loss | Effort |
|---|---|---|---|
| INT8 quantization | 4x | ~1% | Low |
| FP16 | 2x | <0.1% | Very low |
| Pruning (30%) | 1.4x | ~2% | Medium |
| Distillation | 5-10x | 3-5% | High |
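The size-reduction figures for INT8, FP16, and pruning follow from simple storage arithmetic. The sketch below reproduces them under two assumptions that the table does not state: the baseline is an FP32 model (4 bytes per weight), and pruned weights are dropped from storage with no indexing overhead. The helper names are illustrative only.

```python
# Bytes needed to store one parameter in each numeric format.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def compression_ratio(baseline: str, target: str) -> float:
    """Size ratio when every weight moves from `baseline` to `target`."""
    return BYTES_PER_PARAM[baseline] / BYTES_PER_PARAM[target]


def pruning_ratio(sparsity: float) -> float:
    """Size ratio if a fraction `sparsity` of the weights is removed.

    Assumes the removed weights cost nothing to store, which real
    sparse formats only approximate.
    """
    return 1.0 / (1.0 - sparsity)


print(compression_ratio("fp32", "int8"))  # 4.0
print(compression_ratio("fp32", "fp16"))  # 2.0
print(round(pruning_ratio(0.30), 2))      # 1.43
```

The distillation row has no such formula: its ratio depends entirely on how small a student architecture is chosen.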
Post-training quantization (the simplest)
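import torch
from torch import nn

# Dynamic quantization (CPU inference): weights are stored as INT8 and
# activations are quantized on the fly. `model` is a trained FP32 model.
model.eval()
model_int8 = torch.quantization.quantize_dynamic(
    model,
    {nn.Linear, nn.LSTM},  # layer types to quantize
    dtype=torch.qint8,
)
# Typical result: 2-4x smaller, roughly 2x faster on CPU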
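To see where the small accuracy loss comes from, it helps to quantize a few weights by hand. The following is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. It illustrates the general idea only and is not necessarily the exact scheme PyTorch applies; `quantize_int8` and `dequantize` are hypothetical helper names.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the stored integers."""
    return [v * scale for v in q]


weights = [0.82, -1.27, 0.003, 0.5]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error of each weight is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q)          # [82, -127, 0, 50]
print(max_error)  # dominated by the small weight 0.003, which rounds to 0
```

Each weight now needs one byte plus a single shared scale, and the price is a rounding error bounded by `scale / 2`. Small weights next to a large outlier suffer most, which is one reason per-channel scales are often preferred in practice.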
Knowledge distillation
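import torch
import torch.nn.functional as F
from torch import nn

# The student learns from the teacher's soft probabilities.
# T is the softmax temperature (T > 1 softens both distributions).
with torch.no_grad():  # the teacher is frozen
    teacher_logits = teacher(x)
student_logits = student(x)
kd_loss = nn.KLDivLoss(reduction="batchmean")(
    F.log_softmax(student_logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
) * T**2  # T**2 keeps gradient magnitudes comparable across temperatures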
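The effect of the temperature `T` is easier to see on a single example. The sketch below recomputes the same quantity in plain Python for one sample: a temperature-scaled softmax, then the KL divergence from teacher to student, multiplied by `T**2`. The logits are made-up values chosen purely for illustration.

```python
import math


def softmax(logits, T=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


teacher_logits = [6.0, 2.0, 1.0]  # hypothetical values
student_logits = [4.0, 3.0, 0.5]  # hypothetical values

for T in (1.0, 4.0):
    p = softmax(teacher_logits, T)  # teacher's soft targets
    q = softmax(student_logits, T)  # student's prediction
    loss = kl_divergence(p, q) * T**2
    print(T, [round(v, 3) for v in p], round(loss, 4))
```

At `T = 1` the teacher puts almost all of its probability on the top class, so the student learns little beyond the hard label. A higher temperature spreads probability onto the other classes, exposing how the teacher ranks them.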