Model Distillation

Category: Training
Simple Definition

Training a smaller AI model to mimic a larger, more capable one, producing an efficient model that approaches the larger model's quality.

Full Explanation

In distillation, a small 'student' model is trained not only on ordinary data but also on the outputs of a large 'teacher' model. Because the teacher's outputs carry far more information than bare labels (a full probability distribution over answers, or, for language models, whole generated texts), the student can replicate much of the teacher's behavior at a fraction of its size. This is why many small, fast models punch well above their parameter count. GPT-4o mini, for example, is widely believed to be distilled from GPT-4o, and many compact models are distillations of larger frontier models.
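As a concrete illustration, here is a minimal PyTorch sketch of the classic knowledge-distillation loss (Hinton et al., 2015), in which the student matches the teacher's softened output distribution in addition to the ground-truth labels. The names `temperature`, `alpha`, `student`, `teacher`, and `batch` are illustrative assumptions, not taken from any particular model or library.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target KL term (match the teacher) with
    ordinary cross-entropy (match the ground-truth labels)."""
    # Soften both distributions with a temperature; scaling the KL
    # term by T^2 keeps gradient magnitudes comparable across
    # temperatures (Hinton et al., 2015).
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kl = F.kl_div(soft_student, soft_targets,
                  reduction="batchmean") * temperature ** 2
    # Standard cross-entropy against the hard labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kl + (1 - alpha) * ce

# One hypothetical training step: the teacher is frozen and only
# provides targets; gradients flow through the student alone.
# teacher.eval()
# with torch.no_grad():
#     teacher_logits = teacher(batch)
# student_logits = student(batch)
# loss = distillation_loss(student_logits, teacher_logits, labels)
# loss.backward()
```

For large language models such as GPT-4o mini, distillation is often done at the sequence level instead: the student is fine-tuned on text generated by the teacher rather than on its logits directly.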

Last verified: 2026-03-30