Model Distillation
Training a smaller AI model to mimic a larger, more capable model — creating efficient models that approach the larger model's quality.
Full Explanation
In distillation, a small 'student' model is trained not only on raw data but also on the outputs of a large 'teacher' model. Because the teacher's outputs carry richer signal than hard labels alone, the student learns to replicate the teacher's behavior patterns and ends up far more capable than its size would suggest. GPT-4o mini is widely understood to be distilled from GPT-4o, and many small, fast models are distillations of larger frontier models.
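The core training signal can be illustrated with the classic soft-target objective: the student is penalized by the KL divergence between its output distribution and the teacher's temperature-softened distribution. This is a minimal NumPy sketch; the function names, logit values, and temperature are illustrative, not from any specific model's training recipe.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, optionally softened by a temperature."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    A higher temperature spreads probability over more classes, exposing the
    teacher's relative preferences among 'wrong' answers — the signal that
    makes distillation richer than training on hard labels alone.
    """
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return float(np.sum(t * (np.log(t) - np.log(s))))

# Toy logits over three classes: the student that tracks the teacher's
# preferences incurs a much smaller loss than one that disagrees.
teacher = np.array([4.0, 1.0, 0.5])
aligned_student = np.array([3.9, 1.1, 0.4])
misaligned_student = np.array([0.5, 4.0, 1.0])
```

In a real training loop this loss (often mixed with an ordinary cross-entropy term on the true labels) would be minimized by gradient descent over the student's parameters.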
Related Terms
Fine-Tuning: Further training a pre-trained AI model on a smaller, task-specific dataset to specialize its behavior.
Large Language Model (LLM): A type of AI model trained on vast amounts of text data that can generate, summarize, translate, and reason about language.
Inference: The process of running a trained AI model to generate outputs — what happens when you use an AI tool.