Model Distillation

Category: Training
Simple Definition

Training a smaller AI model to mimic a larger, more capable one, producing an efficient model that approaches the larger model's quality.

Full Explanation

In distillation, a small 'student' model is trained not only on ordinary data but also on the outputs of a large 'teacher' model. Because the teacher's outputs carry far more information than bare labels (a full probability distribution over answers, or, for language models, whole generated texts), the student can replicate much of the teacher's behavior at a fraction of its size. This is why many small, fast models punch well above their parameter count. GPT-4o mini, for example, is widely believed to be distilled from GPT-4o, and many compact models are distillations of larger frontier models.
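As a concrete illustration, here is a minimal PyTorch sketch of the classic knowledge-distillation loss (Hinton et al., 2015), in which the student matches the teacher's softened output distribution in addition to the ground-truth labels. The names `temperature`, `alpha`, `student`, `teacher`, and `batch` are illustrative assumptions, not taken from any particular model or library.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target KL term (match the teacher) with
    ordinary cross-entropy (match the ground-truth labels)."""
    # Soften both distributions with a temperature; scaling the KL
    # term by T^2 keeps gradient magnitudes comparable across
    # temperatures (Hinton et al., 2015).
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kl = F.kl_div(soft_student, soft_targets,
                  reduction="batchmean") * temperature ** 2
    # Standard cross-entropy against the hard labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kl + (1 - alpha) * ce

# One hypothetical training step: the teacher is frozen and only
# provides targets; gradients flow through the student alone.
# teacher.eval()
# with torch.no_grad():
#     teacher_logits = teacher(batch)
# student_logits = student(batch)
# loss = distillation_loss(student_logits, teacher_logits, labels)
# loss.backward()
```

For large language models such as GPT-4o mini, distillation is often done at the sequence level instead: the student is fine-tuned on text generated by the teacher rather than on its logits directly.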

Last verified: 2026-03-30