Latency

Performance

Simple Definition
The time delay between sending a request to an AI API and receiving the first token of response.
Full Explanation
Latency matters enormously for user experience. For chat interfaces, time-to-first-token (TTFT) should be under 1 second. For real-time voice AI, latency must be under 300ms. For batch processing, throughput matters more than latency. Different models have very different latency profiles: Claude Haiku is much faster than Claude Opus, at the cost of capability.
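TTFT is straightforward to measure with a streaming API: start a timer when the request is sent and stop it when the first token arrives. The sketch below illustrates the idea with a simulated token stream (`fake_stream` is a stand-in, not a real API client); with a real provider you would time the first chunk of the streaming response the same way.

```python
import time

def measure_ttft(stream):
    """Return (first_token, seconds elapsed until the first token arrives)."""
    start = time.perf_counter()
    first = next(stream)            # blocks until the stream yields its first token
    return first, time.perf_counter() - start

def fake_stream(delay_s=0.05):
    """Simulated streaming response: a short delay stands in for network + prefill."""
    time.sleep(delay_s)
    yield from ["Hello", ",", " world"]

token, ttft = measure_ttft(fake_stream())
print(f"first token {token!r} after {ttft * 1000:.0f} ms")
```

The same pattern distinguishes TTFT from total generation time: keep consuming the stream after the first token and take a second timestamp at the end to measure throughput.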
Related Terms
Last verified: 2026-03-30