Latency

Performance

Simple Definition
The time delay between sending a request to an AI API and receiving the first token of response.
Full Explanation
Latency matters enormously for user experience. For chat interfaces, time-to-first-token (TTFT) should be under 1 second. For real-time voice AI, latency must be under 300ms. For batch processing, throughput matters more than latency. Different models have very different latency profiles: Claude Haiku is much faster than Claude Opus, at the cost of capability.
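TTFT is straightforward to measure with a streaming API: start a timer when the request is sent and stop it when the first token arrives. The sketch below illustrates the idea with a simulated token stream (`fake_stream` is a stand-in, not a real API client); with a real provider you would time the first chunk of the streaming response the same way.

```python
import time

def measure_ttft(stream):
    """Return (first_token, seconds elapsed until the first token arrives)."""
    start = time.perf_counter()
    first = next(stream)            # blocks until the stream yields its first token
    return first, time.perf_counter() - start

def fake_stream(delay_s=0.05):
    """Simulated streaming response: a short delay stands in for network + prefill."""
    time.sleep(delay_s)
    yield from ["Hello", ",", " world"]

token, ttft = measure_ttft(fake_stream())
print(f"first token {token!r} after {ttft * 1000:.0f} ms")
```

The same pattern distinguishes TTFT from total generation time: keep consuming the stream after the first token and take a second timestamp at the end to measure throughput.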
Related Terms
Last verified: 2026-03-30