Rate Limit

Infrastructure

Simple Definition

Caps set by AI API providers on how many requests a user can send, or how many tokens a user can process, per minute, hour, or day.

Full Explanation

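Rate limits exist to prevent abuse, manage server capacity, and ensure fair access. They are typically measured in RPM (requests per minute), RPD (requests per day), and TPM (tokens per minute). Higher-paid and enterprise tiers generally come with higher limits. A request that exceeds a limit is usually rejected with an HTTP 429 (Too Many Requests) error. Common strategies for staying within limits include queuing requests, batching them, and retrying with exponential backoff (waiting progressively longer between retries).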
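
As an illustration of exponential backoff, here is a minimal Python sketch. It is not tied to any provider's SDK: RateLimitError, backoff_delays, and call_with_backoff are hypothetical names, and send stands for whatever function actually makes the API request. The sketch assumes the caller raises RateLimitError when it sees a 429, optionally carrying the number of seconds the server suggested waiting.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error standing in for an HTTP 429 response."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited (429)")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def backoff_delays(max_retries=5, base=1.0, cap=60.0):
    """Return the un-jittered wait before each retry: base * 2**attempt, capped."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def call_with_backoff(send, max_retries=5, base=1.0, cap=60.0, sleep=time.sleep):
    """Call send(); on RateLimitError wait and retry, up to max_retries times."""
    for delay in backoff_delays(max_retries, base, cap):
        try:
            return send()
        except RateLimitError as err:
            if err.retry_after is not None:
                wait = err.retry_after  # prefer the server's hint when present
            else:
                wait = random.uniform(0, delay)  # random jitter spreads out retries
            sleep(wait)
    return send()  # final attempt; any error propagates to the caller
```

With the defaults, the upper bounds on the waits are 1, 2, 4, 8, and 16 seconds. The random jitter is a design choice rather than a requirement: it keeps many clients that were limited at the same moment from retrying in lockstep.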

Related Terms

API (Application Programming Interface)

A software interface that allows developers to access AI model capabilities programmatically — the foundation of all AI-powered products.

Inference

The process of running a trained AI model to generate outputs — what happens when you use an AI tool.

Token

The basic unit of text that AI language models process — roughly equivalent to 3/4 of a word in English.
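
The 3/4-of-a-word rule of thumb can be turned into a rough estimate for checking text against a TPM limit. The Python sketch below is an approximation only: real token counts depend on the model's tokenizer and on the language, and estimate_tokens and WORDS_PER_TOKEN are hypothetical names chosen for illustration.

```python
import math

# Rule of thumb for English text; actual ratios vary by tokenizer and language.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text):
    """Estimate the token count of text from its whitespace-separated word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)
```

Under this rule, a 750-word prompt comes to about 1,000 tokens, so a limit of 10,000 TPM would allow roughly ten such prompts per minute, before counting any tokens in the responses.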

Last verified: 2026-04-13