Rate Limit
Infrastructure
Constraints imposed by AI API providers on how many requests or tokens a user can process per minute, hour, or day.
Full Explanation
Rate limits exist to prevent abuse, manage server capacity, and ensure fair access. They're typically measured in RPM (requests per minute), RPD (requests per day), and TPM (tokens per minute). Enterprise tiers typically have higher limits. A request that exceeds a limit is rejected with an HTTP 429 (Too Many Requests) error. Common strategies for handling limits include queuing requests, batching them, and retrying with exponential backoff.
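As a rough illustration of exponential backoff, the Python sketch below retries a call and doubles the wait after each rate-limit failure. The names `RateLimitError` and `call_with_backoff` are hypothetical and do not belong to any particular provider's SDK. In real code you would map your client's HTTP 429 response onto the exception, and the base delay, cap, and jitter range shown here are assumptions to tune.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 response from a provider."""


def call_with_backoff(send, max_retries=5, base=1.0, cap=60.0,
                      sleep=time.sleep, jitter=random.random):
    """Call send(); on RateLimitError, wait and retry with a doubling delay.

    send        -- zero-argument callable that performs the API request
    max_retries -- number of retries after the first attempt
    base, cap   -- first delay and maximum delay, in seconds (assumed values)
    sleep       -- injectable so the wait can be replaced in tests
    jitter      -- returns a float in [0, 1); spreads out simultaneous retries
    """
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            delay = min(cap, base * 2 ** attempt)  # 1s, 2s, 4s, ... up to cap
            # Wait between 50% and 100% of the computed delay.
            sleep(delay * (0.5 + 0.5 * jitter()))
```

A caller would wrap the request in a function, for example `call_with_backoff(lambda: client_request(prompt))`, where `client_request` is whatever function sends the request and raises `RateLimitError` on a 429. Some providers also include a `Retry-After` header in the 429 response; when it is present, honoring it is usually preferable to a computed delay.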
Related Terms
API: A software interface that allows developers to access AI model capabilities programmatically. It is the foundation of all AI-powered products.
Inference: The process of running a trained AI model to generate outputs. This is what happens when you use an AI tool.
Token: The basic unit of text that AI language models process, roughly equivalent to 3/4 of a word in English.