How to Fix OpenAI "Rate Limit Exceeded" (429) — RPM, TPM & Retries
Fix OpenAI rate limit exceeded and HTTP 429 errors. Understand RPM/TPM limits, exponential backoff, and how an LLM gateway with multi-channel fallback keeps your app running.
Published 2026-06-29
OpenAI rate limit exceeded (HTTP 429 Too Many Requests) means you hit a throughput cap, such as requests per minute (RPM), tokens per minute (TPM), or a daily or spending limit, so the API rejected the call instead of processing it. The error body usually contains Rate limit reached or rate_limit_exceeded.
Fast fixes: slow down with exponential backoff, batch or queue requests, reduce max_tokens, upgrade your OpenAI tier, or route through a gateway that automatically fails over to alternate upstream channels. APIMaster aggregates multiple routes so one vendor's 429 does not stop production traffic.
What This Error Means
After authentication succeeds, OpenAI meters how fast you consume requests and tokens. Exceed the bucket and the API returns 429:
{
  "error": {
    "message": "Rate limit reached for gpt-4o in organization org-xxx on requests per min (RPM): Limit 500, Used 500, Requested 1.",
    "type": "requests",
    "code": "rate_limit_exceeded"
  }
}
Third-party relays may surface the same string or a generic 429 wrapper. This is different from an invalid API key (401) or content blocked (400) — your key is valid, you are just too fast or too heavy for the current quota tier.
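To tell which bucket was exhausted, the error body can be inspected in code. The following is a minimal sketch, assuming the message keeps the wording shown in the example above (that wording is not a stable contract, and relays may change it). The function name `describe_rate_limit` and the `SAMPLE_BODY` constant are illustrative only.

```python
import json
import re

# Trimmed copy of the example 429 body from this article.
SAMPLE_BODY = """
{
  "error": {
    "message": "Rate limit reached for gpt-4o in organization org-xxx on requests per min (RPM): Limit 500, Used 500, Requested 1.",
    "code": "rate_limit_exceeded"
  }
}
"""


def describe_rate_limit(body: str) -> dict:
    """Summarize a 429 body: error code, exhausted bucket, and any numbers found."""
    error = json.loads(body).get("error", {})
    message = error.get("message", "")

    # Guess the bucket from the message text only (assumed wording).
    if "(RPM)" in message:
        bucket = "requests"
    elif "(TPM)" in message:
        bucket = "tokens"
    else:
        bucket = "unknown"

    # Pull out "Limit 500", "Used 500", "Requested 1" when they are present.
    numbers = {
        key.lower(): int(value)
        for key, value in re.findall(r"(Limit|Used|Requested) (\d+)", message)
    }
    return {"code": error.get("code"), "bucket": bucket, **numbers}


summary = describe_rate_limit(SAMPLE_BODY)
```

Logging a summary like this per 429 makes it easier to see whether RPM or TPM is the cap you keep hitting.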
Common Causes
- Burst traffic: many parallel users or agents firing requests in the same second.
- High max_tokens: large completions burn TPM quickly even at moderate RPM.
- Retry storms: your app retries 429s immediately without backoff, making limits worse.
- Shared org key: multiple services reuse one key and share one RPM/TPM bucket.
- Free / low tier limits: new OpenAI accounts and cheap relays cap throughput aggressively.
- Model-specific caps: frontier models often have lower RPM than gpt-4o-mini.
- Streaming + tools: agent loops multiply calls per user action.
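The interaction between max_tokens and TPM is easier to see with numbers. Here is a rough sketch, assuming each request is budgeted at about prompt tokens plus max_tokens (a simplification; the exact accounting is OpenAI's). The RPM figure comes from the example error above, and the TPM figure and token counts are hypothetical.

```python
def effective_rpm(rpm_limit: int, tpm_limit: int, prompt_tokens: int, max_tokens: int) -> int:
    """Requests per minute you can actually sustain under both caps."""
    # Assumed budget per request: prompt plus the completion ceiling.
    tokens_per_request = prompt_tokens + max_tokens
    # Whichever cap runs out first decides the real throughput.
    return min(rpm_limit, tpm_limit // tokens_per_request)


# Hypothetical account: 500 RPM, 30,000 TPM, 200-token prompts.
large_completions = effective_rpm(500, 30_000, prompt_tokens=200, max_tokens=4_000)
small_completions = effective_rpm(500, 30_000, prompt_tokens=200, max_tokens=300)
```

Under these assumed figures the TPM cap, not the RPM cap, sets the ceiling in both cases, and trimming max_tokens raises the sustainable request rate several times over.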
How to Fix It
1. Read the 429 response headers
OpenAI responses usually carry x-ratelimit-limit-requests and x-ratelimit-remaining-requests (with matching -tokens variants), and a 429 may also include retry-after. When retry-after is present, wait at least that many seconds before retrying.
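One way to turn those headers into a wait time is sketched below. It prefers retry-after and falls back to x-ratelimit-reset-requests, a header OpenAI documents with values such as `1s` or `6m0s`. The helper names and the 1-second default are assumptions for illustration.

```python
import re

_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse_duration(text: str) -> float:
    """Convert strings such as '1s', '6m0s', or '250ms' to seconds."""
    # "ms" is listed first so it is not misread as minutes followed by seconds.
    parts = re.findall(r"(\d+(?:\.\d+)?)(ms|s|m|h)", text)
    return sum(float(value) * _UNIT_SECONDS[unit] for value, unit in parts)


def seconds_to_wait(headers: dict, default: float = 1.0) -> float:
    """Pick a wait time from rate limit headers, or fall back to a default."""
    lowered = {name.lower(): value for name, value in headers.items()}

    retry_after = lowered.get("retry-after")
    if retry_after is not None:
        try:
            return float(retry_after)
        except ValueError:
            pass  # retry-after can also be an HTTP date; not handled in this sketch

    reset = lowered.get("x-ratelimit-reset-requests")
    if reset:
        return parse_duration(reset)
    return default
```

With the official Python SDK, the headers of a failed call are typically reachable as `err.response.headers` on the caught `RateLimitError`, so `seconds_to_wait(dict(err.response.headers))` would be one way to feed this helper.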
2. Implement exponential backoff with jitter
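import random
import time

from openai import OpenAI, RateLimitError

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def chat_with_backoff(max_attempts=6):
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": "hello"}],
            )
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of retries; surface the 429 to the caller
            # Wait 2^attempt seconds plus up to 1 s of jitter, capped at 60 s.
            time.sleep(min(60, (2 ** attempt) + random.random()))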
Never tight-loop on 429 — you will extend the outage.
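The retry code above is tied to a live API call and real sleeping, which makes it hard to unit-test. A possible variant, sketched here with the same delay formula, takes the call, the sleep function, and the random source as parameters. `Throttled` is a hypothetical stand-in for a 429 exception such as `RateLimitError`.

```python
import random
import time


class Throttled(Exception):
    """Hypothetical stand-in for a 429 error raised by an API client."""


def backoff_delay(attempt, cap=60.0, rng=random.random):
    """Delay in seconds: 2^attempt plus up to 1 s of jitter, capped."""
    return min(cap, (2 ** attempt) + rng())


def call_with_backoff(func, max_attempts=6, sleep=time.sleep,
                      rng=random.random, retry_on=(Throttled,)):
    """Call func(), retrying on throttling errors with growing delays."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # retries exhausted; let the caller decide what to do
            sleep(backoff_delay(attempt, rng=rng))


# Demonstration with a fake call that is throttled twice, then succeeds.
outcomes = iter([Throttled(), Throttled(), "ok"])
recorded_sleeps = []


def fake_call():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result = call_with_backoff(fake_call, sleep=recorded_sleeps.append, rng=lambda: 0.5)
```

In production you would pass the real API call as `func` and the SDK's rate limit exception in `retry_on`, leaving `sleep` and `rng` at their defaults.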
3. Reduce load
- Lower max_tokens where possible.
- Cache identical prompts.
- Queue requests server-side (worker + Redis) instead of unbounded client parallelism.
- Use cheaper/smaller models for classification or routing steps.
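Two of the items above, caching identical prompts and capping parallelism, can be sketched in a single process. This is an in-memory stand-in for the worker + Redis setup, not a replacement for it. The class name `CachedLimitedClient` and the `send` callable are hypothetical, and caching only makes sense where repeated prompts should yield the same answer.

```python
import hashlib
import json
import threading


class CachedLimitedClient:
    """Wrap a send(model, messages) callable with a cache and a parallelism cap."""

    def __init__(self, send, max_parallel=4):
        self._send = send
        self._slots = threading.BoundedSemaphore(max_parallel)
        self._cache = {}
        self._lock = threading.Lock()

    @staticmethod
    def _key(model, messages):
        # Stable hash of the request so identical prompts map to one entry.
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def complete(self, model, messages):
        key = self._key(model, messages)
        with self._lock:
            if key in self._cache:
                return self._cache[key]  # cache hit: no upstream call
        with self._slots:  # blocks while max_parallel calls are in flight
            result = self._send(model, messages)
        with self._lock:
            self._cache[key] = result
        return result


# Demonstration with a fake upstream that counts how often it is called.
upstream_calls = []


def fake_send(model, messages):
    upstream_calls.append((model, messages))
    return "reply-%d" % len(upstream_calls)


limited = CachedLimitedClient(fake_send, max_parallel=2)
prompt = [{"role": "user", "content": "hello"}]
first = limited.complete("gpt-4o-mini", prompt)
second = limited.complete("gpt-4o-mini", prompt)
```

Note that two identical prompts arriving at the same moment can still both reach the upstream in this sketch; deduplicating in-flight requests would need extra bookkeeping.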
4. Raise official limits
On OpenAI: add billing, request a tier increase, or split workloads across projects/orgs responsibly (still subject to policy).
5. Use a gateway with automatic fallback
Production apps should not depend on a single upstream RPM bucket. An LLM gateway can:
- Route to another provider or channel when one returns 429
- Spread traffic across keys or regions where permitted
- Surface queueing at the edge so your app sees fewer hard failures
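The first item in that list, moving to another channel on a 429, can be illustrated with a few lines of routing logic. This is a simplified sketch of the idea, not how any particular gateway is implemented. `UpstreamThrottled` and the channel callables are hypothetical.

```python
class UpstreamThrottled(Exception):
    """Hypothetical error meaning an upstream channel answered with a 429."""


def complete_with_fallback(channels, request):
    """Try each (name, send) channel in order; return the first success."""
    throttled = []
    for name, send in channels:
        try:
            return name, send(request)
        except UpstreamThrottled:
            throttled.append(name)  # remember it and move to the next channel
    raise UpstreamThrottled("all channels throttled: " + ", ".join(throttled))


# Demonstration with fake channels: the primary is throttled, the backup works.
def primary(request):
    raise UpstreamThrottled("primary is at its RPM cap")


def backup(request):
    return "ok: " + request


served_by, reply = complete_with_fallback(
    [("primary", primary), ("backup", backup)], "hi"
)
```

A real router would also need health tracking and per-channel backoff so that a throttled channel is not retried on every single request.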
APIMaster is an OpenAI-compatible aggregator with multi-channel routing — when one path is throttled, traffic can move to available alternatives. Top up from $1, point SDKs at https://apimaster.ai/v1, and keep building without hand-tuning every vendor limit.
How APIMaster Helps
Hitting 429 too often? APIMaster helps on three fronts:
| Advantage | What you get |
|---|---|
| Discount | Marketplace pricing — up to ~90% / ~85% off official list rates; stretch the same budget further. |
| Stability | Automatic fallback when one upstream hits RPM/TPM caps — fewer single-vendor 429 outages (pair with app-level backoff). |
| Model fidelity | After failover, use the Model Tester; check keys with the Key Tester. |
https://apimaster.ai/v1 · From $1 top-up, pay-as-you-go.
Related API Errors
- Invalid API key — 401 authentication
- api error 400 content blocked — moderation 400
- Claude / Anthropic 529 overloaded — capacity, not RPM quota
- All API error fix guides — full index
FAQ
What is OpenAI rate limit exceeded? HTTP 429 indicating you exceeded RPM, TPM, or related quotas for your organization and model. Wait and retry with backoff, or route through a gateway with fallback.
429 vs 529 — what's the difference? 429 is usually your quota / rate (OpenAI RPM/TPM). 529 on Anthropic is server overload — the service is temporarily at capacity. Fix patterns differ; see our 529 guide.
Will upgrading OpenAI tier fix all 429s? It raises caps but burst agent traffic can still hit limits. Gateways plus queueing are the durable fix for production.
Does APIMaster remove rate limits entirely? No platform offers unlimited frontier-model throughput. APIMaster improves availability by routing across channels when one upstream throttles — you still should implement backoff in your app.