APIMaster Blog

How to Fix OpenAI "Rate Limit Exceeded" (429) — RPM, TPM & Retries

Fix OpenAI rate limit exceeded and HTTP 429 errors. Understand RPM/TPM limits, exponential backoff, and how an LLM gateway with multi-channel fallback keeps your app running.

API error · rate limit · OpenAI API · 429 error · LLM gateway

Published 2026-06-29

Quick Answer

OpenAI rate limit exceeded (HTTP 429 Too Many Requests) means you hit a throughput cap, such as requests per minute (RPM), tokens per minute (TPM), or a daily quota, so the API rejected the call instead of running it. The error message often includes "Rate limit reached", and the error code is typically rate_limit_exceeded.

Fast fixes: slow down with exponential backoff, batch or queue requests, reduce max_tokens, upgrade your OpenAI tier, or route through a gateway that automatically fails over to alternate upstream channels. APIMaster aggregates multiple routes so one vendor's 429 does not stop production traffic.

What This Error Means

After authentication succeeds, OpenAI meters how fast you consume requests and tokens. Exceed the bucket and the API returns 429:

{
  "error": {
    "message": "Rate limit reached for gpt-4o in organization org-xxx on requests per min (RPM): Limit 500, Used 500, Requested 1.",
    "type": "requests",
    "code": "rate_limit_exceeded"
  }
}

Third-party relays may surface the same string or a generic 429 wrapper. This is different from an invalid API key (401) or content blocked (400) — your key is valid, you are just too fast or too heavy for the current quota tier.
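To make the distinction concrete, here is a minimal sketch that sorts a raw response into "RPM", "TPM", or "something else". The function name classify_rate_limit and the substring checks are assumptions for illustration: they match the sample message above, but the wording is not a stable contract and a relay may reword it.

```python
import json


def classify_rate_limit(status_code: int, body: str) -> str:
    """Return 'rpm', 'tpm', 'other_429', or 'not_rate_limit' for a response."""
    if status_code != 429:
        return "not_rate_limit"
    try:
        error = json.loads(body).get("error", {})
    except (json.JSONDecodeError, AttributeError):
        # Not JSON, or not a JSON object: e.g. a relay's generic 429 wrapper.
        return "other_429"
    if not isinstance(error, dict):
        return "other_429"
    # Assumption: the message names the exhausted bucket, as in the sample above.
    message = str(error.get("message", "")).lower()
    if "requests per min" in message:
        return "rpm"
    if "tokens per min" in message:
        return "tpm"
    return "other_429"
```

Knowing which bucket ran out tells you whether to cut concurrency (RPM) or trim prompts and max_tokens (TPM).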

Common Causes

Burst traffic — many parallel users or agents firing requests in the same second.
High max_tokens — large completions burn TPM quickly even at moderate RPM.
Retry storms — your app retries 429s immediately without backoff, making limits worse.
Shared org key — multiple services reuse one key and share one RPM/TPM bucket.
Free / low tier limits — new OpenAI accounts and cheap relays cap throughput aggressively.
Model-specific caps — frontier models often have lower RPM than gpt-4o-mini.
Streaming + tools — agent loops multiply calls per user action.
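A quick back-of-the-envelope calculation shows why max_tokens matters as much as request count. This sketch assumes, as a simplification, that each request is budgeted at prompt tokens plus max_tokens; the provider's actual accounting may differ, and the function name is made up for illustration.

```python
def estimated_tokens_per_minute(
    requests_per_minute: int, prompt_tokens: int, max_tokens: int
) -> int:
    """Rough TPM demand, assuming each request reserves prompt + max_tokens."""
    return requests_per_minute * (prompt_tokens + max_tokens)


# Same 100 RPM and 500-token prompts, two different completion budgets.
print(estimated_tokens_per_minute(100, 500, 4000))  # 450000
print(estimated_tokens_per_minute(100, 500, 500))   # 100000
```

Under that assumption, trimming max_tokens from 4000 to 500 cuts token demand by more than four times without changing RPM at all.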

How to Fix It

1. Read the 429 response headers

OpenAI often sends x-ratelimit-limit-requests, x-ratelimit-remaining-requests, and retry-after. Sleep until retry-after seconds elapse before retrying.
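As a rough illustration, the helper below turns a 429 response's headers into a sleep duration. The name wait_seconds, the 1-second default, and the 60-second cap are assumptions chosen for the sketch; it only handles a numeric retry-after value and falls back to the default otherwise.

```python
from typing import Mapping


def wait_seconds(
    headers: Mapping[str, str], default: float = 1.0, cap: float = 60.0
) -> float:
    """Pick a sleep duration from a 429 response's headers."""
    # Header names are case-insensitive, so normalise before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    raw = lowered.get("retry-after")
    try:
        seconds = float(raw) if raw is not None else default
    except ValueError:
        # e.g. an HTTP-date value, which this sketch does not parse.
        seconds = default
    # Never sleep a negative time, and never longer than the cap.
    return max(0.0, min(cap, seconds))
```

With the openai Python SDK, the headers of the response attached to the caught exception would be the natural input, though that is worth verifying against the SDK version you run.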

2. Implement exponential backoff with jitter

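import random
import time

from openai import OpenAI, RateLimitError

# Turn off the SDK's built-in retries so this loop is the only retry layer.
client = OpenAI(max_retries=0)

def chat_with_backoff(max_attempts=6):
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": "hello"}],
            )
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the 429 to the caller
            # Wait 1s, 2s, 4s, ... (capped at 60s) plus up to 1s of jitter.
            time.sleep(min(60, (2 ** attempt) + random.random()))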

Never tight-loop on 429 — you will extend the outage.

3. Reduce load

Lower max_tokens where possible.
Cache identical prompts.
Queue requests server-side (worker + Redis) instead of unbounded client parallelism.
Use cheaper/smaller models for classification or routing steps.
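One way to bound outgoing traffic before it ever reaches the API is a client-side token bucket. The sketch below is a single-process, non-thread-safe illustration; the class name TokenBucket and its parameters are hypothetical, and a production queue (for example the worker + Redis setup mentioned above) would need shared state.

```python
import time
from typing import Callable


class TokenBucket:
    """Client-side limiter: about `rate_per_minute` acquisitions per minute."""

    def __init__(
        self,
        rate_per_minute: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.refill_per_second = rate_per_minute / 60.0
        self.capacity = capacity  # largest burst allowed at once
        self.tokens = capacity    # start with a full bucket
        self.clock = clock
        self.last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; return False instead of blocking."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        refill = (now - self.last) * self.refill_per_second
        self.tokens = min(self.capacity, self.tokens + refill)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A worker would call try_acquire() before each request and requeue or sleep when it returns False. Setting rate_per_minute somewhat below the official RPM leaves headroom for other services sharing the key.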

4. Raise official limits

On OpenAI: add billing, request a tier increase, or split workloads across projects/orgs responsibly (still subject to policy).

5. Use a gateway with automatic fallback

Production apps should not depend on a single upstream RPM bucket. An LLM gateway can:

Route to another provider or channel when one returns 429
Spread traffic across keys or regions where permitted
Surface queueing at the edge so your app sees fewer hard failures
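The first of these behaviours can be sketched in a few lines. This is a generic illustration of ordered fallback, not a description of how APIMaster or any specific gateway is implemented; UpstreamRateLimited and call_with_fallback are hypothetical names, and each channel is assumed to be a zero-argument callable that raises UpstreamRateLimited when its upstream answers 429.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class UpstreamRateLimited(Exception):
    """Hypothetical error a channel raises when its upstream returns 429."""


def call_with_fallback(channels: Sequence[Callable[[], T]]) -> T:
    """Try each channel in order; move on only when one is rate limited."""
    last_error: Optional[UpstreamRateLimited] = None
    for call in channels:
        try:
            return call()
        except UpstreamRateLimited as exc:
            last_error = exc  # remember the failure and try the next channel
    if last_error is None:
        raise ValueError("no channels configured")
    # Every channel was throttled: surface the most recent 429 to the caller.
    raise last_error
```

Only rate-limit errors trigger failover here; other exceptions propagate immediately, so an authentication or validation failure on one channel is not silently retried on the next.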

APIMaster is an OpenAI-compatible aggregator with multi-channel routing — when one path is throttled, traffic can move to available alternatives. Top up from $1, point SDKs at https://apimaster.ai/v1, and keep building without hand-tuning every vendor limit.

Get started on APIMaster →

How APIMaster Helps

Hitting 429 too often? APIMaster helps on three fronts:

Advantage	What you get
Discount	Marketplace pricing — up to ~90% / ~85% off official list rates; stretch the same budget further.
Stability	Automatic fallback when one upstream hits RPM/TPM caps — fewer single-vendor 429 outages (pair with app-level backoff).
Model fidelity	After failover, use the Model Tester; check keys with the Key Tester.

https://apimaster.ai/v1 · From $1 top-up, pay-as-you-go.

Invalid API key — 401 authentication
api error 400 content blocked — moderation 400
Claude / Anthropic 529 overloaded — capacity, not RPM quota
All API error fix guides — full index

FAQ

What is OpenAI rate limit exceeded? HTTP 429 indicating you exceeded RPM, TPM, or related quotas for your organization and model. Wait and retry with backoff, or route through a gateway with fallback.

429 vs 529 — what's the difference? 429 is usually your quota / rate (OpenAI RPM/TPM). 529 on Anthropic is server overload — the service is temporarily at capacity. Fix patterns differ; see our 529 guide.

Will upgrading OpenAI tier fix all 429s? It raises caps but burst agent traffic can still hit limits. Gateways plus queueing are the durable fix for production.

Does APIMaster remove rate limits entirely? No platform offers unlimited frontier-model throughput. APIMaster improves availability by routing across channels when one upstream throttles — you still should implement backoff in your app.