SpecStep applies two independent rate-limit policies to API traffic and a separate auth-failure throttle that operates before a request is authenticated. All three are documented here.
Both per-minute policies apply to every authenticated request regardless of auth scheme — sf_… API-key bearers and oat_… OAuth tokens hit the same counters. The two schemes differ only in how the counter is scoped.
Standard limit
Most endpoints allow 60 requests per minute per actor. The window is a rolling 60-second counter. The scope key is per-API-key for API-key callers and per-user for OAuth callers — each API key gets its own independent counter, while every OAuth-authenticated MCP client on the same user account shares a single counter. Claude Desktop and Cursor connected to the same SpecStep account therefore draw from one combined 60-req/min budget; a second API key on the same account gets its own fresh 60-req/min budget.
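The scoping rule can be illustrated with a small sketch. The function name `scope_key` and the `key_id` / `user_id` arguments are hypothetical, chosen only for illustration; only the `sf_` and `oat_` prefixes come from this page, and how SpecStep derives its counter keys internally is not documented here.

```python
def scope_key(token: str, key_id: str, user_id: str) -> str:
    """Return the counter scope for a credential (hypothetical helper)."""
    if token.startswith("sf_"):
        # API-key callers: each key gets its own counter.
        return f"key:{key_id}"
    if token.startswith("oat_"):
        # OAuth callers: every client on the same user shares one counter.
        return f"user:{user_id}"
    raise ValueError("unrecognized token prefix")
```

Under these assumptions, two OAuth tokens for the same user map to the same scope string, while two API keys on the same account map to different ones.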
Generation kickoff limit
POST /v1/generations and POST /v1/generations/{id}/update both count against a tighter limit of 5 kickoffs per minute per actor. The kickoff counter is separate from the standard counter — a kickoff request does not consume capacity from your 60-req/min budget. The same scoping rule applies: API keys get a per-key kickoff counter; OAuth tokens share one kickoff counter per user across all connected clients.
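To make the separation of the two counters concrete, here is a minimal sketch of a rolling-window limiter that keeps one timestamp log per (policy, scope) pair. It is an assumption about how such a limiter could work, not SpecStep's actual implementation; the class and function names are hypothetical, and the sketch assumes rejected requests are not recorded in the window.

```python
import re
import time
from collections import defaultdict, deque

LIMITS = {"standard": 60, "kickoff": 5}  # caps per rolling window, from this page
WINDOW_SECONDS = 60.0

# Matches /v1/generations and /v1/generations/{id}/update.
_KICKOFF_PATH = re.compile(r"/v1/generations(?:/[^/]+/update)?")


def policy_for(method: str, path: str) -> str:
    """Pick the policy name for a request (hypothetical helper)."""
    if method.upper() == "POST" and _KICKOFF_PATH.fullmatch(path):
        return "kickoff"
    return "standard"


class RollingWindowLimiter:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._hits = defaultdict(deque)  # (policy, scope) -> request timestamps

    def allow(self, policy: str, scope: str) -> bool:
        now = self._clock()
        hits = self._hits[(policy, scope)]
        # Drop timestamps that have aged out of the rolling window.
        while hits and now - hits[0] >= WINDOW_SECONDS:
            hits.popleft()
        if len(hits) >= LIMITS[policy]:
            return False  # assumption: rejected requests are not recorded
        hits.append(now)
        return True
```

Because the log is keyed by policy as well as scope, exhausting the kickoff budget for a scope leaves that scope's standard budget untouched.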
Auth-failure throttle
This is distinct from the two limits above. See Authentication — auth-failure throttle for the full description. In brief: 5 failed auth attempts in 5 minutes per client IP causes subsequent attempts from that IP to be rejected without a database lookup. Only failed authentications increment the counter. Successful requests never trigger it.
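The behaviour described in brief above can be sketched as follows. This is an illustrative in-memory model only; the class name and methods are hypothetical, and whether attempts rejected by the throttle themselves count as failures is not stated here, so the sketch leaves that decision to the caller.

```python
import time
from collections import defaultdict, deque

MAX_FAILURES = 5          # failed attempts, from this page
WINDOW_SECONDS = 5 * 60   # five minutes, from this page


class AuthFailureThrottle:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._failures = defaultdict(deque)  # client IP -> failure timestamps

    def _recent(self, ip: str) -> deque:
        hits = self._failures[ip]
        cutoff = self._clock() - WINDOW_SECONDS
        # Forget failures older than the window.
        while hits and hits[0] <= cutoff:
            hits.popleft()
        return hits

    def is_blocked(self, ip: str) -> bool:
        """Check before any credential lookup; True means reject immediately."""
        return len(self._recent(ip)) >= MAX_FAILURES

    def record_failure(self, ip: str) -> None:
        """Call only when authentication fails; successes are never recorded."""
        self._recent(ip).append(self._clock())
```

A server using this model would call `is_blocked` first and skip the database lookup entirely when it returns `True`.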
Response headers
Every response from a rate-limited endpoint includes these headers:
| Header | Value |
|---|---|
| RateLimit-Limit | The cap that applies to this request (60 for standard, 5 for kickoffs) |
| RateLimit-Remaining | Requests remaining in the current window |
| RateLimit-Reset | Seconds until the window resets and the counter clears |
When a request is rejected for exceeding the limit, the response also includes:
| Header | Value |
|---|---|
| Retry-After | Seconds to wait before retrying |
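A client can use these headers to pace itself before it ever receives a 429. The helper below is a minimal sketch: the name `seconds_to_pause` is hypothetical, and it assumes the header values are plain integers as described in the tables above.

```python
def seconds_to_pause(headers) -> float:
    """Return how long to wait before the next call, based on RateLimit-* headers.

    `headers` is any mapping with a .get() method. A plain dict is
    case-sensitive; httpx's response.headers is case-insensitive.
    """
    remaining = headers.get("RateLimit-Remaining")
    reset = headers.get("RateLimit-Reset")
    if remaining is None or reset is None:
        return 0.0  # headers absent, e.g. an exempt path
    if int(remaining) > 0:
        return 0.0  # budget left in the current window
    return float(reset)  # budget exhausted: wait for the reset
```

Calling `time.sleep(seconds_to_pause(response.headers))` before the next request is one way to apply it.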
429 response body
A rejected request receives HTTP 429 with a Content-Type: application/problem+json body:
{
  "type": "RATE_LIMITED",
  "title": "Too many requests",
  "status": 429,
  "detail": "Rate limit exceeded; retry in 14s.",
  "retry_after_seconds": 14
}
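If the Retry-After header is unavailable to your code (for example, behind a wrapper that exposes only the body), the same delay can be read from `retry_after_seconds`. The helper below is a sketch under that assumption; its name and the fallback default of 10 seconds are illustrative choices, not documented behaviour.

```python
import json


def retry_delay_from_problem(body: str, default: int = 10) -> int:
    """Extract retry_after_seconds from a 429 problem+json body."""
    try:
        problem = json.loads(body)
    except json.JSONDecodeError:
        return default
    if not isinstance(problem, dict) or problem.get("type") != "RATE_LIMITED":
        return default
    value = problem.get("retry_after_seconds")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool) and value >= 0:
        return value
    return default
```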
Retry semantics
Read Retry-After from the response header and wait that many seconds before retrying. Do not retry immediately — back-to-back retries in the same window will each return 429 and do not advance the counter.
A simple retry loop:
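import time
import httpx

def call_with_retry(client: httpx.Client, method, url, **kwargs):
    """Send a request, retrying up to two more times on HTTP 429."""
    for attempt in range(3):
        response = client.request(method, url, **kwargs)
        if response.status_code != 429:
            return response
        if attempt == 2:
            break  # out of attempts; no point sleeping before giving up
        # Retry-After is given in whole seconds; fall back to 10 if it is missing.
        retry_after = int(response.headers.get("Retry-After", "10"))
        time.sleep(retry_after + 1)  # one extra second of slack
    return response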
A handful of paths are excluded from rate limiting: /v1/openapi.json, /v1/schema/*, the health probes, and static / framework assets. Everything else under /v1/* is subject to the windows above.
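A client or proxy that wants to predict whether a path is rate limited could check it against the two exclusions named explicitly above. This is a sketch only: the function name is hypothetical, and because the health-probe and asset paths are not listed on this page, they are left to the caller through the `extra_patterns` parameter rather than guessed.

```python
from fnmatch import fnmatchcase

# Only the exclusions this page names explicitly.
NAMED_EXEMPT_PATTERNS = ("/v1/openapi.json", "/v1/schema/*")


def is_exempt(path: str, extra_patterns=()) -> bool:
    """Return True if `path` matches a known rate-limit exclusion."""
    patterns = NAMED_EXEMPT_PATTERNS + tuple(extra_patterns)
    # fnmatchcase's "*" also matches "/" characters, so nested schema paths match.
    return any(fnmatchcase(path, pattern) for pattern in patterns)
```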