Rate limits · API docs

SpecStep applies two independent rate-limit policies to API traffic and a separate auth-failure throttle that operates before a request is authenticated. All three are documented here.

Both per-minute policies apply to every authenticated request regardless of auth scheme — sf_… API-key bearers and oat_… OAuth tokens hit the same counters. The two schemes differ only in how the counter is scoped.

Standard limit

Most endpoints allow 60 requests per minute per actor, counted over a rolling 60-second window. The counter is scoped per API key for API-key callers and per user for OAuth callers: each API key gets its own independent counter, while every OAuth-authenticated MCP client on the same user account shares a single counter. Claude Desktop and Cursor connected to the same SpecStep account therefore draw from one combined 60-req/min budget, whereas a second API key on the same account gets its own fresh 60-req/min budget.
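To make the scoping rule concrete, here is a minimal sketch of how a counter key could be derived from an authenticated caller. The `Caller` type, its field names, and the `key:` / `user:` prefixes are illustrative assumptions, not part of the SpecStep API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    """Hypothetical view of an authenticated caller (not a SpecStep type)."""
    scheme: str         # "api_key" or "oauth"
    user_id: str        # account that owns the credential
    credential_id: str  # identifier of the specific key or token


def scope_key(caller: Caller) -> str:
    """Return the counter key a request would be charged to."""
    if caller.scheme == "api_key":
        # Each API key gets its own independent counter.
        return f"key:{caller.credential_id}"
    # Every OAuth client on the same account shares one counter.
    return f"user:{caller.user_id}"
```

Under this sketch, two OAuth tokens for the same user map to the same key, while two API keys on the same account map to different keys.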

Generation kickoff limit

POST /v1/generations and POST /v1/generations/{id}/update both count against a tighter limit of 5 kickoffs per minute per actor. The kickoff counter is separate from the standard counter — a kickoff request does not consume capacity from your 60-req/min budget. The same scoping rule applies: API keys get a per-key kickoff counter; OAuth tokens share one kickoff counter per user across all connected clients.
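The separation between the two counters can be pictured with a small in-memory model. This is a sketch under stated assumptions, not the server's implementation: `RollingLimiter`, `policy_for`, and the choice not to record rejected requests are all hypothetical, and the kickoff limit is assumed to use the same rolling 60-second window as the standard limit.

```python
import time
from collections import defaultdict, deque

LIMITS = {"standard": 60, "kickoff": 5}  # requests allowed per window
WINDOW_SECONDS = 60                      # assumed for both policies


class RollingLimiter:
    """Illustrative rolling-window limiter with one counter per (policy, scope)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._hits = defaultdict(deque)  # (policy, scope) -> request timestamps

    def allow(self, policy: str, scope: str) -> bool:
        now = self._clock()
        hits = self._hits[(policy, scope)]
        # Drop timestamps that have aged out of the rolling window.
        while hits and now - hits[0] >= WINDOW_SECONDS:
            hits.popleft()
        if len(hits) >= LIMITS[policy]:
            return False  # assumption: rejected requests are not recorded
        hits.append(now)
        return True


def policy_for(method: str, path: str) -> str:
    """Pick the policy a request counts against."""
    parts = path.strip("/").split("/")
    if method == "POST" and parts[:2] == ["v1", "generations"]:
        # POST /v1/generations or POST /v1/generations/{id}/update
        if len(parts) == 2 or (len(parts) == 4 and parts[3] == "update"):
            return "kickoff"
    return "standard"
```

Because the counters are keyed by policy as well as scope, exhausting the five kickoffs for a scope leaves that scope's standard budget untouched.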

Auth-failure throttle

This is distinct from the two limits above. See Authentication — auth-failure throttle for the full description. In brief: after 5 failed auth attempts within 5 minutes from one client IP, subsequent attempts from that IP are rejected without a database lookup. Only failed authentications increment the counter; successful requests never trigger it.
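As a rough illustration of that behaviour, the sketch below tracks failures per client IP and answers "is this IP blocked?" before any credential lookup would happen. The class name, the rolling-window interpretation of "in 5 minutes", and the decision not to count blocked attempts as new failures are assumptions made for the example.

```python
import time
from collections import defaultdict, deque

MAX_FAILURES = 5
WINDOW_SECONDS = 5 * 60


class AuthFailureThrottle:
    """Illustrative per-IP failure counter; only failures are recorded."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._failures = defaultdict(deque)  # client IP -> failure timestamps

    def _recent(self, ip: str) -> deque:
        failures = self._failures[ip]
        now = self._clock()
        # Forget failures older than the window.
        while failures and now - failures[0] >= WINDOW_SECONDS:
            failures.popleft()
        return failures

    def is_blocked(self, ip: str) -> bool:
        """Check before authenticating, so no database lookup is needed."""
        return len(self._recent(ip)) >= MAX_FAILURES

    def record_failure(self, ip: str) -> None:
        """Call only when authentication fails; successes are never recorded."""
        self._recent(ip).append(self._clock())
```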

Response headers

Every response from a rate-limited endpoint includes these headers:

Header	Value
`RateLimit-Limit`	The cap that applies to this request (60 for standard, 5 for kickoffs)
`RateLimit-Remaining`	Requests remaining in the current window
`RateLimit-Reset`	Seconds until the window resets and the counter clears

When a request is rejected for exceeding the limit, the response also includes:

Header	Value
`Retry-After`	Seconds to wait before retrying
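A client can use the three RateLimit-* headers to pause before it hits the cap rather than after. The helper names below are hypothetical, and the pacing rule (wait out the window only when nothing remains) is one simple choice among several.

```python
from typing import Mapping, NamedTuple, Optional


class RateLimitState(NamedTuple):
    limit: int          # RateLimit-Limit
    remaining: int      # RateLimit-Remaining
    reset_seconds: int  # RateLimit-Reset


def read_rate_limit(headers: Mapping[str, str]) -> Optional[RateLimitState]:
    """Parse the RateLimit-* headers; return None if any is missing or malformed.

    A plain dict is case-sensitive; httpx's response.headers is
    case-insensitive and can be passed in directly.
    """
    try:
        return RateLimitState(
            int(headers["RateLimit-Limit"]),
            int(headers["RateLimit-Remaining"]),
            int(headers["RateLimit-Reset"]),
        )
    except (KeyError, ValueError):
        return None


def pause_before_next(state: Optional[RateLimitState]) -> float:
    """Seconds to wait before the next request under this simple rule."""
    if state is not None and state.remaining <= 0:
        return float(state.reset_seconds)
    return 0.0
```

Excluded endpoints do not necessarily send these headers, which is why the parser returns None instead of raising.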

429 response body

A rejected request receives HTTP 429 with a Content-Type: application/problem+json body:

{
  "type": "RATE_LIMITED",
  "title": "Too many requests",
  "status": 429,
  "detail": "Rate limit exceeded; retry in 14s.",
  "retry_after_seconds": 14
}
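If the Retry-After header is unavailable (for example, behind a proxy that strips it), the same delay can be read from the body. The function below is a sketch; its name and its strict checks on the type field and content type are choices made for the example, based only on the body shown above.

```python
import json
from typing import Optional


def retry_delay_from_body(status_code: int, content_type: str, body: str) -> Optional[int]:
    """Return retry_after_seconds from a 429 problem+json body, else None."""
    if status_code != 429:
        return None
    if not content_type.startswith("application/problem+json"):
        return None
    try:
        problem = json.loads(body)
    except json.JSONDecodeError:
        return None
    if not isinstance(problem, dict) or problem.get("type") != "RATE_LIMITED":
        return None
    delay = problem.get("retry_after_seconds")
    return delay if isinstance(delay, int) else None
```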

Retry semantics

Read Retry-After from the response header and wait that many seconds before retrying. Do not retry immediately: back-to-back retries in the same window will each return 429 without advancing the counter.

A simple retry loop:

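import time

import httpx


def call_with_retry(client: httpx.Client, method, url, max_attempts=3, **kwargs):
    """Send a request, retrying on HTTP 429 up to max_attempts times in total."""
    for attempt in range(max_attempts):
        response = client.request(method, url, **kwargs)
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            # Honor Retry-After; fall back to 10 s if the header is missing.
            retry_after = int(response.headers.get("Retry-After", "10"))
            time.sleep(retry_after + 1)  # one extra second as a safety margin
    return response  # still 429 after the final attempt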

A handful of paths are excluded from rate limiting: /v1/openapi.json, /v1/schema/*, the health probes, and static / framework assets. Everything else under /v1/* is subject to the windows above.