Errors

Updated 2026-05-27

Application-layer errors (validation, conflicts, rate limits, generation quota, upstream provider failures, unexpected server errors) are returned in RFC 7807 problem-details format with Content-Type: application/problem+json and a JSON body with a consistent shape.

Authentication challenges are an exception. A 401 from the API-key auth handler — missing key, unknown key, revoked key, disabled owner, or a per-IP throttle hit — is a bare WWW-Authenticate: Bearer challenge with no JSON body. The status alone is the signal; the body is intentionally empty so failure modes do not disclose which condition applied.

Problem-details shape

{
  "type": "VALIDATION_ERROR",
  "title": "One or more validation errors occurred.",
  "status": 400,
  "detail": "The 'name' field is required."
}

| Field | Description |
| --- | --- |
| type | A machine-readable string identifying the error class. Use this for programmatic branching. |
| title | A short human-readable summary. Stable per error class. |
| status | Mirrors the HTTP status code. |
| detail | A specific description of what went wrong for this request. May include field names, constraint violations, or contextual information. |
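
A client can tell the two response families apart (problem-details bodies versus the bare 401 challenge) before it tries to parse anything. The sketch below is a minimal illustration under that assumption, not SpecStep client code; `classify_response` and its return shape are hypothetical names.

```python
import json
from typing import Optional


def classify_response(status: int, content_type: Optional[str], body: bytes) -> dict:
    """Describe an error response. Hypothetical helper for illustration."""
    # A bare 401 challenge carries no JSON body; the status is the only signal.
    if status == 401 and not body:
        return {"kind": "auth_challenge", "status": 401, "type": None, "detail": None}

    # Drop any parameters such as "; charset=utf-8" before comparing.
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type == "application/problem+json":
        problem = json.loads(body)
        return {
            "kind": "problem",
            "status": problem.get("status", status),
            "type": problem.get("type"),      # branch on this
            "detail": problem.get("detail"),  # show this to a human
        }

    # Anything else (for example a proxy error page) is treated as opaque.
    return {"kind": "unknown", "status": status, "type": None, "detail": None}
```

Branching on `type` rather than on `title` or `detail` keeps the caller independent of wording changes in the human-readable fields.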

Status codes

400 — bad request

The request body or query parameters failed validation. Read detail to see which field or constraint caused the rejection. Fix the request before retrying.

The billing-checkout endpoint — POST /v1/billing/checkout-session — returns one specific typed 400 worth branching on:

| type | When it fires | What to do |
| --- | --- | --- |
| FEATURE_NOT_ALLOWED | The caller requested a checkout session for a tier whose feature set has not launched yet. Today this fires for tier: "Team". Team is on the public pricing surface as "Coming soon", and the server refuses Team-targeted checkouts so a DevTools bypass of the client-side overlay can't reach Stripe and pay for features that don't exist. The block lifts when Team launches. | Don't retry against the same tier. Pick tier: "Pro" (or use the existing subscription portal at POST /v1/billing/portal-session for an existing customer). |

401 — unauthorized

No valid API key was found, or the key was revoked, or the account owning the key is disabled. The response is a bare challenge with no JSON body — the status alone is the signal. Check that your Authorization: Bearer sf_xxxxxxxxxxxx header is present, correctly spelled, and uses a key that has not been revoked.

A 401 can also mean the per-IP auth-failure throttle has engaged. See rate limits and authentication.

402 — payment required

Your account has either exhausted its monthly generation allowance for the current period or attempted a generation profile (e.g. Researcher) that your tier does not include. The response body identifies the limit you hit, and the response carries X-Quota-Tier, X-Quota-Limit, X-Quota-Used, and X-Quota-Reset headers so callers can render upgrade prompts. Wait until X-Quota-Reset, switch to a permitted profile, or upgrade the account.

Generation kickoff endpoints (POST /v1/generations, POST /v1/generations/{id}/retry) emit two specific typed 402 values worth branching on:

| type | When it fires | What to do |
| --- | --- | --- |
| QUOTA_EXCEEDED | The caller's monthly generation quota for the current billing period is exhausted, and either Extra Usage is disabled OR its balance is below the p75-cost forecast for the requested run. | Wait until X-Quota-Reset, top up Extra Usage, or upgrade the tier. The MCP companion tool validate_generation_request returns the same code without enqueueing a generation; use it to confirm a kickoff will succeed before paying for it. |
| PROFILE_NOT_ALLOWED | The requested review_profile (e.g., Extensive) is not available on the caller's tier. | Pick a profile your tier allows (get_capabilities lists which profiles are available), or upgrade. validate_generation_request returns the same code on a dry-run. |
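
As a rough illustration of how the two typed 402 values and the X-Quota-* headers might feed an upgrade prompt, consider the sketch below. `describe_402` is a hypothetical helper, and the header values are passed through as opaque strings because this page does not specify their format.

```python
from typing import Mapping


def describe_402(problem_type: str, headers: Mapping[str, str]) -> str:
    """Build a short user-facing message for a 402 response (illustrative)."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    tier = h.get("x-quota-tier", "unknown")

    if problem_type == "QUOTA_EXCEEDED":
        used = h.get("x-quota-used", "?")
        limit = h.get("x-quota-limit", "?")
        reset = h.get("x-quota-reset", "unknown")  # format not assumed here
        return f"Quota exhausted on tier {tier}: {used}/{limit} used; resets at {reset}."
    if problem_type == "PROFILE_NOT_ALLOWED":
        return f"The requested review profile is not available on tier {tier}."
    # Any other 402 type falls back to a generic message.
    return "Payment required."
```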

403 — forbidden

The API key is valid, but it does not have permission to perform the requested action. This typically means the operation requires a role or permission your account does not hold.

Generation kickoff endpoints (POST /v1/generations, POST /v1/generations/{id}/update, POST /v1/generations/{id}/retry) emit one specific typed 403 worth branching on:

| type | Cause | What to do |
| --- | --- | --- |
| USER_PENDING_APPROVAL | The account exists but has not yet been approved. Generation kickoffs are blocked until approval lands. Read access (interview list, generation list, package list) is unaffected. | Wait for approval; the API will accept kickoffs once the account is enabled. There is no caller-side action that unblocks this. |

404 — not found

The resource does not exist, or it exists but belongs to a different account. SpecStep does not distinguish between "not found" and "found but not yours" — both return 404.

409 — conflict

The request is well-formed but conflicts with the current state of the resource. For example, attempting to start a generation when one is already running, or delivering a package that has already been delivered. Read detail for the specific conflict.

Generation kickoff endpoints (POST /v1/generations, POST /v1/generations/{id}/retry) emit additional typed 409 values worth branching on:

| type | When it fires | What to do |
| --- | --- | --- |
| FEATURE_NOT_ALLOWED | The intake the caller is generating from uses External Connector reference documents (folders attached via OneDrive / SharePoint / Google Drive), but the caller's tier doesn't include the External Connectors feature. Free tier may connect a folder + watch the Interviewer summarise it, but cannot run a generation that consumes connector-sourced reference docs. | Upgrade to Pro or Team, or remove the connector-sourced documents from the intake before kicking off. |
| CONCURRENCY_CAP_REACHED | The per-scope concurrency cap is currently saturated: the caller already has the maximum number of generations running. | Wait for one of the in-flight runs to terminate, or back off and retry. The MCP validate_generation_request tool surfaces CONCURRENCY_AT_CAP / CONCURRENCY_HIGH warnings ahead of this 409 firing. |

The retry endpoint — POST /v1/generations/{id}/retry — returns one of four specific type values when it fires a 409:

| type | When it fires | What to do |
| --- | --- | --- |
| https://specstep.com/errors/retry_state_invalid | The target generation is not in Failed state: it's still running, already succeeded, or was cancelled. | Wait for the run to finish; only Failed rows are retryable. The current state is included in detail. |
| https://specstep.com/errors/retry_researcher_child | The target is a generation spawned as part of a Researcher run. | Re-fire the parent Researcher run from the original interview instead; individual entries from a Researcher run can't be retried in isolation. |
| https://specstep.com/errors/retry_envelope_unavailable | The original kickoff command was not persisted (legacy rows that predate the persisted-command feature). | Restart the work from the original interview; the row can't be replayed in place. |
| https://specstep.com/errors/retry_owner_mismatch | The caller has view-all access to the row but is not its original owner. | Only the original owner can retry; view-all access doesn't grant retry rights. |

All four come back as application/problem+json with status 409, a title matching the uppercase code (e.g. RETRY_STATE_INVALID), and a detail describing the specific row.

The pause/resume/cancel endpoints — POST /v1/generations/{id}/pause, POST /v1/generations/{id}/resume, POST /v1/generations/{id}/cancel — return their own 409 codes when the requested transition is illegal for the generation's current state.

| type | When it fires | What to do |
| --- | --- | --- |
| CONFLICT_STATE_TRANSITION | The generation state machine rejected the transition: pause from a non-pausable state, cancel on an already-terminal row (Complete / Failed / Cancelled), or a resume rejected by the underlying state machine. | Re-read the generation to see its current state. Terminal rows can't be moved; in-flight rows must reach a pausable state first. The title reflects the endpoint that was called: Pause not allowed in current state, Cancel not allowed in current state, or Resume rejected by state machine. detail names the specific transition. |
| RESUME_FROM_NON_PAUSED | POST /v1/generations/{id}/resume was called on a generation that is not in Paused state. Resume is only valid from Paused. | Check the generation's state first; only call resume on Paused rows. title is Resume requires the generation to be in Paused state; the current state is included in detail (e.g. Generation {id} is in state {state}; cannot resume.). |
| RESUME_NO_PRIOR_STATE | POST /v1/generations/{id}/resume couldn't determine a target state: no pre-pause state was recorded for this generation. Rare; usually means the row was created in Paused somehow, or its history was pruned. | Don't retry; the generation can't be resumed. Restart from the original interview. |

The documentation-migration commit endpoint — POST /v1/doc-migrations/commit (and the MCP commit_doc_migration tool) — returns its own 409 when the upload can't be placed unambiguously:

| type | When it fires | What to do |
| --- | --- | --- |
| DOC_MIGRATION_UNRESOLVED_CONFLICTS | Two or more source files in the uploaded archive map to the same canonical slot, so the commit can't decide which one wins. | Run the preview first (POST /v1/doc-migrations/preview or preview_doc_migration) to see the conflicting_target_paths, then re-submit with target_path_overrides mapping each conflicting source path to a distinct canonical target. |

Note for parser-writers: the retry codes above use full URL type values (e.g. https://specstep.com/errors/retry_state_invalid); the pause/resume/cancel codes use bare uppercase strings (e.g. CONFLICT_STATE_TRANSITION). Both are emitted as-is — match on a normalized form if you need a single switch.
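
One possible normalization, sketched under the assumption that the last path segment of a URL-form `type` is the lowercase version of its code (which matches the retry codes listed above): reduce both styles to a single UPPER_SNAKE_CASE string. `normalize_error_type` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def normalize_error_type(type_value: str) -> str:
    """Map both `type` styles onto one UPPER_SNAKE_CASE code (illustrative)."""
    if type_value.startswith(("http://", "https://")):
        # URL form: keep only the last path segment, e.g. "retry_state_invalid".
        type_value = urlparse(type_value).path.rstrip("/").rsplit("/", 1)[-1]
    # Bare codes are already uppercase; upper() leaves them unchanged.
    return type_value.upper()
```

With this in place a single switch can match `RETRY_STATE_INVALID` and `CONFLICT_STATE_TRANSITION` side by side.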

POST /v1/interviews/{id}/turns — when an Idempotency-Key collides with an in-flight submission, you get 409 with INTERVIEW_TURN_IN_FLIGHT, a Retry-After: 5 header, and the structured fields described in Structured error data (REST). When the original submission failed, retries with the same key get the cached error code (e.g. INTERVIEW_TURN_TIMEOUT) with replayed_from_cache: true in the extension data.

Structured error data (REST)

ProblemDetails responses on the interview-turn surface carry retry hints as extension fields on the body. The fields are stable across error codes:

| Field | Type | Meaning |
| --- | --- | --- |
| error_code | string | UPPER_SNAKE_CASE code identifying the failure (e.g. INTERVIEW_TURN_IN_FLIGHT, INTERVIEW_TURN_TIMEOUT). |
| retryable | bool | true when re-submitting (after any Retry-After wait) is safe. |
| turn_committed | bool | false when the user's turn was NOT persisted (safe to re-submit fresh). The interview-turn handler guarantees this is false for any pre-commit failure. |
| retry_after_seconds | int | Suggested wait before retrying. Mirrors the Retry-After HTTP header. |
| original_error_code | string | Present on idempotency replays of failed rows: the original failure's code. |
| replayed_from_cache | bool | Present on idempotency replays of failed rows: flags that this response is a cached failure, not a fresh attempt. |
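
The fields above can drive a small retry decision. The sketch below is one conservative reading, not a prescribed algorithm: it only re-submits when the body explicitly says the turn was not committed. The action labels ("retry", "inspect", "give_up") and the function name are hypothetical.

```python
from typing import Optional, Tuple


def plan_turn_retry(problem: dict) -> Tuple[str, Optional[int]]:
    """Return (action, wait_seconds) for an interview-turn problem body."""
    # Assumption of this sketch: a missing turn_committed is treated like
    # "unknown", so the caller inspects the interview instead of re-submitting.
    if problem.get("turn_committed") is not False:
        return ("inspect", None)
    if problem.get("retryable", False):
        # retry_after_seconds mirrors the Retry-After header; default to 0.
        return ("retry", int(problem.get("retry_after_seconds") or 0))
    return ("give_up", None)
```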

Structured error data (MCP)

MCP JsonRpcError responses carry the same retry hints via the spec's data field. The data object always includes error_code and any of the fields above that apply. For example, an INTERVIEW_TURN_IN_FLIGHT error returns:

{
  "jsonrpc": "2.0",
  "id": 42,
  "error": {
    "code": -32000,
    "message": "INTERVIEW_TURN_IN_FLIGHT: A submission for interview <uuid> with this client_request_id is already being processed; retry the SAME id after a short wait.",
    "data": {
      "error_code": "INTERVIEW_TURN_IN_FLIGHT",
      "retryable": true,
      "retry_after_seconds": 5,
      "turn_committed": false,
      "client_request_id": "...",
      "interview_id": "..."
    }
  }
}

The catch-all -32603 Internal error is emitted without data — clients can't infer commit semantics from arbitrary unhandled exceptions, so we don't guess. Treat bare -32603 as "unknown state; safest to call get_interview(interview_id) to discover what actually happened."
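
A client-side reading of that rule might look like the following sketch. It takes the `error` object of a JSON-RPC response and reports whether the caller should look up the interview before doing anything else. `interpret_mcp_error` and the `check_interview` flag are hypothetical names introduced for illustration.

```python
from typing import Any, Dict


def interpret_mcp_error(error: Dict[str, Any]) -> Dict[str, Any]:
    """Summarise the retry hints carried by a JSON-RPC error object."""
    data = error.get("data")
    if not data:
        # Bare error (for example -32603): commit state is unknown, so the
        # safest next step is to read the interview rather than retry.
        return {
            "error_code": None,
            "retryable": False,
            "retry_after_seconds": None,
            "check_interview": True,
        }
    return {
        "error_code": data.get("error_code"),
        "retryable": bool(data.get("retryable", False)),
        "retry_after_seconds": data.get("retry_after_seconds"),
        # Only skip the lookup when the server says the turn was not persisted.
        "check_interview": data.get("turn_committed") is not False,
    }
```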

Interview-turn error codes

Surfaces: POST /v1/interviews/{id}/turns (sync + async), GET /v1/interviews/turns/{jobId}, MCP submit_interview_turn (sync + async), MCP get_interview_turn_status. The error_code on a failed job's status, or in the data payload of an interview-turn error envelope, is one of:

| Code | Meaning | is_retryable | What to do |
| --- | --- | --- | --- |
| INTERVIEW_TURN_IN_FLIGHT | A submission with this client_request_id is currently being processed. | true (wait first) | Wait retry_after_seconds (default 5) and retry with the SAME id. |
| INTERVIEW_TURN_TIMEOUT | The LLM call timed out. | true | Retry; typically transient. |
| INTERVIEW_TURN_TRANSPORT_ERROR | Network / HTTP error reaching the LLM provider. | true | Retry; typically transient. |
| INTERVIEW_TURN_STUCK_INFLIGHT | A janitor sweep recycled a submission that had been InFlight too long (handler / process crashed mid-call). | true | Retry; the same key is safe because the row was recycled. |
| INTERVIEW_TURN_STUCK_RUNNING | A janitor sweep recycled an async job whose worker died mid-LLM. | true | Retry. |
| INTERVIEW_TURN_INTERNAL_ERROR | An unclassified server-side error. | false | Do not retry blindly; surface to the user / check status. |
| INTERVIEW_TURN_CANCELLED | The submission was cancelled by the caller via cancel_interview_turn. Surfaced when a retry with the original client_request_id arrives after the cancel landed. | true (new id) | Retry with a fresh client_request_id. |
| INTERVIEW_TURN_NOT_CANCELLABLE | cancel_interview_turn was called against a job in Completed or Failed state. | false | No retry: the work landed; read the snapshot instead. |
| INTERVIEW_NOT_FOUND | The interview_id doesn't exist or is foreign-owned. | false | Re-check the id; foreign ids are info-hidden. |

429 — too many requests

The rate limit for this endpoint has been exceeded. Read the Retry-After header and wait that many seconds before retrying. See rate limits for the full retry protocol.

{
  "type": "RATE_LIMITED",
  "title": "Too many requests",
  "status": 429,
  "detail": "Rate limit exceeded; retry in 14s.",
  "retry_after_seconds": 14
}
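
A small helper can pick the wait time from whichever source is present. The sketch below prefers the Retry-After header and falls back to the body's retry_after_seconds. The 1-second default and the function name are assumptions of this example, not documented behaviour.

```python
from typing import Mapping, Optional


def rate_limit_wait_seconds(
    headers: Mapping[str, str],
    body: Optional[dict] = None,
    default: float = 1.0,
) -> float:
    """Seconds to wait after a 429 before retrying (illustrative)."""
    h = {k.lower(): v for k, v in headers.items()}
    raw = h.get("retry-after")
    if raw is not None:
        try:
            return max(0.0, float(raw))
        except ValueError:
            pass  # not a plain number of seconds; fall through to the body
    if body and isinstance(body.get("retry_after_seconds"), (int, float)):
        return max(0.0, float(body["retry_after_seconds"]))
    return default
```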

500 — internal server error

Something went wrong on SpecStep's end. The detail field will contain a reference identifier you can include in a support ticket. Do not assume the operation succeeded — check the resource's state before retrying.

502 — bad gateway

An upstream LLM provider (Anthropic, OpenAI, or your bring-your-own provider) returned an error or timed out while servicing the request. The detail field describes the failure mode; the request is safe to retry once after a short wait, since SpecStep's own state was not modified.

The type on a 502 is one of the provider error codes below. Branch on it when you want to distinguish "wait and try again" from "the request itself is wrong."

503 — service unavailable

The only 503 in the API today is EXPLAIN_TIMEOUT — fired when a POST /v1/packages/{id}/explain cold call exceeds its ~75s wall-clock budget. This is SpecStep's own budget, not a network or gateway timeout — distinct from a 504 and retry-friendly. No cost is incurred when it fires, so a retry is safe. Branch on type; detail is advisory prose, not a stable string to match on.

{
  "type": "EXPLAIN_TIMEOUT",
  "title": "EXPLAIN_TIMEOUT",
  "status": 503,
  "detail": "The explanation took longer than 75s to generate. This usually means the upstream LLM is slow right now. Try again — most calls succeed within 30s."
}

Provider errors

These type values appear on 502 responses (and on the failure_reason of Failed generations whose underlying cause was an LLM provider). Use them to decide whether to retry, rotate a key, or fix the request.

| type | Cause | Retryable? |
| --- | --- | --- |
| PROVIDER_RATE_LIMITED | Provider returned 429 or equivalent. SpecStep honors the provider's Retry-After (capped at 5 minutes). | Yes: wait, then retry. |
| PROVIDER_TRANSIENT | Provider returned 5xx, timed out, or the connection reset mid-stream. | Yes: exponential backoff. |
| PROVIDER_REQUEST_REJECTED | Provider returned a deterministic 4xx: invalid request payload, unsupported model, malformed schema/tool params, content too long, etc. The request shape itself is wrong. Added 2026-05-09; previously these surfaced as PROVIDER_TRANSIENT and were retried. | No: fix the request. |
| PROVIDER_CONTENT_REFUSED | Provider declined on content-policy grounds. | No: adjust the input. |
| PROVIDER_AUTHENTICATION_FAILED | Provider rejected the API key. For bring-your-own-key callers, rotate the key in your provider console and update it in SpecStep settings. | No: rotate the key. |
| PROVIDER_SCHEMA_VIOLATION | Provider returned output that did not validate against the expected schema beyond the corrective retry budget. | No: usually a model-quality issue; try a different model or simplify the request. |
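
The retryable column above translates naturally into a delay calculation. The following is a minimal sketch assuming a 1-second base and a 60-second cap, neither of which comes from this page. `provider_retry_delay` is a hypothetical helper, and `retry_after` is an optional hint the caller supplies if it has one.

```python
from typing import Optional

RETRYABLE_PROVIDER_TYPES = frozenset({"PROVIDER_RATE_LIMITED", "PROVIDER_TRANSIENT"})


def provider_retry_delay(
    error_type: str,
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 1.0,   # assumed base delay in seconds
    cap: float = 60.0,   # assumed upper bound in seconds
) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (1-based).

    Returns None when the error type is not retryable.
    """
    if error_type not in RETRYABLE_PROVIDER_TYPES:
        return None
    if error_type == "PROVIDER_RATE_LIMITED" and retry_after is not None:
        return retry_after
    # Exponential backoff: base, 2*base, 4*base, ... bounded by cap.
    return min(cap, base * 2 ** (attempt - 1))
```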

Retryable vs non-retryable

| Status | Retryable? | Notes |
| --- | --- | --- |
| 400 | No | Fix the request first. |
| 401 | No | Fix the key or wait out the throttle window. |
| 402 | No (until reset) | Wait for X-Quota-Reset, switch profile, or upgrade. |
| 403 | No | A different key or role is required. |
| 404 | No | The resource is not there. |
| 409 | Depends | Read detail; some conflicts resolve when state changes. |
| 429 | Yes | Wait for Retry-After seconds, then retry. |
| 500 | Yes, with care | Check resource state before retrying to avoid double-execution. |
| 502 | Yes, with care | Upstream provider failure; safe to retry, but back off if it persists. |
| 503 | Yes | EXPLAIN_TIMEOUT only; no cost is incurred when it fires, so a retry is safe. |

Generation failure categories

GET /v1/generations/{id} responses include a typed failure_category field on terminal Failed rows — see REST walkthrough — failure categories for the full enum and the additive contract. It is not an HTTP error code; it describes why a 200 response shows state: "Failed".

Note that ReviewBudgetExhausted, RedraftNoProgress, ReviewLoopStalled, and CostBudgetExceeded are all content-quality / convergence outcomes, not transient system faults. Unlike NetworkTimeout or HostRestart, retrying the same intake at the same review profile produces the same result — adjust the intake, raise the review profile, or raise the cost cap before re-running.

The accompanying failure_reason string is sanitized — normalized to a short stable classifier (provider rate-limit, auth failure, transient, and similar) rather than echoed verbatim from the upstream provider. Raw provider messages can leak provider internals or request fragments containing PII, so the API never surfaces them. Treat failure_category as the primary programmatic surface; failure_reason is a human-readable hint, not a parseable contract — the exact strings are not stable. This sanitization applies wherever failure_reason appears: REST GET /v1/generations/{id}, the MCP get_generation and wait_for_generation tools, and webhook delivery bodies.
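
To illustrate branching on failure_category rather than failure_reason, the sketch below answers one question: is it worth re-running the same intake unchanged? It only knows the categories named on this page. Because the full enum lives elsewhere and is additive, any other value yields None. The function name is hypothetical.

```python
from typing import Optional

# Categories named on this page; the full enum is documented elsewhere.
CONVERGENCE_CATEGORIES = frozenset(
    {"ReviewBudgetExhausted", "RedraftNoProgress", "ReviewLoopStalled", "CostBudgetExceeded"}
)
TRANSIENT_CATEGORIES = frozenset({"NetworkTimeout", "HostRestart"})


def should_rerun_unchanged(generation: dict) -> Optional[bool]:
    """True: retry as-is. False: change the intake, profile, or cost cap first.

    None: not a Failed row, or a category this sketch does not know.
    """
    if generation.get("state") != "Failed":
        return None
    category = generation.get("failure_category")
    if category in TRANSIENT_CATEGORIES:
        return True
    if category in CONVERGENCE_CATEGORIES:
        return False
    return None  # unknown or newly added category; do not guess
```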

Endpoint-specific validation

These sections cover error semantics for endpoints whose validation logic is too specific to fold into the global "Status codes" section above. The canonical happy-path docs live on the feature pages: see REST walkthrough for the kickoff / clarification / delivery flows and Webhooks instead of polling for the dispatcher contract.

Mid-generation clarification

GET /v1/generations/{id}/clarifications and POST /v1/generations/{id}/clarifications/answers (added 2026-05-05) have their own validation cases worth calling out:

| Status | Cause | What to do |
| --- | --- | --- |
| 400 | The generation isn't in PausedAwaitingClarification and the caller still tried to submit answers. | Re-fetch GET /v1/generations/{id} and confirm the state. The clarifications GET on a non-paused row returns 200 with an empty array; branch on that instead. |
| 400 | The submitted answer set doesn't cover every pending clarification (all-or-nothing for v1). The missing questions are listed in detail. | Refetch GET /v1/generations/{id}/clarifications to see the full list, then resubmit with answers for every one. |
| 400 | An individual answer carries a blank question or answer string. | Provide non-blank values for every entry. |
| 404 | The generation isn't yours, or doesn't exist. | Same not-found semantics as the rest of the generations surface. |

The pairing key between submitted answers and pending clarifications is the verbatim question string. Don't paraphrase or trim — copy the question text from the GET response unchanged. Answers whose question text doesn't match a pending clarification are accepted but ignored, which is rarely what callers intend.
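
Because unmatched answers are silently ignored, a client-side check before submitting can catch mistakes early. The sketch below assumes each answer entry is an object with `question` and `answer` strings. That shape is inferred from the validation rules above and is not a documented schema. `build_clarification_answers` is a hypothetical helper.

```python
from typing import Dict, List


def build_clarification_answers(
    pending_questions: List[str],
    answers_by_question: Dict[str, str],
) -> List[dict]:
    """Pair answers with pending questions using the verbatim question text."""
    missing = [q for q in pending_questions if q not in answers_by_question]
    if missing:
        # The server would reject this with 400 (all-or-nothing).
        raise ValueError(f"no answer for {len(missing)} pending question(s)")
    unmatched = [q for q in answers_by_question if q not in pending_questions]
    if unmatched:
        # The server would accept but ignore these; fail loudly instead.
        raise ValueError(f"{len(unmatched)} answer(s) match no pending question")
    blank = [q for q in pending_questions if not answers_by_question[q].strip()]
    if blank:
        raise ValueError(f"{len(blank)} answer(s) are blank")
    # Question strings are copied unchanged: no trimming, no paraphrasing.
    return [{"question": q, "answer": answers_by_question[q]} for q in pending_questions]
```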

Bug-report submission

POST /v1/bug-reports emits typed validation codes alongside the standard problem-details shape. All fire as 400 Bad Request.

| type | Cause | What to do |
| --- | --- | --- |
| BUG_REPORT_TITLE_REQUIRED | Missing or whitespace-only title. | Send a non-empty title. |
| BUG_REPORT_DESCRIPTION_REQUIRED | Missing or whitespace-only description. | Send a non-empty description. |
| BUG_REPORT_INVALID | A field exceeded its length bound (title 200, description 8000, current_route 500, user_agent 500). The paramName extension carries the field name. | Truncate the offending field. |
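
These rules are simple enough to mirror on the client so that obviously invalid reports never leave the caller. The sketch below is an approximation: the order in which the server evaluates the checks is not documented here, and `precheck_bug_report` is a hypothetical helper.

```python
from typing import Optional, Tuple

# Length bounds taken from the table above.
BUG_REPORT_LIMITS = {"title": 200, "description": 8000, "current_route": 500, "user_agent": 500}


def precheck_bug_report(report: dict) -> Optional[Tuple[str, str]]:
    """Return (expected type code, field name), or None if the report looks valid."""
    if not (report.get("title") or "").strip():
        return ("BUG_REPORT_TITLE_REQUIRED", "title")
    if not (report.get("description") or "").strip():
        return ("BUG_REPORT_DESCRIPTION_REQUIRED", "description")
    for field, limit in BUG_REPORT_LIMITS.items():
        if len(report.get(field) or "") > limit:
            return ("BUG_REPORT_INVALID", field)
    return None
```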

GET /v1/bug-reports/{id} follows the same not-leak posture as the rest of the by-id surface: a foreign caller gets 404, not 403. The submitter sees the report; everyone else sees the same response a non-existent id would produce.

Feedback submission

Distinct from bug reports — feedback evaluates quality (was the interview good, is the package coherent), not broken behavior. See REST walkthrough — Quality feedback for the endpoint reference.

POST /v1/feedback emits typed validation codes in the standard problem-details shape. All fire as 400 Bad Request.

| type | Cause | What to do |
| --- | --- | --- |
| FEEDBACK_TITLE_REQUIRED | Missing or whitespace-only title. | Send a non-empty title. |
| FEEDBACK_FULL_REPORT_REQUIRED | Missing or whitespace-only full_report. | Send a non-empty report body. |
| FEEDBACK_INVALID | A field exceeded its length bound (title 200, summary 1000, full_report 32000) or violated a domain rule. Common cases: a run-bound type (interview_quality, package_quality, end_to_end_run) submitted with no target GUID; a numeric score outside 0–100; rubric_version, rubric_section_responses, or rubric_scores set without template_id. The paramName extension carries the field name. | Re-check the field against the schema in REST walkthrough — Quality feedback and the Submit feedback MCP tool reference. |
| FEEDBACK_TEMPLATE_VERSION_REQUIRED | template_id was set but rubric_version was missing. | Pair the two: template_id + rubric_version together, or both absent. The reverse pair (rubric_version without template_id) surfaces as FEEDBACK_INVALID. |
| FEEDBACK_TEMPLATE_UNKNOWN | The (template_id, rubric_version) pair doesn't resolve in the catalog. | Fetch GET /v1/feedback/templates and pick a known pair, or omit both fields for free-form feedback. |
| FEEDBACK_TEMPLATE_TYPE_MISMATCH | The template_id resolves, but it's paired with a different feedback type than the one you sent (e.g. interview-quality submitted with a type other than interview_quality). Each rubric maps 1:1 to a type so the triage type filter lines up with the rubric's sections. | Use the template whose Pairs with type column matches your type (see the template table in REST walkthrough — Quality feedback), or omit template_id for free-form feedback. |
| FEEDBACK_TEMPLATE_SECTION_UNKNOWN | A key in rubric_section_responses isn't a section id on the chosen template. | Fetch GET /v1/feedback/templates/{id}/{version} to see the section ids, then drop unknown keys. |
| FEEDBACK_TEMPLATE_SCORE_UNKNOWN | A key in rubric_scores isn't a scored section (a section with a non-null score_scale) on the chosen template. | Only score sections whose template entry has score_scale set; leave the others to the free-text rubric_section_responses. |

GET /v1/feedback/{id} returns 404 for foreign callers, without distinguishing "doesn't exist" from "isn't yours."

Feedback amendment

PATCH /v1/feedback/{id}/amend lets the original submitter self-correct free-form content while the row is still Open and within the amend window (10 minutes from submission). See REST walkthrough — Quality feedback for the field list. Both codes below fire as 400 Bad Request; a non-owner gets 404 (existence isn't leaked), not a typed code.

| type | Cause | What to do |
| --- | --- | --- |
| FEEDBACK_AMEND_NOT_OPEN | The row has already left Open (review has started), so it can no longer be self-amended. | Amendment is producer-only and Open-only; once a row leaves Open, it's locked for self-correction. |
| FEEDBACK_AMEND_WINDOW_EXPIRED | More than the amend window (10 minutes from submission) has elapsed. | The window has closed; the row is locked for self-correction. |
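
A client that tracks submission time can predict these two outcomes before calling the endpoint. The sketch below assumes the caller knows the row's status as a string and its submission timestamp. Those inputs and the order of the checks are assumptions of this example, and the server remains the authority.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

AMEND_WINDOW = timedelta(minutes=10)  # from the table above


def amend_block_reason(status: str, submitted_at: datetime, now: datetime) -> Optional[str]:
    """Return the type code that would likely block an amend, or None."""
    if status != "Open":
        return "FEEDBACK_AMEND_NOT_OPEN"
    if now - submitted_at > AMEND_WINDOW:
        return "FEEDBACK_AMEND_WINDOW_EXPIRED"
    return None


# Usage: compare against the current time in UTC.
# amend_block_reason("Open", submitted_at, datetime.now(timezone.utc))
```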

Webhook receivers

Webhook receivers run on your infrastructure, but the dispatcher's behavior shapes what you should return:

  • 2xx — delivery considered successful. SpecStep stops retrying and stamps the subscription with the success status + HTTP code.
  • 4xx — terminal subscriber rejection. SpecStep does NOT retry — the next event will be attempted, but this delivery is dropped. Use 4xx deliberately when you want SpecStep to stop hammering a misconfigured receiver.
  • 5xx or transport failure — retried up to 3 times with exponential backoff (1s, 4s, 16s). After the final failure the delivery is logged + dropped (no DLQ in v1).

The X-SpecStep-Webhook-Delivery header is your dedup key. The same delivery id may arrive more than once if the dispatcher retries past your 200 response (network race), so design idempotently.
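
The status-code rules and the dedup key combine into a small receiver core. The sketch below is framework-agnostic and keeps seen delivery ids in an in-memory set, which is an assumption for illustration only; a real receiver would need a durable store. Returning 400 for a missing delivery header is a design choice of this example, and `handle_delivery` is a hypothetical name.

```python
from typing import Any, Callable, Mapping, Set


def handle_delivery(
    headers: Mapping[str, str],
    payload: Any,
    seen_ids: Set[str],
    process: Callable[[Any], None],
) -> int:
    """Return the HTTP status the receiver should send back."""
    h = {k.lower(): v for k, v in headers.items()}
    delivery_id = h.get("x-specstep-webhook-delivery")
    if not delivery_id:
        return 400  # 4xx is terminal: the dispatcher will not retry
    if delivery_id in seen_ids:
        return 200  # duplicate delivery: acknowledge without reprocessing
    try:
        process(payload)
    except Exception:
        return 500  # 5xx asks the dispatcher to retry with backoff
    # Record the id only after successful processing so retries can succeed.
    seen_ids.add(delivery_id)
    return 200
```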

Reference document upload validation

POST /v1/interviews/{interviewId}/reference-documents rejects malformed uploads at the endpoint boundary before the service layer runs. All three cases return 400 Bad Request with the standard problem-details shape; the title is the stable signal.

| title | Cause | What to do |
| --- | --- | --- |
| File is empty. | The multipart part had zero bytes. | Send a file with content and retry. |
| File too large. | The upload exceeded the 64 MB endpoint hard cap. Per-tier limits are tighter and enforced after this check passes. | Send a smaller file. The endpoint cap is the architectural ceiling, not the per-tier quota; your tier's limit may be lower. |
| Content-type mismatch. | The file's leading bytes don't match the declared Content-Type header, e.g. an HTML or executable payload sent as image/png. Added 2026-05-12 to close an attack on the LLM image-hydration path. The detail names the detected family vs the declared MIME. | Re-upload with the correct content type. Files whose magic bytes are unknown to the sniffer pass through; the existing extension allowlist still applies downstream. |

The supplied filename is sanitized server-side rather than rejected: directory components are stripped (path-traversal mitigation), characters outside [A-Za-z0-9._- ] are replaced with _, leading dots are removed, and the result is capped at 200 characters. Callers see the sanitized name in the upload response and in audit logs — there is no error path for filename normalization.
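
To predict what name the upload response will carry, the rules above can be approximated locally. The sketch below applies them in the order listed, which is an assumption: the server's exact order and edge-case handling are not documented here, so treat the output as an estimate. `sanitize_filename` is a hypothetical helper.

```python
import re


def sanitize_filename(name: str, max_len: int = 200) -> str:
    """Approximate the server-side filename normalization (illustrative)."""
    # Strip directory components for both "/" and "\" separators.
    base = re.split(r"[\\/]", name)[-1]
    # Replace anything outside [A-Za-z0-9._- ] with an underscore.
    base = re.sub(r"[^A-Za-z0-9._\- ]", "_", base)
    # Remove leading dots, then cap the length.
    base = base.lstrip(".")
    return base[:max_len]
```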