REST guide · API docs

This page walks through the full end-to-end flow: from verifying your credentials to receiving a delivered documentation package. Each step lists the endpoint and what to expect back.

For reference material — status codes, error shapes, rate limits — see errors and rate-limits. For the OpenAPI document, fetch GET /v1/openapi.json.

Step 1 — verify your credentials

Before doing anything else, confirm your key works and retrieve your account details:

GET /v1/me

Returns your user ID, display name, email, and current plan. If this returns 401, stop and fix your key before proceeding. See authentication.

/v1/* endpoints accept either bearer shape on the Authorization header — an sf_… API key or an oat_… OAuth access token issued via the MCP browser sign-in flow (see mcp). The full auth reference is in auth.
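
As a rough illustration of the two bearer shapes, the following Python sketch builds (but does not send) an authenticated GET /v1/me request. The base URL and both helper names are placeholders invented for this example; only the sf_ / oat_ prefixes and the Authorization header come from this page.

```python
import urllib.request

# Hypothetical base URL; substitute the host your account actually uses.
BASE_URL = "https://api.example.com"


def credential_kind(token: str) -> str:
    """Classify a bearer credential by the prefixes described above."""
    if token.startswith("sf_"):
        return "api_key"
    if token.startswith("oat_"):
        return "oauth_access_token"
    raise ValueError("expected an sf_ API key or an oat_ OAuth access token")


def build_request(path: str, token: str, method: str = "GET") -> urllib.request.Request:
    """Build an authenticated request object without sending it."""
    credential_kind(token)  # fail early on an obviously malformed credential
    return urllib.request.Request(
        BASE_URL + path,
        method=method,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
```

Passing the result of build_request("/v1/me", token) to urllib.request.urlopen would perform the actual call; a 401 at that point means the key needs fixing before any later step.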

Step 2 — start an interview

An interview is the structured conversation that captures your project's vision, requirements, and constraints. The generation that follows draws entirely from what the interview recorded.

POST /v1/interviews

Takes no request body. The response includes the new interview id, its initial status, and the AI Team's opening turn. You describe what you're building — project type, vision, constraints — in your first POST /v1/interviews/{id}/turns call (Step 3 below); the interview's project type is inferred from that first turn.

You can also attach reference documents — PDFs, images, YAML configs — to give the AI Team additional context. Upload, list, and delete them through the collection:

POST /v1/interviews/{interviewId}/reference-documents — upload (multipart form)

GET /v1/interviews/{interviewId}/reference-documents — list

DELETE /v1/interviews/{interviewId}/reference-documents/{documentId} — remove

Step 3 — submit interview turns

The interview proceeds as a sequence of turns. Each turn you submit becomes part of the recorded conversation. The AI Team responds within the same turn record.

POST /v1/interviews/{id}/turns

Send {"message": "..."}. The agent's reply and the updated interview state are returned inline in sync mode, or through the turn-job poll endpoint in the default async mode (both described below). Keep submitting turns until the AI Team has captured enough context to proceed to generation, then finalize the interview with:

POST /v1/interviews/{id}/complete

Safe retry (recommended). Send an Idempotency-Key HTTP header on every POST /v1/interviews/{id}/turns call. Any unique token works (UUID, ULID, hash of (interview_id, message) — 1..128 chars of [A-Za-z0-9._:-]). On a retry with the same key, you receive the original successful result instead of double-submitting your message. While the original is still in flight, the retry returns 409 Conflict with a Retry-After: 5 header and a structured ProblemDetails body containing retryable: true, retry_after_seconds: 5, turn_committed: false, and client_request_id. See errors for the full envelope.
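
A minimal sketch of the key rules above, assuming a Python client. The function names, the turn: prefix, and the plain-dict representation of headers and body are choices made for this example; the character set, the length limit, and the 409 fields are taken from the paragraph above.

```python
import hashlib
import re
import uuid

# Character set and length limit as stated above.
KEY_PATTERN = re.compile(r"[A-Za-z0-9._:-]{1,128}")


def is_valid_idempotency_key(key: str) -> bool:
    return KEY_PATTERN.fullmatch(key) is not None


def random_key() -> str:
    """Fresh key per logical submission; reuse it verbatim on every retry of that submission."""
    return str(uuid.uuid4())


def content_key(interview_id: str, message: str) -> str:
    """Deterministic key: the same (interview_id, message) pair always yields the same token."""
    digest = hashlib.sha256(f"{interview_id}\n{message}".encode("utf-8")).hexdigest()
    return f"turn:{digest}"  # 5 + 64 = 69 characters, inside the 128 limit


def conflict_retry_delay(status_code: int, headers: dict, body: dict):
    """Seconds to wait when the original call is still in flight, else None.

    Assumes headers and body were already parsed into plain dicts.
    """
    if status_code != 409 or not body.get("retryable"):
        return None
    return float(headers.get("Retry-After", body.get("retry_after_seconds", 5)))
```

One trade-off to keep in mind: a content-derived key treats a deliberately repeated identical message as a retry, whereas a random key stored alongside the pending submission does not.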

Async mode is the default (changed 2026-05-19). A bare POST /v1/interviews/{id}/turns (no ?mode= query, no X-SpecStep-Turn-Mode header) now commits your user turn immediately and runs the LLM call in the background. The response is 202 Accepted with a Location: /v1/interviews/turns/{jobId} header and a body like {status: "queued", job_id, interview_id, submission_id?, user_turn_committed: true, snapshot: null}. Poll the location for the agent's reply:

GET /v1/interviews/turns/{jobId}

Returns {status, job_id, interview_id, snapshot?, error_code?, error_message?, is_retryable?, created_at, completed_at?}. The status is one of queued, running, completed, failed. When completed, snapshot carries the full interview state. When failed, error_code tells you what went wrong (INTERVIEW_TURN_TIMEOUT, INTERVIEW_TURN_STUCK_RUNNING, INTERVIEW_TURN_INTERNAL_ERROR, …) and is_retryable tells you whether to re-submit. Idempotency-Key works on async too.

If you supplied an Idempotency-Key whose original async call has already completed, you instead get 200 OK with status: "cached_replay" and the cached snapshot inline — no polling required.
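
The async flow above can be sketched as follows. The transport is injected as a fetch_job callable (hypothetical; it should perform GET /v1/interviews/turns/{jobId} and return the parsed JSON body), and the 2-second interval and 150-poll cap are assumptions, since this page does not prescribe a polling cadence for turn jobs.

```python
import time

# "cancelled" is included because a job can also end through the cancel endpoint.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def poll_turn_job(fetch_job, job_id, interval_seconds=2.0, max_polls=150, sleep=time.sleep):
    """Poll until the job reaches a terminal status and return the final job body."""
    for attempt in range(max_polls):
        job = fetch_job(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if attempt < max_polls - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"turn job {job_id} still not finished after {max_polls} polls")


def snapshot_from_submit(status_code, body, fetch_job, **poll_options):
    """Turn a submit response (202 queued or 200 cached_replay) into an interview snapshot."""
    if status_code == 200 and body.get("status") == "cached_replay":
        return body["snapshot"]  # cached result is inline, no polling required
    if status_code != 202:
        raise RuntimeError(f"unexpected submit response: HTTP {status_code}")
    job = poll_turn_job(fetch_job, body["job_id"], **poll_options)
    if job["status"] != "completed":
        raise RuntimeError(f"turn job ended as {job['status']}: {job.get('error_code')}")
    return job["snapshot"]
```

On a failed job, a fuller client would inspect is_retryable before deciding whether to re-submit the turn with the same Idempotency-Key.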

Sync mode (opt-in, scheduled for removal). Pass ?mode=sync (or X-SpecStep-Turn-Mode: sync) to use the legacy inline-reply path: the call waits for the LLM round-trip and returns the agent's reply inline as 200 OK with the full interview snapshot. This path is subject to the ~60s Front Door ceiling, so long interviews may return 504. Sync mode will be removed entirely after one release cycle of observed async adoption. New integrations should use async (the default).

Cancel an in-flight async turn (added 2026-05-18). If the user's submitted turn was wrong, an LLM call is dragging on, or you want to abandon a half-finished turn rather than wait for it (or for the stuck-job timeout), cancel the job by id:

POST /v1/interviews/turns/{jobId}/cancel

Returns 200 OK with {status: "cancelled", job_id, interview_id, created_at, completed_at} on the happy path. Queued jobs cancel cleanly; running jobs cancel best-effort — the job's terminal status will be cancelled, but the agent reply MAY still appear in the interview transcript if a mid-pipeline SaveChanges committed before the cancel landed. Idempotent on already-cancelled jobs. Returns 409 INTERVIEW_TURN_NOT_CANCELLABLE when the job is already completed or failed (the work landed; read the result via the poll endpoint above). 404 if the job isn't yours.

Completion auto-handoff (added 2026-05-17). When the agent signals completion on a turn (the interview transitions to complete and an intake_artifact_id is produced), SpecStep auto-starts a generation with sensible defaults (review_profile: "Normal", has_ui derived from the detected project type). Every package ships the full set of AI-coder instruction files (CLAUDE.md, AGENTS.md, .cursorrules, .github/copilot-instructions.md). The outcome of the auto-start is surfaced on the same snapshot in two fields:

started_generation_id — non-null on success; poll GET /v1/generations/{id} for progress.
auto_start_failure: {code, message} — non-null when auto-start failed (quota exceeded, validation error, transient provider failure, etc.). The interview turn still succeeded; the intake artifact is committed. Call POST /v1/generations manually with the intake_artifact_id if you want to retry with custom settings.

Both fields stay null when the turn didn't trigger completion. Auto-handoff is restricted to user-actor interviews; API-key actors receive auto_start_failure.code: "AUTO_START_NOT_SUPPORTED_FOR_ACTOR_TYPE" and call POST /v1/generations themselves. Same fields appear on the snapshot returned by GET /v1/interviews/turns/{jobId} when the async job's completion produced an intake artifact.
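
One way a client might branch on the two handoff fields, shown as a small Python sketch. The function name and the returned action labels are invented for this example; the field names are the ones listed above.

```python
from typing import Optional, Tuple


def next_step_after_turn(snapshot: dict) -> Tuple[str, Optional[str]]:
    """Decide what to do after a turn, based on the auto-handoff fields."""
    generation_id = snapshot.get("started_generation_id")
    if generation_id is not None:
        # Auto-start succeeded: follow the generation by polling its status.
        return ("poll_generation", generation_id)
    failure = snapshot.get("auto_start_failure")
    if failure is not None:
        # The turn itself succeeded; only the auto-start did not.
        return ("start_generation_manually", failure.get("code"))
    # Both fields null: the turn did not trigger completion.
    return ("continue_interview", None)
```

API-key callers would always land in the second branch once the interview completes, with the code AUTO_START_NOT_SUPPORTED_FOR_ACTOR_TYPE.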

You can retrieve the full interview at any time:

GET /v1/interviews/{id}

Every interview read carries a transcript_size introspection block (added in v0.18, 2026-05-22) so clients can observe how full a transcript is before queuing the next turn:

"transcript_size": {
  "chars": 8421,
  "tokens_estimate": 2105,
  "max_chars": 800000,
  "max_tokens": 200000,
  "percent_used": 1.1
}

chars — total UTF-16 char count of every user + agent turn's content. System prompts, tool messages, and reference documents are excluded.
tokens_estimate — chars / 4, floored. Conservative upper bound across Anthropic + OpenAI tokenizers; not a substitute for the real tokenizer count.
max_chars / max_tokens — current platform ceilings. In v0.18 these values are informational only and not yet enforced. A later release will reject submit-turn calls that would push a transcript past the ceiling, returning a structured error envelope.
percent_used — 100.0 * chars / max_chars, rounded to one decimal. Can exceed 100 once the ceiling is enforced and a stale client has built up an oversize transcript on the side.

The same block appears on get_interview MCP responses with byte-identical field names.
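
To make the arithmetic concrete, here is a small Python sketch that recomputes the block from a character count. The helper names are hypothetical, and the server's exact rounding mode is not documented here, so Python's built-in round is an assumption.

```python
def utf16_len(text: str) -> int:
    """Count UTF-16 code units, the unit that chars is defined in."""
    return len(text.encode("utf-16-le")) // 2


def transcript_size(chars: int, max_chars: int = 800_000, max_tokens: int = 200_000) -> dict:
    """Recompute the introspection block from a total character count."""
    return {
        "chars": chars,
        "tokens_estimate": chars // 4,  # chars / 4, floored
        "max_chars": max_chars,
        "max_tokens": max_tokens,
        "percent_used": round(100.0 * chars / max_chars, 1),
    }
```

With chars = 8421 this reproduces the sample above: 8421 // 4 is 2105, and 100 * 8421 / 800000 is about 1.05, which rounds to 1.1.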

You can also list your own interviews:

GET /v1/interviews?status=...&limit=20

Optional status filter (comma-separated tokens — active, paused, abandoned, complete, awaiting_clarification, or exact enum names) and limit (default 20, max 100). Empty conversations (fewer than two turns) are filtered out so first-contact abandons don't clutter the list. Each row carries id, status, detected_type, display_title, turn_count, started_at, and last_activity_at.

When you want an interview gone from the workspace — accidentally started, no longer relevant — soft-delete it:

DELETE /v1/interviews/{id}

Returns 204 No Content on success and 404 if the interview isn't yours. Allowed in any status (Active, Paused, Complete, Abandoned, AwaitingClarification, ClarificationResolved) — soft-delete is a "remove from my workspace" affordance, not a state-machine transition. The conversation row stays in the database for audit; the row drops out of GET /v1/interviews and the workspace UI. Idempotent: a re-delete on an already-deleted row is also 204.

To recover a deleted interview, see the recycle bin below.

Step 3.5 — discover the enumerable inputs (optional)

Before constructing a generation kickoff, you can ask the API which review profiles, project types, mirror selections, and schema versions it accepts. This lets clients avoid hardcoding magic strings that shift between deploys.

GET /v1/capabilities

Anonymous-shaped — the values describe the public contract and don't depend on the caller. Returns {schema_version, rubric_version, quality_rubric_version, review_profiles, project_types, mirror_selections}. Useful for SDKs and AI agents building dynamic forms.
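
A sketch of how a client could check a kickoff body against the capabilities response instead of hardcoding values. It assumes that review_profiles, project_types, and mirror_selections are flat lists of strings, which this page does not spell out; the function name is also hypothetical.

```python
def kickoff_problems(capabilities: dict, body: dict) -> list:
    """Return human-readable problems; an empty list means the enum fields look acceptable."""
    checks = [
        ("review_profile", "review_profiles"),
        ("project_type", "project_types"),
        ("mirror_selection", "mirror_selections"),
    ]
    problems = []
    for field, capability_key in checks:
        if field not in body:
            continue  # fields the caller did not set are skipped
        allowed = capabilities.get(capability_key, [])
        if body[field] not in allowed:  # exact, case-sensitive comparison
            problems.append(f"{field}={body[field]!r} is not one of {sorted(allowed)}")
    return problems
```

Running the check just before a kickoff keeps a bad enum value from consuming one of the rate-limited kickoff calls.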

Step 3.6 — connect an external folder (optional)

Instead of uploading reference documents one at a time, connect a folder from OneDrive, SharePoint, or Google Drive (Dropbox coming) and have its files synced into the interview. Re-syncing later is a single call.

This surface is cookie-only by design — every /v1/external-connectors/* route requires a session cookie, and the OAuth callback rides on the user's browser session. API-key callers cannot register a connector or run a sync; programmatic agents should treat connectors as read-only state and rely on a human to set them up through the UI.

The OAuth handshake is three stages — the UI drives all three, but the contracts are documented here so an SDK or AI agent embedded in the web shell can replicate the flow:

POST /v1/external-connectors/{provider}/authorize with body {interviewId} — returns {authorize_url, state}. Redirect the browser to authorize_url.
GET /v1/external-connectors/{provider}/oauth-callback?code=...&state=... — the provider redirects the browser here. The server exchanges the code and redirects to /interview/{interviewId}?pendingConnector={pendingId} so the UI can open a folder picker. The state token is one-shot — replaying it returns 400. Pending tokens expire after 15 minutes.
POST /v1/external-connectors/pending/{pendingId}/commit with body {folderId, folderName} — registers the connector and runs the first sync. Returns {connector_id, files_synced, files_skipped}.

To populate the picker between stages 2 and 3:

GET /v1/external-connectors/pending/{pendingId}/folders?parent={id} — list folders one level at a time. Returns {folders: [{id, name, has_subfolders}, ...]}. Omit parent for the root.
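
Because folders are listed one level at a time, a picker has to walk the tree itself. The sketch below shows one way to do that in Python, with the HTTP call injected as a hypothetical list_folders(parent) callable that returns the parsed response body. The depth cap is an assumption added to bound the number of requests, and since these routes are cookie-only, the callable must run inside the user's browser session.

```python
from typing import Callable, Iterator, Optional, Tuple


def walk_folders(
    list_folders: Callable[[Optional[str]], dict],
    parent: Optional[str] = None,
    prefix: str = "",
    max_depth: int = 5,
) -> Iterator[Tuple[str, str]]:
    """Yield (folder_id, display_path) pairs depth-first, one listing call per level."""
    for folder in list_folders(parent)["folders"]:
        path = f"{prefix}/{folder['name']}"
        yield folder["id"], path
        # Only descend when the API reports children, which saves empty listing calls.
        if folder["has_subfolders"] and max_depth > 0:
            yield from walk_folders(list_folders, folder["id"], path, max_depth - 1)
```

An interactive picker would more likely expand a single folder on demand; the eager walk here is only meant to show how parent and has_subfolders fit together.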

{provider} accepts onedrive, microsoft-graph, sharepoint, google-drive (coming), and dropbox (v1.5). Kebab-case is what the UI sends.

Once committed, the steady-state surface:

GET /v1/external-connectors — list your connectors. Returns {connectors: [{id, provider, folder_name, status, created_at, last_synced_at}, ...]}. status is one of Active, Revoked, NeedsReconnect; provider is OneDrive, SharePoint, or GoogleDrive.

POST /v1/external-connectors/{connectorId}/sync with body {interviewId} — re-sync the connected folder into the named interview. Returns {files_synced, files_skipped}. 404 if the connector isn't yours.

DELETE /v1/external-connectors/{connectorId} — revoke. Returns 204 on success, 404 if not yours.

Step 3.7 — list intake artifacts (optional)

Once an interview is complete, it produces an intake artifact — the structured document that POST /v1/generations consumes. You can list these directly without filtering interview status inline:

GET /v1/intake-artifacts?status=ready&limit=50&offset=0

Optional status is ready (the only meaningful value today; null/blank = same as ready; unknown labels return an empty list). limit defaults to 50, max 200. Returns {artifacts: [{id, interview_id, project_name, schema_version, completed_at}, ...]}, newest first. Use the returned id as the intake_id argument to POST /v1/generations.
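
A minimal pagination sketch over limit and offset, written in Python with the request injected as a hypothetical fetch_page(limit, offset) callable that returns the parsed body. It assumes a page shorter than limit marks the end of the list, because no total count is documented on this page.

```python
def iter_intake_artifacts(fetch_page, page_size: int = 50):
    """Yield every artifact, newest first, by stepping offset one page at a time."""
    if not 1 <= page_size <= 200:  # 200 is the documented maximum for limit
        raise ValueError("page_size must be between 1 and 200")
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)["artifacts"]
        yield from page
        if len(page) < page_size:
            return  # assumed end-of-list signal: a short (or empty) page
        offset += page_size
```

Each yielded item's id is the value to pass as intake_id when starting a generation.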

Step 4 — start a generation

Once the interview is complete, start a generation. This is the multi-stage process where the AI Team drafts, self-reviews, and (depending on the review profile) performs a fresh-eyes pass.

POST /v1/generations

The body uses intake_id (the intake artifact derived from the completed interview), the review profile, and a few configuration fields. Minimum required body:

{
  "intake_id": "01952fcb-cd11-7c3e-9a2e-3b1d8f5e6a04",
  "review_profile": "Normal",
  "project_type": "WebApp"
}

The response includes a generation id and a starting state of Queued. Defaults: review_profile is Normal, schema_version is 1.0.0. Every package ships the full set of AI-coder instruction files (CLAUDE.md, AGENTS.md, .cursorrules, .github/copilot-instructions.md); mirror_selection is still accepted on the wire for back-compat but no longer narrows the output. Accepted review profiles are Fast, Normal, Extensive, and Researcher (case-sensitive). Accepted project_type values are WebApp, MobileApp, MobileGame, DesktopApp, BrowserExtension, AiAgent, and AiTool — call GET /v1/capabilities for the live list rather than hardcoding them. The generation runs asynchronously — you will need to poll for its progress.

Enum values (review_profile, project_type, mirror_selection) are case-sensitive strings on the wire. Callers sending integer ordinals continue to work for back-compat, but new code should send strings — the OpenAPI spec at /v1/openapi.json advertises the string form so codegen clients pick it up automatically.

Note: this endpoint is subject to the tighter 5-kickoffs-per-minute limit. See rate limits.
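
Clients that fire several kickoffs in a row may want to pace themselves rather than run into the limit. The following is a client-side sketch of a sliding-window limiter in Python; whether the server counts a sliding or a fixed window is not stated here, so treat the window model, the class name, and the method names as assumptions.

```python
import time
from collections import deque


class KickoffLimiter:
    """Allow at most max_calls kickoffs in any window_seconds span (client-side only)."""

    def __init__(self, max_calls: int = 5, window_seconds: float = 60.0, clock=time.monotonic):
        self._max_calls = max_calls
        self._window = window_seconds
        self._clock = clock
        self._stamps = deque()  # times of recent kickoffs, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next kickoff is allowed (0.0 means go ahead)."""
        now = self._clock()
        while self._stamps and now - self._stamps[0] >= self._window:
            self._stamps.popleft()  # forget kickoffs that left the window
        if len(self._stamps) < self._max_calls:
            return 0.0
        return self._window - (now - self._stamps[0])

    def record(self) -> None:
        """Call once for every kickoff actually sent."""
        self._stamps.append(self._clock())
```

Calls to the update and retry endpoints count as kickoffs too, so they should go through the same limiter instance.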

If you want to update an existing generation based on a new interview snapshot, use:

POST /v1/generations/{id}/update

This also counts as a kickoff for rate-limit purposes.

Step 5 — poll generation status

Generations take time — roughly 4 minutes for Fast, 12 for Normal, 32 for Extensive (p50 across recent completed runs). For a live pre-kickoff forecast tied to your account's history, use the MCP tool estimate_generation_cost — there is no REST estimate endpoint. After kickoff, REST callers read the run's own forecast from the estimated_total_usd / estimated_total_p25_usd / estimated_total_p75_usd and estimated_duration_seconds / estimated_time_remaining_seconds fields on the status response below. Poll the status endpoint to track progress:

GET /v1/generations/{id}

The response includes a state field. Poll until it reaches Complete, Failed, or Cancelled. A reasonable polling interval is 15 to 30 seconds. For richer detail about what each stage is doing, fetch:

GET /v1/generations/{id}/events

For a per-agent narration of what's happening live — used by the Generation Details page's chat-feed view — fetch:

GET /v1/generations/{id}/conversation?take=...

Returns {items: [...]} where each entry is one AgentInvocation projected for display: invocation_id, agent_role, character_name, display_name, asset_slug, accent_hex, action, narration, status, started_at, ended_at, duration_ms, cost_usd. Optional take caps the most-recent N; omit for the full feed.

You can also list every generation on your account, which is useful when a generation never produced a Package row (failed mid-flight, paused, or cancelled) and so doesn't show up under /v1/packages:

GET /v1/generations?status=...&limit=50&offset=0&order=desc

Optional status accepts comma-separated tokens. Roll-up tokens collapse to canonical state sets: in_progress (Queued / Drafting / SpecialistReview / Reviewing / FreshEyes / RiskReview / SecurityReview / Assembling / Delivering), complete, failed, cancelled, paused. You can also pass exact state names (Drafting, Reviewing, etc.) — case-insensitive. order is desc (newest first, default) or asc. Each row includes id, short_id, project_name, state, review_profile, cost_usd, the timing stamps, current_round, failure_reason / failure_category, source_channel, and interview_id.

When you want a generation gone from the workspace — e.g. a private experiment you don't want to keep visible — soft-delete it:

DELETE /v1/generations/{id}

Returns 204 No Content on success, 404 if the generation isn't yours, and 409 Conflict if the generation is still in flight (delete is only allowed on terminal-state rows: Complete, Failed, Cancelled — cancel the run first). The row is hidden from list and detail responses but kept on disk for audit + retention. Idempotent on already-deleted rows.

To recover a deleted generation, see the recycle bin.

Generation states

State	Meaning
`Queued`	Waiting to start.
`Drafting`	Recommender, Architect, and Designer Critic produce sections.
`SpecialistReview`	Domain-specialist agents pass over their owned sections (Hush for privacy, Argus for security, Marc for domain, etc.). Fires between drafting and the main review loop.
`Reviewing`	Same-provider Critic reviews per round budget.
`FreshEyes`	Different-provider Critic reviews (Normal / Extensive).
`RiskReview`	The risk-review specialist passes over the package before assembly. Surfaces blocking + non-blocking risk callouts.
`SecurityReview`	The security-review specialist passes over the package before assembly. Surfaces blocking + non-blocking security findings.
`Assembling`	The orchestrator stitches the section drafts + per-agent metadata into the package zip.
`Refining`	Pre-delivery auto-refinement between assembly and delivery: fills referenced-but-missing docs (or drops dangling references), reconciles cross-document contradictions, and resolves or escalates residual blockers. Always advances to `Delivering` regardless of any residual gaps — what it did is recorded per-pass in `refinement_summary` / `reconciliation_summary` / `blocker_resolution_summary` and consolidated into `refinement_audit`. A run with nothing to refine skips this state entirely and goes straight from `Assembling` to `Delivering`, so a missing `Refining` is normal, not an error.
`Delivering`	The package is being committed to its delivery target (GitHub repo, signed blob URL).
`Paused`	Generation paused; call resume to continue.
`PausedAwaitingClarification`	A creating agent flagged an ambiguity it can't safely resolve and the orchestrator paused for the user to answer. See step 5.6.
`Complete`	Generation finished; package is ready.
`Failed`	Generation stopped; check `failure_reason` and `failure_category` for the reason.
`Cancelled`	Stopped by request.
`AddendumRunning`	A change-request addendum is replaying against a previously-completed package. Parent row stays `Complete`; this marker rides on the addendum-child generation.
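
For a polling loop, the table above reduces to three outcomes. The Python sketch below is one possible grouping; the function name and the action labels are invented, and treating unrecognized states as still in progress is an assumption made for forward compatibility.

```python
# Terminal states named in Step 5; polling can stop once one of these is reached.
TERMINAL_STATES = {"Complete", "Failed", "Cancelled"}
# States in which the run waits for the caller (a resume or a clarification answer).
WAITING_ON_CALLER = {"Paused", "PausedAwaitingClarification"}


def polling_action(state: str) -> str:
    """Map a generation state to what a polling client should do next."""
    if state in TERMINAL_STATES:
        return "stop"
    if state in WAITING_ON_CALLER:
        return "needs_attention"
    # Every other state, including ones added later, is treated as in progress.
    return "keep_polling"
```

Because a run with nothing to refine skips Refining entirely, a client should key off these groups rather than expect every state to appear in order.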

Generation response fields

GET /v1/generations/{id} returns a JSON body with these fields. New since 2026-05-05: project_name, description, kind, and kind_label so callers can identify what each generation is about and disambiguate the deliverable from runnable code.

Field	Notes
`id` / `intake_id`	Stable UUIDs.
`state` / `review_profile` / `current_round`	Pipeline state, profile, current review round.
`started_at` / `completed_at` / `failed_at`	UTC timestamps.
`failure_reason` / `failure_category`	Populated only on `Failed` rows. `failure_reason` is a sanitized, human-readable hint — provider-side errors are normalized to a short stable classifier (e.g. provider rate-limit, auth failure, transient) rather than echoing raw upstream messages, so it won't leak provider internals or PII. The exact strings are not a parseable contract; branch on `failure_category` for programmatic handling.
`schema_version` / `rubric_version` / `manifest`	Pinned at kickoff.
`running_cost_usd`	Live cost estimate (USD). Settles to the package's `total_cost_usd` on `Complete`.
`billing_state`	Authoritative work / billing posture — `NotStarted`, `Active`, `PausedRetrying`, `Complete`, or `PausedAwaitingInput` (added 2026-06-01). Distinguishes "running and earning its cost" (`Active`) from "paused on a transient error and burning nothing" (`PausedRetrying`) from "paused waiting on you — your turn, nothing is stuck" (`PausedAwaitingInput`, e.g. answering a clarification). `null` on generations created before the status-projection rollout (2026-05-18). Pair with `running_cost_usd` to disambiguate "healthy" climbing cost from "runaway."
`started_work_at`	Timestamp of the first transition into active work (`Drafting`). `null` while the generation is still queued. Lets callers compute "how long has this been actively working?" without re-scanning the events stream. `null` on pre-rollout generations.
`phase_detail`	Human-readable phase label derived pure-function from `state` + `current_round` (examples: `"Drafting"`, `"Specialist review (round 2)"`, `"Awaiting your clarification"`). Present on every projection row written after the rollout. `null` on pre-2026-05-18 generations.
`progress_explanation`	One-sentence explanation of what's happening at the current `progress_percent` (e.g., `"Specialists are reviewing the draft in parallel"`). Supplies the context that the bare progress integer cannot convey. `null` on pre-rollout generations.
`estimated_duration_seconds`	Historical-median forecast of the run's eventual total wall-clock duration in seconds, keyed by `review_profile`. `null` when the historical sample is too small for a confident forecast (the floor is 5 completed generations in the rolling 30-day window) or on pre-rollout generations.
`estimated_time_remaining_seconds`	Best-effort "expected remaining" computed as `estimated_duration_seconds - elapsed_since_started_work_at`, floored at 0. `null` while queued, terminal, when the forecast is unavailable, or when a still-running generation has already outrun its forecast (the ETA resets to `estimating…`).
`estimated_completion_at`	Best-effort wall-clock completion timestamp: `started_work_at + estimated_duration_seconds`. `null` while queued, terminal, when the forecast is unavailable, or when a still-running generation has already outrun its forecast (the ETA resets to `estimating…`).
`active_specialist`	During `SpecialistReview` only — slug of the most-recently-completed specialist in the current round (`codd` / `halo` / `tally` / `vera` / `trip` / `merlin` / `polo`). A pragmatic single-value summary of a parallel fan-out. `null` outside `SpecialistReview`, when no specialists have completed yet, or on pre-rollout generations.
`retry_count`	Added 2026-05-19. Count of recoverable LLM-provider retries that have fired during this run (rate-limit / transient 5xx / timeout backoffs). Starts at `0` and only increments — never decreases mid-run. Resets to `0` on a host-restart rewind because the retry counter belongs to a single dispatch attempt. Lets callers tell apart "healthy first attempt" (`0`) from "currently riding out a transient hiccup" (`>0`).
`last_retry_at`	Added 2026-05-19. UTC timestamp of the most recent retry attempt. `null` until the first retry fires.
`next_retry_at`	Added 2026-05-19. UTC timestamp the retry policy is currently waiting for before the next attempt (`last_retry_at + backoff_delay`). `null` between retries — there isn't a pending one. Lets callers display "next retry in X seconds" without guessing the backoff curve.
`recoverable_error_category`	Added 2026-05-19. Typed classifier for the recoverable failure that caused the most recent retry. One of `rate_limit` / `provider_timeout` / `provider_server_error` / `schema_violation` / `other`. Distinct from terminal `failure_category` — that one is set when the run fails for good; this one is set when an LLM call temporarily failed but the retry policy is still covering it. `null` when no retry has fired yet.
`host_restart_resume_count`	Added 2026-05-27. How many times this run was automatically resumed after a host restart (capped at 5). Distinct from `retry_count` — that one is provider-level and resets to `0` on a host-restart rewind, so a run that recovered from several restarts still reads `0`; this one spans the run's whole life and only climbs. A non-zero value is the honest reason the run's `running_cost_usd` or `estimated_total_usd` runs higher than a clean-run forecast: each resume re-runs work — the full-rewind path re-runs Drafting from scratch, while cheaper in-place resumes pick up from a saved checkpoint. Always present (defaults to `0`).
`refinement_summary`	Added 2026-05-29. Outcome of the pre-delivery refinement pass that fills referenced-but-missing docs before a package ships. `null` when the pass didn't run, made no change, and left no gap (the clean common case). When present, an object with: `rounds_used` (how many detect → refine → re-validate rounds ran); `generated_count` / `dropped_count` / `residual_count`; `generated` and `dropped` arrays of `{path, referenced_by[]}` — docs the pass filled with real content vs. dangling references it removed (a dead link is worse than no link); a `residual` array of `{path, referenced_by[], reason}` — references that couldn't be filled and ship as deferred stubs (these are the package's known gaps); and a ready-to-render `summary` string. Mirrors the "Pre-delivery refinements" section in the package's `handoff.md`.
`reconciliation_summary`	Added 2026-05-29. Outcome of the pre-delivery contradiction-reconciliation pass that resolves cross-document architecture contradictions (e.g. one doc says PostgreSQL, another says DynamoDB) before a package ships. `null` when the pass found nothing to reconcile and left no residual contradiction (the clean common case). When present, an object with: `rounds_used` (how many detect → reconcile → re-validate rounds ran); `reconciled_count` / `unresolved_count`; a `reconciled` array of `{category, summary, affected_locations[]}` — contradictions resolved by redrafting the affected docs to agree on one decision; an `unresolved` array of `{category, summary, affected_locations[], reason}` — contradictions that couldn't be reconciled and ship as known gaps, with the reason; and a ready-to-render `summary` string. When the pass reconciles a contradiction, that finding no longer appears in `consistency_findings` either. Mirrors the "Pre-delivery reconciliation" section in the package's `handoff.md`.
`blocker_resolution_summary`	Added 2026-05-29. Outcome of the pre-delivery blocker resolve-or-clarify pass that acts on residual Critic-flagged blockers before a package ships. `null` when there were no residual blockers to act on (the clean common case). When present, an object with: `resolved_count` / `clarified_count` / `residual_count`; a `resolved` array of `{target_section, summary}` — blockers the pass cleared by redrafting the affected section; a `clarified` array of `{target_section, summary, question}` — blockers escalated into a clarification question; a `residual` array of `{target_section, summary, reason}` — blockers that couldn't be resolved and ship as known gaps; and a ready-to-render `summary` string. Mirrors the "Pre-delivery blocker resolution" section in the package's `handoff.md`.
`refinement_audit`	Added 2026-05-31. Consolidated audit of the whole pre-delivery refinement pipeline — one flat view of what it auto-fixed versus escalated, aggregated from the three fields above (`refinement_summary` / `reconciliation_summary` / `blocker_resolution_summary`) so you don't have to union three differently-shaped objects to answer "what did the pipeline change, and what did it give up on." `null` on a clean run where every refinement pass was a no-op (the same common case those three fields collapse to). When present, an object with: `auto_fixed_count` / `escalated_count`; an `auto_fixed` array (the pipeline changed the package) and an `escalated` array (the pipeline surfaced an unresolved gap), each of `{pass, action, target, detail}` — `pass` is `stub-fill` / `reconciliation` / `blocker-resolution`; `action` is `generated` / `dropped` / `reconciled` / `resolved` for auto-fixed rows or `residual-gap` / `unresolved-contradiction` / `clarified` / `residual-blocker` for escalated rows; `target` is the doc path, section, or contradiction category; `detail` is a human-readable summary, reason, or clarification question (may be empty); and a ready-to-render `summary` string. Mirrors the "Refinement audit" section in the package's `handoff.md`.
`progress_percent`	Computed 0–100 progress signal driven by `state` + `current_round`. Always present.
`estimated_total_usd`	Historical-median forecast of the run's eventual total cost. `null` when the historical sample is too small for a confident forecast. From 2026-05-27, on a run that has auto-resumed after host restarts the forecast is widened by `host_restart_resume_count` (each resume re-runs work), so a resume-prone run's estimate reflects the extra cost rather than reading wildly low against the actual.
`estimated_total_p25_usd` / `estimated_total_p75_usd`	25th / 75th percentile cost bounds for the same forecast. Both `null` when `estimated_total_usd` is `null`. Widened on resumed runs alongside `estimated_total_usd`.
`estimated_total_sample_size`	Number of historical generations that contributed to the forecast. `null` when the forecast wasn't computed.
`project_name`	Display name from the generation's override, or extracted from the intake's `projectName` / `name` / `title`. May be `null` when no name can be derived.
`description`	Short intake-derived description (`description` / `summary` / `vision` / `elevator_pitch` / `tagline`), truncated to 280 chars with an ellipsis. May be `null`.
`kind`	Stable constant `"specification"`. Disambiguates the deliverable for callers — packages contain specs (architecture, design, plans), not application code.
`kind_label`	Canonical disambiguation copy: `"Specification package — describes how to build the software, not the application code itself."` Render as-is alongside `kind`.
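
As an illustration of how several of these fields combine into a single status line, here is a hedged Python sketch. It assumes timestamps arrive as ISO 8601 strings with a Z suffix or an explicit offset, which is not confirmed on this page; the function names and output wording are invented for the example.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_utc(stamp: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing Z is rewritten for older Python versions."""
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def seconds_until_next_retry(generation: dict, now: Optional[datetime] = None) -> Optional[float]:
    """Seconds until the pending retry, or None when next_retry_at is null."""
    stamp = generation.get("next_retry_at")
    if stamp is None:
        return None
    now = now or datetime.now(timezone.utc)
    return max(0.0, (parse_utc(stamp) - now).total_seconds())


def describe(generation: dict, now: Optional[datetime] = None) -> str:
    """One-line summary built from billing_state, the retry fields, and phase_detail."""
    if generation.get("billing_state") == "PausedAwaitingInput":
        return "waiting on you"
    wait = seconds_until_next_retry(generation, now)
    if wait is not None:
        count = generation.get("retry_count", 0)
        category = generation.get("recoverable_error_category")
        return f"retry {count} pending, next attempt in {wait:.0f}s ({category})"
    # phase_detail is null on pre-rollout generations, so fall back to the raw state.
    phase = generation.get("phase_detail") or generation.get("state", "unknown")
    return f"{phase} ({generation.get('progress_percent', 0)}%)"
```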

Failure categories

When state is Failed, the response also includes a failure_category field — a machine-readable counterpart to the prose failure_reason string. The value is null while the generation is non-terminal; legacy Failed rows created before categorization was added carry Unknown. failure_reason is a sanitized short string, not the literal upstream error — branch on failure_category for programmatic handling.

`failure_category`	When it fires
`Unknown`	Legacy or unclassified failure — no specific category was recorded.
`StuckInQueue`	The row was never picked up by a dispatcher worker; the queue sweep auto-failed it.
`HostRestart`	A host restart caught the row mid-flight. Safe to retry.
`OrchestratorCrash`	A non-LLM exception in the runner — zip builder, blob upload, or db write.
`LlmAuthFailure`	The LLM provider returned `401` or `403` — a BYO key was revoked, or the platform key is misconfigured.
`LlmQuotaExceeded`	The LLM provider returned `429` or otherwise signalled a rate-limit / quota cap.
`NetworkTimeout`	A transient HTTP, timeout, or network failure — usually talking to the LLM provider, but also covers other outbound calls inside the runner (blob upload, package assembly).
`ReviewBudgetExhausted`	The review loop completed all rounds but blocking issues remained. This is a content-quality outcome, not a system fault — retrying without changing the intake is unlikely to help.
`RedraftNoProgress`	The orchestrator's redraft-no-progress guard fired: a re-draft round produced section content nearly identical to the prior draft AND the same blocking issues still applied. Continuing would burn another full LLM round on the same complaint. Try a higher review profile or a more concrete intake.
`ReviewLoopStalled`	The Critic's blocking-issue set hasn't shrunk for 3 consecutive rounds. Distinct from `ReviewBudgetExhausted` — this fires earlier, before the budget is gone, when the loop is wasting it on the same complaints.
`CostBudgetExceeded`	The cumulative LLM cost crossed the per-profile cap. A hard backstop in case a run slips past the convergence guards. Re-run with a smaller scope or raise the cap.
`LeaseExpired`	A dispatcher worker claimed the row, lost its lease (host died, network partition, or pod evicted), and the post-batch sweep auto-failed it. Retryable — the next kickoff will pick up cleanly.
`LlmContract`	The LLM returned content that didn't conform to the orchestrator's expected schema after the retry budget was exhausted. Usually transient; rerunning with a different review profile or a more constrained intake often clears it.

This enum is additive. New categories may appear in future API versions without a major-version bump. Treat unrecognized values as Unknown rather than rejecting the response — a closed-set switch will break when new categories arrive.
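A minimal sketch of the open-set handling described above. The category names are copied from the table. The helper names and the choice of which categories to retry unchanged are assumptions for illustration: only `HostRestart` and `LeaseExpired` are included, because those are the two rows the table explicitly describes as retry-safe.

```python
# Categories copied from the table above; anything else is folded into "Unknown".
KNOWN_CATEGORIES = {
    "Unknown", "StuckInQueue", "HostRestart", "OrchestratorCrash",
    "LlmAuthFailure", "LlmQuotaExceeded", "NetworkTimeout",
    "ReviewBudgetExhausted", "RedraftNoProgress", "ReviewLoopStalled",
    "CostBudgetExceeded", "LeaseExpired", "LlmContract",
}

# Assumption: only the rows the table explicitly calls safe to retry.
RETRY_UNCHANGED = {"HostRestart", "LeaseExpired"}


def normalize_failure_category(value):
    """Map a raw failure_category to a known value, defaulting to 'Unknown'."""
    if value is None:
        return None  # null while the generation is non-terminal
    return value if value in KNOWN_CATEGORIES else "Unknown"


def should_retry_unchanged(value):
    """True when the category suggests re-running the same intake as-is."""
    return normalize_failure_category(value) in RETRY_UNCHANGED
```

Because unrecognized values collapse to `Unknown` instead of raising, a client built this way keeps working when a later API version adds a category.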

Step 5.5 — retry a failed generation

If polling lands on Failed, you can re-run the same intake against a fresh generation aggregate without rebuilding the interview:

POST /v1/generations/{id}/retry

Re-uses the original intake_id, review profile, schema/rubric/quality versions, mirror selection, and reference documents. The original Failed row is preserved for audit; the response is a new generation with its own id. Returns 202 Accepted with Location: /v1/generations/{newId} and a body shaped like the POST /v1/generations response — {id, state, download_url?, package_id?}. Poll the new id as in Step 5.

Quota and concurrency apply exactly as on the initial kickoff: QUOTA_EXCEEDED returns 402 Payment Required with the same X-Quota-Tier / X-Quota-Limit / X-Quota-Used / X-Quota-Reset headers, and CONCURRENCY_CAP_REACHED returns 409. The call also counts against the 5-kickoffs-per-minute limit — see rate limits.

A few cases will refuse the retry with 409: generations spawned as part of a Researcher run can't be retried individually (re-fire the parent Researcher run from the original interview), and generations created before the persisted-command feature can't be replayed this way (restart from the interview). Cross-owner retries are also blocked. See errors for the full list of conflict cases.
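The status codes above can be folded into one small decision function. This is a sketch, not a client library: `interpret_retry_response` and the action labels are hypothetical, and it assumes the caller has already decoded the JSON body and holds the response headers in a dict keyed by the header names exactly as written above.

```python
def interpret_retry_response(status, headers, body):
    """Turn a POST /v1/generations/{id}/retry response into a next action.

    headers: dict of response headers (assumed to use the documented casing).
    body: the decoded JSON body, or an empty dict when there is none.
    """
    if status == 202:
        new_id = body["id"]
        # Location is documented as /v1/generations/{newId}; fall back to the id.
        poll_path = headers.get("Location", f"/v1/generations/{new_id}")
        return {"action": "poll", "generation_id": new_id, "poll_path": poll_path}
    if status == 402:
        # QUOTA_EXCEEDED: the X-Quota-* headers say when the quota resets.
        return {"action": "wait_for_quota", "resets_at": headers.get("X-Quota-Reset")}
    if status == 409:
        # Either CONCURRENCY_CAP_REACHED or a refused retry; read the error
        # body (see errors) to tell them apart before trying again.
        return {"action": "conflict"}
    return {"action": "inspect", "status": status}
```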

Step 5.6 — respond to a mid-generation clarification

A creating agent (Architect, Recommender, or Designer Critic) can flag an ambiguity it can't safely resolve from the intake — "Should the API expose REST endpoints, or render server-side HTML?" — and pause the run. The state moves to PausedAwaitingClarification. Web users see the question in their workspace; API/MCP callers fetch and answer it through these endpoints.

GET /v1/generations/{id}/clarifications

Returns the structured questions for a paused generation. The body is {state, clarifications}. When state is anything other than PausedAwaitingClarification, clarifications is an empty array — callers can poll the same endpoint without branching on state first.

{
  "state": "PausedAwaitingClarification",
  "clarifications": [
    {
      "agent": "Architect",
      "section": "docs/02-architecture/03-api-design.md",
      "question": "Should the API expose REST endpoints, or render server-side HTML forms?",
      "why": "Component design and API design contradict each other; the intake doesn't say which to honor.",
      "proposedDefault": "server-rendered HTML forms"
    }
  ]
}

section may be null for non-section-scoped agents (the Recommender's stack-selection questions, for example). why and proposedDefault are advisory copy the agent supplied — surface them to your user verbatim if you have somewhere to render them.

When one logical issue spans several documents, SpecStep asks it as a single question rather than once per document. Those grouped questions carry an extra coveredSections array listing every affected document path; section holds the first of them. One answer to the question resolves all the listed sections — you don't answer per section. coveredSections is null on single-target questions.

POST /v1/generations/{id}/clarifications/answers

Submit answers and resume the generation. Body {answers: [{question, answer}]}. Match each question exactly to the verbatim text from the GET response — the endpoint pairs answers to pending clarifications by question text. Answers must cover every pending clarification (all-or-nothing for v1).

{
  "answers": [
    { "question": "Should the API expose REST endpoints, or render server-side HTML forms?", "answer": "Server-rendered HTML forms; no REST in v1." }
  ]
}

Returns 202 Accepted with {generation_id, status_url}. The orchestrator picks the run back up on the next dispatcher tick, threads the answers into the next agent call, and re-drafts the originally-stuck section. Poll the status_url to watch the state advance back through Drafting / Reviewing / etc.

400 with a problem-details body fires when the generation isn't in PausedAwaitingClarification or when the answer set doesn't cover every pending clarification (the missing questions are listed in detail). 404 if the generation isn't yours.
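Since answers are paired by verbatim question text and must cover every pending clarification, it can help to build the POST body directly from the GET response. The sketch below does that and fails locally before the API would return 400. `build_answers_body` is a hypothetical helper name, and `answers_by_question` is whatever mapping your application collected from the user.

```python
def build_answers_body(clarifications_response, answers_by_question):
    """Build the body for POST /v1/generations/{id}/clarifications/answers.

    clarifications_response: decoded body of the GET clarifications call.
    answers_by_question: dict mapping verbatim question text to the answer.
    """
    pending = clarifications_response.get("clarifications", [])
    missing = [c["question"] for c in pending
               if c["question"] not in answers_by_question]
    if missing:
        # The endpoint is all-or-nothing, so refuse to send a partial set.
        raise ValueError(f"unanswered clarifications: {missing}")
    return {
        "answers": [
            {"question": c["question"], "answer": answers_by_question[c["question"]]}
            for c in pending
        ]
    }
```

Reusing the question strings from the GET response, instead of retyping them, avoids mismatches caused by whitespace or punctuation differences.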

Step 5.7 — control a running generation

While a generation is in flight, three control endpoints let you pause, resume, or cancel it without rebuilding the intake. All three return 204 No Content on success and 404 if the generation isn't yours.

POST /v1/generations/{id}/pause

Halts the aggregate at its current state. Returns 409 with an application/problem+json body if the current state doesn't allow pausing — see errors for the conflict codes.

POST /v1/generations/{id}/resume

Transitions a Paused generation back to the state it was in before the pause — a row paused mid-Reviewing returns to Reviewing. The endpoint records the state transition only; it does not re-enqueue work to the orchestrator, so don't assume in-flight LLM work resumes automatically when the call returns. Returns 409 if the generation isn't currently Paused, or — rarely — if no pre-pause state was recorded. See errors.

POST /v1/generations/{id}/cancel

Stops the generation and signals the orchestrator to halt any in-flight LLM calls, which is useful for cutting cost once you realize a run is going the wrong way. Optional body: {"reason": "..."}. If the reason is omitted or empty, the aggregate records it as (no reason given). Returns 409 if the generation is already in a terminal state (Complete, Failed, Cancelled).

None of the three require a request body unless noted above.

Step 6 — retrieve the package

When the generation reaches Complete, a documentation package is ready. Retrieve it:

GET /v1/packages/{id}

The package record includes metadata about what was generated, the review profile used, and links to the package contents. List all packages on your account:

GET /v1/packages?limit=50&offset=0&order=desc

Optional limit (default 50, max 200), offset (default 0), and order (desc newest-first by default, asc). Each row also carries generation_state so you can tell which packages came from clean Complete runs versus partial / failed runs. The list endpoint resolves the project name + description in the same DB round-trip as the package row, so paging the listing doesn't fan into N+1 queries.

Both endpoints carry the same project_name / description / kind / kind_label projection as GET /v1/generations/{id} (added 2026-05-05).

generation_id is null for packages created by migrating existing documentation rather than by a generation run — those packages have no originating generation (Migrate Existing Docs, 2026-05-27). For generated packages it is always present. Filter or branch on null accordingly.

Each row of GET /v1/packages also carries an addenda_count integer — the number of change addenda attached to the package (see step 6.5). 0 when none exist. Saves a per-row follow-up when rendering an "N addenda" annotation.
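As an illustration of the paging parameters and the null `generation_id` case, here is a sketch that walks the listing page by page. The exact envelope of the list response is not described here, so the HTTP call is left to a hypothetical `fetch_page(limit, offset, order)` callable that you supply and that returns the rows of one page as a list of dicts.

```python
def iter_packages(fetch_page, limit=50, order="desc"):
    """Yield every package row by walking limit/offset pages.

    fetch_page is a caller-supplied (hypothetical) function that performs
    GET /v1/packages?limit=..&offset=..&order=.. and returns a list of rows.
    """
    limit = max(1, min(limit, 200))  # documented default 50, max 200
    offset = 0
    while True:
        rows = fetch_page(limit, offset, order)
        yield from rows
        if len(rows) < limit:  # a short page means there is nothing left
            return
        offset += limit


def split_by_origin(rows):
    """Separate generated packages from migrated ones (generation_id is null)."""
    generated = [r for r in rows if r["generation_id"] is not None]
    migrated = [r for r in rows if r["generation_id"] is None]
    return generated, migrated
```

When the total row count is an exact multiple of `limit`, the loop issues one extra request that returns an empty page; that costs a round-trip but keeps the stop condition simple.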

When you want a package gone — duplicate of a newer iteration, sensitive content, etc. — soft-delete it:

DELETE /v1/packages/{id}

Returns 204 No Content on success and 404 if the package isn't yours. The package row drops out of GET /v1/packages but stays in the database for audit. Idempotent on already-deleted rows. Note: deleting a package does NOT cascade to its parent generation; if you want both gone, delete each independently. To recover a deleted package, see the recycle bin.

To fetch the current package for a generation without going through list and filtering, use:

GET /v1/generations/{id}/package

Returns the same shape as GET /v1/packages/{id} (id, generation_id, version, total_cost_usd, project_name, description, etc.) for the latest package on the generation. 404 while the generation is still in flight or when the package was soft-deleted. Future-proofs for multi-version packages: when a generation produces several package versions, this returns the most recent.

Step 6.2 — read package contents without downloading the zip

For agents that want to inspect a package's structure without fetching the full archive, two endpoints stream the zip's central directory + individual entries from blob storage:

GET /v1/packages/{id}/files

Returns {files: [{path, size_bytes}, ...]} sorted lexicographically by path. The full zip is never materialized on the server — the response reads only the central directory via Azure SDK range requests.

GET /v1/packages/{id}/files/{*path}

Returns the bytes of a single file. The response shape depends on the file type:

Text entries (markdown, YAML, JSON, plain text, CSV, SVG): Content-Type matching the entry, body is the raw text.
Binary entries (PNG, unknown extensions): served with the appropriate binary Content-Type.

Files larger than 256 KB return a 400 directing the caller at the bulk zip endpoint (GET /v1/packages/{id}/zip). Path-traversal segments (..) are rejected at the application layer.
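One way to use the listing together with the size limit is to decide up front which entries can be read individually and which require the zip. The sketch below assumes "256 KB" means 256 × 1024 bytes, which is not stated above, and the helper names are hypothetical.

```python
from urllib.parse import quote

# Assumption: the documented 256 KB limit is 256 * 1024 bytes.
MAX_SINGLE_FILE_BYTES = 256 * 1024


def file_url_path(package_id, entry_path):
    """Build the request path for one entry, refusing '..' segments early."""
    if ".." in entry_path.split("/"):
        raise ValueError("path-traversal segments are rejected by the API")
    # quote() keeps '/' intact and percent-encodes spaces and similar characters.
    return f"/v1/packages/{package_id}/files/{quote(entry_path)}"


def plan_fetches(package_id, listing):
    """Split a /files listing into per-file request paths and zip-only entries."""
    per_file, needs_zip = [], []
    for entry in listing["files"]:
        if entry["size_bytes"] > MAX_SINGLE_FILE_BYTES:
            needs_zip.append(entry["path"])
        else:
            per_file.append(file_url_path(package_id, entry["path"]))
    return per_file, needs_zip
```

If `needs_zip` is non-empty, a single request to GET /v1/packages/{id}/zip covers those entries.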

Step 6.3 — full-text search inside a package

GET /v1/packages/{id}/search?q=...&limit=20

Searches the package's indexed file contents (markdown, YAML, JSON, plain text, CSV, SVG entries; binary files are skipped during indexing). Returns {query, results: [{file_path, snippet, rank}, ...]} ranked by relevance, with snippets HTML-highlighted using <mark>...</mark> markers around the match terms.

Query syntax follows Postgres websearch_to_tsquery:

Quoted phrases: "agent topology"
Alternation: auth OR session
Exclusion: auth -test

Case-insensitive; English stemming is applied (searching matches search). An empty query returns an empty result set rather than every row. limit defaults to 20, max 50.
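The query operators contain quotes and spaces, so they need URL-encoding before they go into `q`. A small sketch using only the standard library follows; the function names are illustrative, and it assumes the server accepts standard form-style encoding where a space becomes `+`.

```python
from urllib.parse import urlencode


def phrase(text):
    """Wrap text in double quotes so it is matched as a phrase."""
    return '"' + text.replace('"', "") + '"'


def package_search_path(package_id, query, limit=20):
    """Build the request path for the per-package search endpoint."""
    limit = max(1, min(limit, 50))  # documented default 20, max 50
    return f"/v1/packages/{package_id}/search?" + urlencode({"q": query, "limit": limit})


# A quoted phrase combined with an exclusion, limit clamped to the maximum:
path = package_search_path("p1", phrase("agent topology") + " -test", limit=500)
# path == "/v1/packages/p1/search?q=%22agent+topology%22+-test&limit=50"
```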

For cross-package search across every non-deleted package the caller owns, use the cross-package variant — useful when you don't already know which package contains what you're looking for:

GET /v1/packages/search?q=...&limit=10

Returns {query, results: [{package_id, project_name, version, total_hit_count, files: [{file_path, snippet, rank}, ...]}, ...]} — matched packages ranked by best per-file score, with up to 5 file hits embedded per package. total_hit_count carries the per-package true count so callers can render "showing N of M" or follow up with the per-package endpoint for a deeper look. limit defaults to 10, max 25.

Indexing happens automatically at package completion; no client action is required to make a new package searchable.

Step 6.5 — file a change addendum

For a focused single-change request against a completed package — "Add Apple ID as an OAuth provider", "Localize French", etc. — file an addendum instead of running a full re-generation. An addendum is one targeted LLM call (~30 seconds, ~$0.40-0.50) that produces a 5-section markdown bundle (background.md, change-requirement.md, implementation-guide.md, test-plan.md, decision-log-entry.md) attached to the existing package — no version bump.

For multi-change rewrites that warrant a fresh package version (typically ~$2.50), use POST /v1/generations/{id}/update instead, which runs the full agent pipeline.

POST /v1/packages/{id}/addenda

Body: {title, description}. Both fields are required and non-blank — title ≤ 200 chars, description ≤ 4000 chars. The handler authorizes the caller against the package, downloads the original zip, calls a single LLM, builds the 5-file addendum zip, uploads it, and persists the row. The synchronous call takes ~30 seconds; design clients accordingly.

Returns 200 OK with {addendum_id, download_url, cost_usd}. The download_url is a SAS-tokened blob URL valid for one hour. 400 if title or description is missing/blank. 404 if the package isn't yours.
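Because the call is synchronous, takes about 30 seconds, and incurs LLM cost, checking the body locally before sending is cheap insurance. The following sketch mirrors the documented limits; `build_addendum_body` is a hypothetical helper, and it assumes the server counts characters the same way Python's `len` does.

```python
MAX_TITLE_CHARS = 200
MAX_DESCRIPTION_CHARS = 4000


def build_addendum_body(title, description):
    """Validate an addendum request locally and return the JSON body."""
    checks = (
        ("title", title, MAX_TITLE_CHARS),
        ("description", description, MAX_DESCRIPTION_CHARS),
    )
    for name, value, limit in checks:
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{name} is required and must be non-blank")
        # Assumption: len() on a Python str matches the server's character count.
        if len(value) > limit:
            raise ValueError(f"{name} exceeds {limit} characters")
    return {"title": title, "description": description}
```

When sending the request, set the client timeout comfortably above 30 seconds so the connection is not dropped while the addendum is still being built.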

GET /v1/packages/{id}/addenda

Lists every addendum attached to the package, newest-first. Returns {addenda: [{id, package_id, title, description, cost_usd, submitted_by_user_id, created_at}, ...]}. 404 if the package isn't yours.

GET /v1/packages/{id}/addenda/{addendumId}/zip

302 redirect to a SAS-tokened download URL valid for one hour. Mirrors the per-package zip-download shape. 404 if the package or addendum isn't yours.

Step 6.6 — explain a package for an audience

When you want to share a completed package with someone who isn't going to read the full markdown bundle — an executive, an investor, a new engineer joining the project — ask SpecStep to rewrite it as a short audience-tailored explanation. One audience pick = one cached markdown explanation; repeats for the same audience are free.

GET /v1/explain/audiences

Public catalog of available audiences. Returns {audiences: [{slug, display_name, description}, ...]}. The V1 set is six entries: executive, product-manager, engineering-manager, new-engineer, investor, security. No authentication required.

POST /v1/packages/{id}/explain

Body: {audience} — must match one of the public catalog slugs above. Returns {markdown, audience, model, cost_usd, cached}. The cold call runs one LLM round-trip (~10 seconds, ~$0.05) and persists the result; subsequent calls for the same (package, audience) pair return the cached row with cached: true and cost_usd: 0. 400 EXPLAIN_AUDIENCE_UNKNOWN if the slug isn't in the catalog; 400 MISSING_AUDIENCE if the field is blank; 402 QUOTA_EXPLAIN_EXCEEDED if the monthly explanation quota is reached for your tier; 404 if the package isn't yours. A cold call may return 503 EXPLAIN_TIMEOUT when it exceeds its ~75s wall-clock budget — retry-friendly, no cost incurred, and distinct from a 504 gateway timeout.

GET /v1/packages/{id}/explanations

Lists every cached explanation already generated for the package, newest-first. Returns {explanations: [{id, audience, model, cost_usd, cached, created_at}, ...]}. Useful for showing "already generated" badges in a UI before the user picks an audience. 404 if the package isn't yours.

GET /v1/packages/{id}/explain/download?audience=<slug>&format=<fmt>

Streams a previously-cached explanation as a downloadable file — it does NOT generate, so you must have already run POST /v1/packages/{id}/explain for the same (package, audience) pair. Returns the rendered bytes (200) as an attachment named specstep-explanation-<audience>.<ext>. format must be one of md, txt, pdf, docx — anything else (or omitting it) is 400 UNKNOWN_EXPLANATION_FORMAT; audience must be a slug from the catalog above, else 400 EXPLAIN_AUDIENCE_UNKNOWN. 404 if the package isn't yours, or if no explanation has yet been generated for that (package, audience) pair — generate it first. A safe GET, so a browser <a download> link works with no antiforgery token.

Step 6.7 — migrate existing docs into a package

Have pre-existing documentation that didn't come from a SpecStep generation? Upload a .zip of it and SpecStep classifies each file onto the canonical package layout, then assembles a package you can track development against — no generation run. Both routes take multipart/form-data with the archive in a file field (max 16 MB).

POST /v1/doc-migrations/preview

Dry run: classify the uploaded archive and return the proposed mapping without persisting anything. Returns {source_archive_name, source_byte_count, total_file_count, classified_count, unclassified_count, classifier_version, mapping: [{source_path, doc_type, target_path, layer, confidence}, ...], conflicting_target_paths: [...]}. Each mapping row shows where a source file would land in the normalized package; layer (Manifest / CanonicalTaxonomy / Heuristic / Fallback) + confidence tell you how sure the classifier is. Files that can't be placed map to _source/… (doc_type: "Unclassified"). A non-empty conflicting_target_paths means two files claim the same canonical slot — you must resolve those before committing. 400 for a non-zip, empty, or oversized upload.

POST /v1/doc-migrations/commit

Normalize + persist. Form fields: file (required), project_id (optional — defaults to your default project), version (optional SemVer, default 1.0.0), target_path_overrides (optional JSON object of source-path → target-path corrections from the reviewed preview). Builds the normalized package (canonical layout + _source/ for unclassified files + a specstep.yaml manifest marked source: migrated), stores it, creates the package, and links it to the project. Returns {migration_id, package_id, project_id, version, classified_count, unclassified_count}. The resulting package appears in GET /v1/packages with a null generation_id. 400 for a bad upload; 409 DOC_MIGRATION_UNRESOLVED_CONFLICTS when two sources still claim one canonical slot (supply target_path_overrides to resolve).
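The preview-then-commit flow can be sketched as a local check: apply your corrections to the preview mapping, confirm that no two sources still land on the same target, and only then build the commit form fields. This is a client-side approximation of the server's conflict rule, not the classifier itself, and the function names are hypothetical.

```python
import json


def unresolved_conflicts(preview, overrides):
    """Return {target_path: [source_paths]} for targets claimed more than once."""
    claimed = {}
    for row in preview["mapping"]:
        # An override replaces the classifier's proposed target for that source.
        target = overrides.get(row["source_path"], row["target_path"])
        claimed.setdefault(target, []).append(row["source_path"])
    return {t: sources for t, sources in claimed.items() if len(sources) > 1}


def commit_form_fields(preview, overrides, version="1.0.0"):
    """Build the non-file form fields for POST /v1/doc-migrations/commit."""
    conflicts = unresolved_conflicts(preview, overrides)
    if conflicts:
        raise ValueError(f"unresolved target conflicts: {sorted(conflicts)}")
    return {
        "version": version,
        # target_path_overrides travels as a JSON object inside a form field.
        "target_path_overrides": json.dumps(overrides, sort_keys=True),
    }
```

The archive itself still goes in the `file` field of the same multipart request; `project_id` can be added to the returned dict when you do not want the default project.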

Step 7 — deliver the package

Delivery commits the package to the GitHub repository you have configured. Read the current repository setting with GET /v1/source-control/preferences/installation and change it with PUT /v1/source-control/preferences/installation. Once configured, trigger delivery with:

POST /v1/packages/{id}/deliver

SpecStep commits the package to a new branch and opens a pull request. Your default branch is not touched directly. The response confirms the delivery was queued and includes a reference to the target repository and branch.

Step 7.5 — register webhooks instead of polling

Polling GET /v1/generations/{id} works, but for long-running generations or external automation that can't sit on an open connection, register a webhook subscription on your API key and let SpecStep POST state changes to you instead.

Each subscription belongs to a specific API key (cascade-deleted when the key is revoked) and listens for one or more event types. The signing secret is returned once at create or rotate time — store it; SpecStep never returns it again.

REST vs MCP auth — intentional asymmetry. The REST webhook routes below (create, delete, rotate-secret, test) accept BOTH a cookie session AND an API-key bearer token. Programmatic callers — CI pipelines, server-side scripts — can manage their own webhooks with the same key they use for everything else. The equivalent MCP tools (create_webhook, rotate_webhook_secret, test_webhook) refuse API-key principals; only list_my_webhooks and delete_webhook accept them. The reasoning: an AI agent acting on a leaked or scope-broadened key shouldn't be able to silently redirect or re-sign future event payloads to an attacker-controlled URL. REST callers are explicitly accepting that redirect risk by reaching for the REST endpoint instead of the MCP tool.

POST /v1/api-keys/{apiKeyId}/webhooks

Body:

{
  "url": "https://example.com/specstep-hook",
  "events": ["generation.state_changed", "generation.paused_awaiting_clarification", "generation.completed", "generation.failed"]
}

Returns 201 Created with the subscription record + the plaintext signing_secret. The URL must be HTTPS. Unknown event types return 400.

GET /v1/api-keys/{apiKeyId}/webhooks — list subscriptions for the key. signing_secret is null here; the value is only ever returned at create / rotate.

DELETE /v1/api-keys/{apiKeyId}/webhooks/{webhookId} — remove. Returns 204.

POST /v1/api-keys/{apiKeyId}/webhooks/{webhookId}/rotate-secret — issue a fresh signing secret and invalidate the old one. Returns the subscription record with the new plaintext signing_secret populated.

Test a webhook subscription

POST /v1/api-keys/{apiKeyId}/webhooks/{webhookId}/test — fire a synthetic webhook.test event against the configured URL and return the live delivery outcome. No request body. Returns {success, http_status, failure_reason, latency_ms, delivery_id}. 404 if the subscription isn't yours.

The synthetic event uses the webhook.test event type, so receivers can branch on it without affecting business state. It travels through the same signing and delivery path as a real event, so a successful test is a valid integration smoke test. Same auth contract as create / delete / rotate: owner-initiated, cookie or API key.

Event types

Event type	When it fires
`generation.state_changed`	Every state transition (Queued → Drafting → Reviewing → ...). Most general — subscribe here when in doubt.
`generation.paused_awaiting_clarification`	The generation paused because a creating agent flagged an ambiguity. Pair with step 5.6 to read + answer the question.
`generation.completed`	The terminal `Complete` transition — package is ready.
`generation.failed`	The terminal `Failed` transition — `failure_reason` + `failure_category` are inlined in the body.

Delivery shape

POST <your URL>
Content-Type: application/json
X-SpecStep-Webhook-Signature: sha256=<hex>
X-SpecStep-Webhook-Timestamp: <unix-seconds>
X-SpecStep-Webhook-Event: generation.state_changed
X-SpecStep-Webhook-Delivery: <delivery-uuid>

{
  "event": "generation.state_changed",
  "delivered_at": "2026-05-05T23:00:00Z",
  "generation": { ...same projection as GET /v1/generations/{id}... }
}

The body inlines the same project_name / description / kind / state / timing fields you'd get from GET /v1/generations/{id}. No follow-up GET required.

Verifying signatures

Compute HMAC-SHA256 over the raw body bytes, keyed with your signing secret, hex-encode the result in lowercase, and compare it to the value in X-SpecStep-Webhook-Signature after the sha256= prefix. Use a constant-time comparison. Reject deliveries with an X-SpecStep-Webhook-Timestamp more than 5 minutes old to defeat replays. Use X-SpecStep-Webhook-Delivery as your dedup key; duplicate delivery IDs are safe to discard.
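A minimal receiver-side sketch of these checks using Python's standard `hmac` and `hashlib` modules. Two points are assumptions rather than documented behavior: that the plaintext signing secret is used as the HMAC key in its UTF-8 encoded form, and that your web framework hands you the headers under the exact names shown above. `verify_webhook` is an illustrative name.

```python
import hashlib
import hmac
import time

MAX_AGE_SECONDS = 5 * 60  # reject deliveries older than five minutes


def verify_webhook(raw_body, headers, signing_secret, now=None):
    """Return True when the signature matches and the timestamp is fresh.

    raw_body must be the exact bytes received, before any JSON parsing.
    """
    signature = headers.get("X-SpecStep-Webhook-Signature", "")
    timestamp = headers.get("X-SpecStep-Webhook-Timestamp", "")
    if not signature.startswith("sha256=") or not timestamp.isdigit():
        return False
    now = time.time() if now is None else now
    if now - int(timestamp) > MAX_AGE_SECONDS:
        return False
    # Assumption: the secret string is the HMAC key, encoded as UTF-8.
    expected = hmac.new(signing_secret.encode("utf-8"), raw_body,
                        hashlib.sha256).hexdigest()
    provided = signature[len("sha256="):]
    # compare_digest runs in constant time with respect to the contents.
    return hmac.compare_digest(expected.encode("ascii"), provided.encode("utf-8"))
```

Verify against the raw bytes rather than a re-serialized JSON object, since re-serialization can change whitespace or key order and break the digest. Deduplication on X-SpecStep-Webhook-Delivery is left to the caller because it needs persistent storage.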

Delivery semantics

Best-effort with bounded retry: 5xx responses and transport failures retry up to 3 times with exponential backoff (1s, 4s, 16s). 4xx responses are treated as terminal and are not retried; once you fix the subscriber, delivery resumes with the next event. Per-subscription state (last delivery time, last status, last HTTP code) is exposed on the GET response so you can see whether a webhook is healthy without instrumenting your own receiver.

No persistent queue, no DLQ for v1 — if every retry fails, the event is logged on the server side and dropped. Build receivers that can tolerate occasional missed events; the canonical state is always GET /v1/generations/{id}.
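To make the retry arithmetic concrete, the sketch below reproduces the documented 1s, 4s, 16s schedule and shows one way a receiver might choose its response status so that the sender's retry rules work in its favor. Both functions are illustrative: the schedule formula is inferred from the three documented delays, and the specific status codes a receiver returns are a design choice, not something SpecStep prescribes.

```python
def backoff_schedule(retries=3, base_seconds=1, factor=4):
    """Delays before each retry; the defaults reproduce the documented 1s, 4s, 16s."""
    return [base_seconds * factor ** n for n in range(retries)]


def receiver_status(signature_ok, processed_ok):
    """Pick an HTTP status for the receiver to return to SpecStep.

    4xx is terminal for the sender, 5xx triggers its bounded retries.
    """
    if not signature_ok:
        return 401  # terminal: a bad signature will not improve on retry
    if not processed_ok:
        return 503  # transient failure on our side: ask for a retry
    return 204      # accepted, nothing to return


delays = backoff_schedule()
total_backoff_seconds = sum(delays)  # 1 + 4 + 16 = 21
```

With this schedule a receiver outage longer than roughly 21 seconds plus request time loses the event, which is why reconciling against GET /v1/generations/{id} remains necessary.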

Step 7.7 — recover deleted interviews, generations, and packages

Soft-deletes are reversible. Every soft-deleted row stays in the database; you can list and restore them on demand. The web UI surfaces this as "Settings → Recycle Bin"; the equivalent REST endpoints are below.

List your own deleted rows

GET /v1/interviews/deleted?limit=20&offset=0

GET /v1/generations/deleted?status=...&limit=50&offset=0&order=desc

GET /v1/packages/deleted?limit=50&offset=0&order=desc

Each returns the same row shape as its live counterpart. Standard pagination (limit, offset) plus order for the generations + packages variants. Caller-scoped — you only see your own rows. Anonymous → 401.

Restore a deleted row

POST /v1/interviews/{id}/restore

POST /v1/generations/{id}/restore

POST /v1/packages/{id}/restore

All three return 204 No Content on success and 404 if the row isn't yours or doesn't exist. No state guard on restore — even if the generation was Failed or Cancelled at delete time, restore returns it to your workspace in the same state. Idempotent on already-live rows. Each restore writes an audit row.

Restoration does NOT cascade. Restoring an interview does not auto-restore generations or packages produced from it; if you want all three back, restore each independently.

Delete forever (permanent removal)

Added 2026-05-14.

The Recycle Bin's "Delete forever" affordance maps to three hard-delete endpoints. Only soft-deleted rows can be hard-deleted — the endpoints return 409 Conflict on a live row.

DELETE /v1/interviews/{id}/permanent

DELETE /v1/generations/{id}/permanent

DELETE /v1/packages/{id}/permanent

All three return 204 No Content on success, 409 when the row isn't soft-deleted yet, and 404 when the row doesn't exist or isn't yours. The dependent rows (intake artifacts, generation events, package addenda, blob payloads) cascade automatically via the existing EF foreign-key configuration. There is no restore after a hard-delete.

Step 8 — public status endpoints

These endpoints are anonymous — no API key required, no authentication header. They are intentionally reachable when the rest of the API is degraded, so you can wire them into health checks and status pages without worrying about auth-path outages.

GET /v1/status/summary

Returns {overall, services, active_incidents, generated_at} — the current overall status, the per-service breakdown, any active incidents, and when the snapshot was produced.

GET /v1/status/uptime?days=30

Uptime report over the trailing window. days defaults to 30.

GET /v1/status/history?limit=50

Recent incidents, newest first. limit defaults to 50.

POST /v1/status/subscribe

Body {"email": "..."} subscribes the address to status updates. Returns 200 whether or not the email already exists — the endpoint doesn't enumerate subscribers — and the user-facing message is always "Check your email for a confirmation link." Invalid emails return 400; the per-IP throttle is 5 requests per 15 minutes and returns 429 past that.

Other useful endpoints

Account-level reads, BYO-provider key management, notification controls, bug reports, and schema retrieval — all callable with a bearer token, none required for the generation flow.

Account & usage

Endpoint	Purpose
`GET /v1/usage`	Current quota usage for your account (generations used, remaining, reset date).
`GET /v1/me/analytics`	Personal usage analytics: generation counts, average duration, review-profile distribution.
`GET /v1/me/provider-keys`	List any BYO LLM provider keys (Anthropic, OpenAI) you have registered.
`PATCH /v1/me/provider-keys/{provider}`	Upsert a provider key. Body: `{"secret": "..."}`. Requires the Developer role.
`DELETE /v1/me/provider-keys/{provider}`	Remove a provider key.

Session state — build sessions, decisions & token usage

Added 2026-06-05.

These power the SpecStep session-state kit — the start-session / end-session skills plus the SessionEnd usage reporter that let an AI coder (Claude Code and similar) track what it does while building your project: build sessions, a decision log, a backlog, and per-session token usage. Self-service — every authenticated user reads and writes their own project's session state with a key scoped to session_state.read / session_state.write (plus projects.read / projects.write for the project itself); no special role is required. The tenant filter confines every read and write to your own (and your organization's) projects. The same operations are available as MCP tools — start_build_session, end_build_session, append_decision_log, file_backlog_item, record_build_session_usage, get_build_session_cross_aggregate, and the matching query_* / list_* reads.

Endpoint	Purpose
`POST /v1/build-sessions`	Start a build session. Body: `{computer, branch, intent, opened_by_client_type, project_id?}`. Returns `{id, status: "Active", …}`. Idempotent on `(computer, branch, you)` — a retry returns the existing Active session instead of duplicating.
`PATCH /v1/build-sessions/{id}/current-state`	Update the rolling current-state markdown.
`POST /v1/build-sessions/{id}/end`	Close a session. Body: `{session_history_entry_markdown, related_pr_urls?, related_commit_shas?}`.
`POST /v1/build-sessions/{id}/usage`	Record the token usage one AI-coder session contributed — the reporter's endpoint. Detailed below.
`GET /v1/build-sessions/{id}/cross-aggregate`	Read a session plus its linked decision-log entries, backlog items, the token `usage_rollup` (summed), and the per-AI-coder-session `usages`.
`GET /v1/projects/{id}/metrics`	Project metrics, including a `usage` rollup — the project's "cost to build" as total tokens across all its build sessions.
`GET /v1/projects/{id}/activity`	Recent project activity — a single reverse-chronological feed that merges build sessions started and closed, decisions logged, backlog items filed and resolved, build lessons filed, and rules deployed. Optional `limit` query (default 20, max 50). Returns `{items: [{kind, occurred_at, title, ref_id, actor_id?}]}` where `kind` is one of `session_started`, `session_closed`, `decision_logged`, `backlog_filed`, `backlog_resolved`, `lesson_filed`, `rule_deployed` (`actor_id` is null for `rule_deployed` — rules are system-emitted). Lessons and rules are confined to the caller's own and organization scope.
`GET /v1/projects/{id}/contributors`	Recent contributors — the distinct people who have logged work on the project (build sessions opened, decisions authored, backlog items filed), most active first. This is an authorship roll-up, not a membership or access list. Optional `limit` query (default 8, max 50). Returns `{items: [{actor_id, display_name, activity_count, last_active_at}]}`.
`GET /v1/projects/{id}/lessons-rules-summary`	Build lessons & rules summary — a compact read-only view of the project's own build lessons and rules: total and enforced lesson counts, the active-rule count, and the most-recently-touched lessons. Optional `recent_limit` query (default 5, max 25). Returns `{lesson_count, enforced_lesson_count, active_rule_count, recent_lessons: [{id, slug, title, status, last_observed_at}]}` where `status` is one of `Observed`, `Documented`, `Enforced`, `Archived`. Confined to the caller's own and organization scope.

Decision-log entries and backlog items have the same self-service shape under /v1/decision-log and /v1/backlog.

Record token usage

POST /v1/build-sessions/{id}/usage is an idempotent upsert keyed on claude_session_id: re-posting the same id overwrites that row's counts, so a resumed or re-reported AI-coder session never double-counts. Tokens only — the four billable input classes are stored separately because cache-read tokens dominate by orders of magnitude, so a single collapsed total would be meaningless. Requires session_state.write.

Body field	Description
`claude_session_id` (required)	The AI-coder session id this row aggregates — the idempotency key.
`input_tokens`, `cache_write_5m_tokens`, `cache_write_1h_tokens`, `cache_read_tokens`, `output_tokens`	The five token counts. All default to `0`, so a minimal report is `{claude_session_id, cache_read_tokens, output_tokens, turns}`.
`turns`, `sidechain_turns`	Assistant turn counts; `sidechain_turns` must be ≤ `turns`.
`agent`, `models`, `window_start`, `window_end`, `reporter_version` (optional)	The reporting agent (default `claude-code`), model id(s), the metered window, and the reporter's version.

Returns 200 with the stored row, including the derived billable_input_total (the four input classes summed) and total_tokens (billable input plus output). 404 if the build session id is unknown.
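The derived totals are simple sums, so a reporter can compute them locally and compare against the stored row. The sketch below builds a report body and the totals the response should carry, following the definitions above. The helper names are hypothetical, and optional fields such as `agent` or `models` are omitted for brevity.

```python
# The four billable input classes, stored separately from output tokens.
INPUT_CLASSES = (
    "input_tokens",
    "cache_write_5m_tokens",
    "cache_write_1h_tokens",
    "cache_read_tokens",
)
TOKEN_FIELDS = INPUT_CLASSES + ("output_tokens",)


def build_usage_report(claude_session_id, turns, sidechain_turns=None, **token_counts):
    """Build a body for POST /v1/build-sessions/{id}/usage."""
    if not claude_session_id:
        raise ValueError("claude_session_id is required")
    unknown = set(token_counts) - set(TOKEN_FIELDS)
    if unknown:
        raise ValueError(f"unknown token fields: {sorted(unknown)}")
    report = {"claude_session_id": claude_session_id, "turns": turns}
    if sidechain_turns is not None:
        if sidechain_turns > turns:
            raise ValueError("sidechain_turns must be <= turns")
        report["sidechain_turns"] = sidechain_turns
    report.update(token_counts)  # omitted counts default to 0 on the server
    return report


def expected_totals(report):
    """Compute the derived fields the stored row should contain."""
    billable = sum(report.get(name, 0) for name in INPUT_CLASSES)
    return {
        "billable_input_total": billable,
        "total_tokens": billable + report.get("output_tokens", 0),
    }
```

For example, a report with `input_tokens=20`, `cache_read_tokens=1000`, and `output_tokens=50` yields a `billable_input_total` of 1020 and a `total_tokens` of 1070.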

The reference reporter that produces this payload ships in the session-state kit as session-end-usage-reporter.py — a Claude Code SessionEnd hook that reads the local transcript, sums the usage, and posts it here. Set SPECSTEP_API_KEY (scoped to session_state.write) in your environment for it to record. Install the kit from SpecStep's public marketplace — see Hooking it up. Because the upsert is keyed on claude_session_id and accepted on closed sessions, you can backfill a past session's usage at any time — the reporter's --backfill <transcript> --build-session <id> mode does it in one step, or just re-POST here (overwriting that row). See Backfilling past sessions.

Notifications

Endpoint	Purpose
`GET /v1/notifications`	List your notifications.
`GET /v1/notification-preferences`	Retrieve your notification preferences (email, SMS).
`PUT /v1/notification-preferences`	Update notification preferences.

Retention preferences

Added 2026-05-14.

A per-user default retention period, in days, applied to new packages. The Web UI surfaces this at Settings → Privacy & retention.

Endpoint	Purpose
`GET /v1/users/me/retention-preference`	Returns `{default_retention_days}`. `null` means "indefinite" (the platform default).
`PUT /v1/users/me/retention-preference`	Body: `{"default_retention_days": <int> | null}`. Range 1–3650; `null` clears the override. Returns `200` with the persisted snapshot.

This preference applies only to NEW packages — existing rows aren't backfilled. To change retention on a specific package, see update_package (MCP) or the equivalent REST mutation.
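A client can catch out-of-range values before sending the `PUT`. The sketch below serializes the body and enforces the documented 1–3650 range; `retention_preference_body` is a hypothetical helper name, and the server remains the authority on what it accepts.

```python
import json


def retention_preference_body(days):
    """Serialize the PUT body; None clears the override (indefinite retention)."""
    if days is None:
        return json.dumps({"default_retention_days": None})
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(days, bool) or not isinstance(days, int):
        raise TypeError("default_retention_days must be an integer or None")
    if not 1 <= days <= 3650:
        raise ValueError("default_retention_days must be between 1 and 3650")
    return json.dumps({"default_retention_days": days})


print(retention_preference_body(90))
print(retention_preference_body(None))
```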

Data export (GDPR data portability)

Added 2026-05-15.

Self-serve export of your account data — interviews, generations, packages, audit log, notification + retention preferences. The job runs asynchronously; the response is 202 Accepted with the request id. When the export completes, the executor uploads a zip to blob storage and you receive an email with a signed download URL valid for 7 days.

Endpoint	Purpose
`POST /v1/users/me/data-export-request`	Records the request. Returns `202 Accepted` with `{request_id, status: "queued"}`. Cookie-only — bearer-token callers get `403`.
`GET /v1/users/me/data-export-request`	Returns the latest request's snapshot: `{request_id, status: "queued"|"processing"|"completed"|"failed", requested_at, download_url?, download_url_expires_at?}`. Poll until `status` reaches a terminal value.

The audit log records data_export.requested on submission and data_export.completed on success.
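The polling step might look like the sketch below. It assumes `completed` and `failed` are the terminal values, and it takes the HTTP call as an injected `fetch_snapshot` callable so that no auth or transport detail is invented here. The function name, interval, and poll cap are all hypothetical choices.

```python
import time

# Assumed terminal values, taken from the status list in the table above.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_export(fetch_snapshot, interval_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Call fetch_snapshot() until the export reaches a terminal status.

    fetch_snapshot is any callable returning the parsed JSON body of
    GET /v1/users/me/data-export-request as a dict.
    """
    for attempt in range(max_polls):
        snapshot = fetch_snapshot()
        if snapshot["status"] in TERMINAL_STATUSES:
            return snapshot
        if attempt < max_polls - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"export still pending after {max_polls} polls")


# Demo with canned snapshots instead of real HTTP calls.
canned = iter([
    {"request_id": "r1", "status": "queued"},
    {"request_id": "r1", "status": "processing"},
    {"request_id": "r1", "status": "completed", "download_url": "https://example.invalid/export.zip"},
])
final = wait_for_export(lambda: next(canned), sleep=lambda seconds: None)
print(final["status"])
```

Since the completed export also triggers an email with the download URL, polling is optional; a capped loop like this avoids waiting indefinitely on a failed job.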

Account deletion (GDPR right-to-erasure)

Added 2026-05-14.

Self-serve account deletion. The job runs asynchronously; ~30 seconds after submission a worker hard-deletes the user's interviews, generations, packages, API keys, OAuth tokens, external connectors, preferences, and the user row itself. Audit events are anonymized (actor_id replaced with the tombstone Guid.Empty) but retained per the 13-month audit-retention window. The user receives a confirmation email; the support inbox receives an operational copy.

Endpoint	Purpose
`POST /v1/users/me/deletion-request`	Records the deletion request. Returns `202 Accepted` with `{request_id, status: "queued"}`. Cookie-only — bearer-token callers get `403`.
`GET /v1/users/me/deletion-request`	Polls the request's status. Returns `{request_id, status: "queued"|"processing"|"completed"|"failed", requested_at}`. A response with `status: "completed"` is the last one you will receive: the next request returns `401` because the user no longer exists.

The audit log records account.deletion_requested on submission and account.deletion_completed once the cascade succeeds.

Extra Usage prepaid balance

Added 2026-05-14.

Top up a prepaid balance that absorbs overage when you run out of monthly credits. Web UI: Settings → Plan → Extra Usage.

Endpoint	Purpose
`GET /v1/me/extra-usage`	Returns the user's balance + enabled flag + last-topup timestamp.
`POST /v1/me/extra-usage/enable`	Enables Extra Usage (off by default; overage hits the standard 403-quota-exceeded path until enabled).
`POST /v1/me/extra-usage/disable`	Disables Extra Usage. Existing balance is preserved but not consumed.
`POST /v1/me/extra-usage/checkout`	Creates a Stripe Checkout session for a buy-block top-up. Body: `{"amount_usd": <int>}`. Returns the checkout URL.
`GET /v1/me/extra-usage/transactions?limit=50&offset=0`	Lists transactions (top-ups + debits) newest-first.

Cookie-only — Extra Usage purchases are user-actioned, not API-key-driven.
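The `limit` and `offset` parameters on the transactions listing suggest plain offset pagination. The sketch below walks the pages on the assumption that each response parses to a list and that a page shorter than `limit` is the last one; the response shape is not specified here, so treat both as guesses. `iter_transactions` and `fetch_page` are hypothetical names, and the HTTP call is injected because these endpoints need a browser session cookie.

```python
def iter_transactions(fetch_page, limit=50):
    """Yield every transaction, newest first, one page at a time.

    fetch_page(limit, offset) is any callable returning the parsed list for
    GET /v1/me/extra-usage/transactions?limit=...&offset=...
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:  # assumed end-of-list signal
            return
        offset += limit


# Demo against an in-memory list standing in for the server.
ledger = [{"id": n, "kind": "debit"} for n in range(120)]
fake_fetch = lambda limit, offset: ledger[offset:offset + limit]
print(sum(1 for _ in iter_transactions(fake_fetch)))
```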

Organizations

Added 2026-05-26.

An organization groups a team under one account. Membership is optional, and the rest of the API behaves the same either way. Creating one requires the Teams plan; the creator becomes the primary contact and first member. Web UI: Settings → Profile → Organization.

Endpoint	Purpose
`GET /v1/me/organization`	Returns your organization — `{id, name, primary_contact_user_id, address_line1, address_line2, city, region, postal_code, country, phone_number, member_count, created_at, updated_at}`. Returns `204 No Content` when you don't belong to one.
`POST /v1/me/organization`	Create an organization with yourself as the primary contact + first member. Body: `{"name": "Acme Co", "address_line1"?, "address_line2"?, "city"?, "region"?, "postal_code"?, "country"?, "phone_number"?}` — only `name` is required. Returns `201 Created` with the organization.

POST /v1/me/organization is gated on your subscription tier and current membership; both failures return an RFC 7807 problem with a code extension:

403, code: TEAMS_TIER_REQUIRED — creating an organization requires the Teams plan. Upgrade, then retry.
409, code: ALREADY_HAS_ORGANIZATION — you already belong to an organization. A user can belong to at most one; leave it before creating another.
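A caller can branch on the `code` extension rather than on the status alone. The sketch below maps the two documented failures to next steps. `explain_org_create_failure` is a hypothetical helper; `title` is a standard RFC 7807 member, and whether it is populated here is an assumption.

```python
# (status, code) pairs documented for POST /v1/me/organization.
ORG_CREATE_FAILURES = {
    (403, "TEAMS_TIER_REQUIRED"): "Upgrade to the Teams plan, then retry.",
    (409, "ALREADY_HAS_ORGANIZATION"): "Leave your current organization before creating another.",
}


def explain_org_create_failure(status, problem):
    """Turn a parsed problem document into a next step for the caller."""
    advice = ORG_CREATE_FAILURES.get((status, problem.get("code")))
    if advice is not None:
        return advice
    # Fall back to the generic RFC 7807 title when the code is not recognized.
    return f"Unexpected failure ({status}): {problem.get('title', 'no title provided')}"


print(explain_org_create_failure(403, {"code": "TEAMS_TIER_REQUIRED"}))
```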

API-key scoping & rotation

The canonical key-lifecycle docs live in authentication; the scoping + rotation notes belong here:

POST /v1/api-keys accepts an optional scopes: [...] array of permission codes — fetch the catalog from GET /v1/permissions. Pass null or omit the field for legacy unscoped behavior. Unknown codes return 400.
POST /v1/api-keys also accepts an optional project_id (UUID) to scope the key to a single project — omit or null to let the key access all of your projects. The project must be one you own or one in your organization, otherwise 400. If you belong to an organization, the key is automatically bound to that organization. The create response (and GET /v1/api-keys summaries) echo project_id + organization_id.
PATCH /v1/api-keys/{id}/scopes rotates the scope set on an existing key. Body: {scopes: [...] | null}. Returns 204 / 404 / 400.
POST /v1/api-keys/{id}/rotate rotates the key's secret in place. No request body. Returns 200 with a fresh raw_key (shown once — copy it immediately) while preserving the key's identity, scopes, and project/org binding; the old secret is invalidated immediately. 404 if the key isn't yours or is revoked. GET /v1/api-keys summaries expose last_rotated_at.

These endpoints are forbidden to API-key callers — only cookie- or OIDC-authenticated humans can mint, rotate, or re-scope keys, so a leaked key cannot mint a replacement, rotate its own secret, or escalate its own scopes.
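To avoid a `400` round trip, the scoping fields can be checked locally before creating a key. The sketch below covers only `scopes` and `project_id`; any other fields on `POST /v1/api-keys` are outside its scope. `key_scoping_fields` is a hypothetical helper, and `known_codes` stands for whatever set of codes you extracted from `GET /v1/permissions`, whose response shape is not assumed here.

```python
import uuid


def key_scoping_fields(scopes=None, project_id=None, known_codes=None):
    """Return the scoping part of a create-key body, validated locally."""
    fields = {}
    if scopes is not None:
        if known_codes is not None:
            unknown = sorted(set(scopes) - set(known_codes))
            if unknown:
                raise ValueError(f"unknown permission codes: {unknown}")
        fields["scopes"] = list(scopes)
    if project_id is not None:
        # uuid.UUID raises ValueError for anything that is not a valid UUID.
        fields["project_id"] = str(uuid.UUID(str(project_id)))
    return fields


print(key_scoping_fields(["session_state.write"], known_codes={"session_state.write"}))
```

Leaving out both arguments returns an empty dict, which corresponds to the legacy unscoped, all-projects behavior described above.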

Bug reports

Any authenticated caller can submit bug reports and read back their own.

POST /v1/bug-reports — submit. Body: {title, description, severity?, related_generation_id?, current_route?, user_agent?, client_type?}. severity is one of low, medium, high, critical (default medium); client_type is browser, api, or mcp (REST callers default to api; MCP and the browser form stamp their own value server-side, so callers can't spoof). The server enriches every submission with the caller's account name / email / plan, the build version, and a heuristic AI-tool tag derived from the User-Agent (Claude Code / Codex / Copilot / etc.). Returns 201 Created with the full record.

GET /v1/bug-reports/me?limit=20 — your own reports, newest first.

GET /v1/bug-reports/{id} — a single report. Open to the submitter. Foreign callers get 404 so report ids can't be probed.

Quality feedback

Distinct from bug reports — feedback evaluates quality (was the interview good, is the package coherent, what's the build confidence). Bug reports are for broken behavior.

Anyone authenticated can submit and read back their own feedback. The two template-catalog reads are anonymous-OK; the templates are public-safe content the client needs to fill the rubric.

POST /v1/feedback — submit. Body: {type, title, full_report, summary?, severity?, client_type?, interview_id?, intake_artifact_id?, generation_id?, package_id?, interview_quality_score?, package_quality_score?, build_confidence_percent?, letter_grade?, structured_findings?, template_id?, rubric_version?, rubric_section_responses?, rubric_scores?, tags?, estimated_output_quality?, project_type?, review_profile?, transcript_evidence?, package_evidence?}. type is one of interview_quality, package_quality, end_to_end_run, tooling_experience, api_doc_quality, website_quality, launch_readiness, other; severity is info, low, medium, high, critical (default medium); client_type is browser, api, or mcp (REST callers default to api; MCP stamps mcp server-side, so callers can't spoof). Run-bound types (interview_quality, package_quality, end_to_end_run) require at least one target GUID; tooling_experience, api_doc_quality, website_quality, launch_readiness, and other may submit without one. The server enriches every submission with the caller's account name, email, plan, build version, and a heuristic AI-tool tag. Returns 201 Created with the full record.

Optional submitter context: estimated_output_quality is a short qualitative label (≤50 chars) — distinct from the numeric build_confidence_percent and the single-letter letter_grade. project_type and review_profile (≤50 chars each) denormalize the project type and review profile at submission time so the triage queue can filter on them even if the underlying interview is regenerated with different settings. transcript_evidence and package_evidence are optional arrays of quoted snippets (each ≤2000 chars) backing the findings — surface them when an LLM-class submitter (Codex, Claude Code) can quote the source material directly.

Each entry in structured_findings carries {severity, topic, title} plus three optional fields each capped at 2000 chars: evidence (quoted text supporting the finding), expected_behavior (what the caller expected), and suggested_fix (caller's proposed remediation). Mirrors the specialist-reviewer finding shape so feedback findings + reviewer findings aggregate together.

Typed evidence (added 2026-05-21): each finding also accepts an optional typed_evidence array (up to 20 items) for machine-readable signal that would otherwise be flattened into prose. Each item is {kind, payload_json} where kind is one of free, http_response, route, console_error, mcp_tool_call, transcript_turn, screenshot, json_payload and payload_json is a well-formed JSON document (≤4000 chars). Required keys depend on the kind: http_response needs a numeric status; route a string url; console_error a string message; mcp_tool_call a string tool; transcript_turn a numeric turnIndex; screenshot a string path; free and json_payload accept any well-formed JSON. Prose evidence and typed_evidence coexist; read responses echo typed_evidence in the same shape.
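The per-kind rules above lend themselves to a local pre-flight check. The sketch below assumes `payload_json` is sent as a string containing JSON, and that "numeric" means an integer or float but not a boolean. Neither detail is confirmed here, and `validate_typed_evidence` is a hypothetical helper.

```python
import json

# kind -> (required key, accepted Python types); None means any JSON is fine.
KIND_RULES = {
    "free": None,
    "json_payload": None,
    "http_response": ("status", (int, float)),
    "route": ("url", (str,)),
    "console_error": ("message", (str,)),
    "mcp_tool_call": ("tool", (str,)),
    "transcript_turn": ("turnIndex", (int, float)),
    "screenshot": ("path", (str,)),
}
MAX_ITEMS = 20
MAX_PAYLOAD_CHARS = 4000


def validate_typed_evidence(items):
    """Return a list of problems; an empty list means the local checks passed."""
    problems = []
    if len(items) > MAX_ITEMS:
        problems.append(f"too many items: {len(items)} > {MAX_ITEMS}")
    for index, item in enumerate(items):
        kind, raw = item.get("kind"), item.get("payload_json")
        if kind not in KIND_RULES:
            problems.append(f"item {index}: unknown kind {kind!r}")
            continue
        if not isinstance(raw, str) or len(raw) > MAX_PAYLOAD_CHARS:
            problems.append(f"item {index}: payload_json must be a string of at most {MAX_PAYLOAD_CHARS} chars")
            continue
        try:
            payload = json.loads(raw)
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            problems.append(f"item {index}: payload_json is not well-formed JSON")
            continue
        rule = KIND_RULES[kind]
        if rule is None:
            continue
        key, types = rule
        value = payload.get(key) if isinstance(payload, dict) else None
        if isinstance(value, bool) or not isinstance(value, types):
            problems.append(f"item {index}: kind {kind!r} needs a valid {key!r}")
    return problems


evidence = [{"kind": "http_response", "payload_json": json.dumps({"status": 500})}]
print(validate_typed_evidence(evidence))
```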

GET /v1/feedback/me?limit=20 — your own feedback, newest first. Returns a slim list shape (added 2026-05-21): each row carries the scalars, scores, a 200-char summary_excerpt, and counts (tag_count, finding_count, transcript_evidence_count, package_evidence_count) instead of the full bodies — fetch the complete record (full_report, structured_findings, evidence arrays) from GET /v1/feedback/{id}.

GET /v1/feedback/{id} — a single feedback row. Open to the submitter. Foreign callers get 404 so feedback ids can't be probed.

PATCH /v1/feedback/{id}/amend — submitter self-correction (added 2026-05-21). While your row is still Open AND within the amend window (10 minutes of submission), you can fix free-form content: {title?, summary?, full_report?, transcript_evidence?, package_evidence?, tags?}. Omitted fields are left unchanged. Identity-defining fields (type, severity, target GUIDs, template_id/rubric_version) and structured_findings are NOT amendable. Returns 200 with the updated record. 404 if the row isn't yours (existence isn't leaked); 400 FEEDBACK_AMEND_NOT_OPEN once the row has left Open (review has started); 400 FEEDBACK_AMEND_WINDOW_EXPIRED after the window. After that, the row is locked for self-correction.
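A client-side guard for the amend rules could look like the sketch below. It filters on the six amendable fields and checks the 10-minute window against a timestamp you supply. `build_amend_body` is a hypothetical helper, the exact boundary behavior is an assumption, and the server's clock decides the real outcome.

```python
from datetime import datetime, timedelta, timezone

AMENDABLE_FIELDS = {
    "title", "summary", "full_report",
    "transcript_evidence", "package_evidence", "tags",
}
AMEND_WINDOW = timedelta(minutes=10)


def build_amend_body(changes, submitted_at, now=None):
    """Return the PATCH body, or raise if the request is certain to be rejected.

    submitted_at and now must be timezone-aware datetimes.
    """
    now = now or datetime.now(timezone.utc)
    if now - submitted_at > AMEND_WINDOW:
        raise ValueError("amend window has passed (expect FEEDBACK_AMEND_WINDOW_EXPIRED)")
    locked = sorted(set(changes) - AMENDABLE_FIELDS)
    if locked:
        raise ValueError(f"fields are not amendable: {locked}")
    return dict(changes)  # omitted fields stay unchanged server-side


submitted = datetime(2026, 5, 21, 12, 0, tzinfo=timezone.utc)
body = build_amend_body(
    {"title": "Clearer title"}, submitted, now=submitted + timedelta(minutes=3)
)
print(body)
```

The guard cannot see whether the row has left Open, so a `400 FEEDBACK_AMEND_NOT_OPEN` still has to be handled when the request is sent.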

GET /v1/feedback/templates — anonymous-OK. Lists the code-defined rubric templates that ship with the platform.

GET /v1/feedback/templates/{id}/{version} — anonymous-OK. Returns the full sections array (each with id, title, prompt, and optional score_scale). Fill rubric_section_responses keyed by section id and rubric_scores for sections that have a non-null score_scale.
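Given a template's `sections` array, the two rubric fields can be scaffolded mechanically. The sketch below assumes `rubric_scores` is keyed by section id in the same way as `rubric_section_responses`, which the text above does not state outright. The sample sections and their ids are invented for illustration, and `blank_rubric` is a hypothetical helper.

```python
def blank_rubric(template_id, rubric_version, sections):
    """Scaffold the rubric fields of a POST /v1/feedback body from a template."""
    return {
        "template_id": template_id,
        "rubric_version": rubric_version,
        # One free-text response slot per section.
        "rubric_section_responses": {section["id"]: "" for section in sections},
        # A score slot only where the section defines a non-null score_scale.
        "rubric_scores": {
            section["id"]: None
            for section in sections
            if section.get("score_scale") is not None
        },
    }


# Invented sections; real ones come from GET /v1/feedback/templates/{id}/{version}.
sample_sections = [
    {"id": "pacing", "title": "Pacing", "prompt": "Was the pace right?", "score_scale": {"min": 1, "max": 5}},
    {"id": "gaps", "title": "Gaps", "prompt": "What was missed?", "score_scale": None},
]
rubric = blank_rubric("interview-quality", "1.0.0", sample_sections)
print(sorted(rubric["rubric_scores"]))
```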

Seven built-in templates ship in v1. Pick the one whose scope matches the feedback — narrower rubrics keep the signal cleaner than the all-in-one.

Template id	Pairs with `type`	Scope
`end-to-end-specstep-quality` v1.0.0	`end_to_end_run`	One full SpecStep run (interview through generated package). 13 sections covering interview quality, package coherence, build confidence, letter grade, top blockers, recommended fixes.
`interview-quality` v1.0.0	`interview_quality`	Otto's performance during a single Interview. 7 sections covering pacing, follow-up quality, coverage breadth, rapport, gaps.
`package-buildability` v1.0.0	`package_quality`	Whether a generated package is buildable as-is by an AI coder. 8 sections covering coherence, completeness, AI-coder clarity, edge cases, effort-estimate accuracy, top risks.
`api-doc-quality` v1.0.0	`api_doc_quality`	The public `/api-docs/*` surface. 8 sections covering endpoint coverage, completeness, example clarity, error handling, schema clarity, missing sections, recommended improvements.
`tooling-experience` v1.0.0	`tooling_experience`	SpecStep's tooling surfaces (MCP, CLI, IDE integration). 9 sections covering ergonomics, integration, error-message clarity, performance, friction points.
`website-quality` v1.0.0	`website_quality`	The public marketing/docs site at specstep.com. 11 sections covering visual polish, copy quality, SEO + sitemap correctness, route correctness, mobile experience, console cleanliness, content sanitization.
`launch-readiness` v1.0.0	`launch_readiness`	Cross-cutting pre-launch review. 12 sections covering Priority-0 blockers, public content sanitization, trust posture, API + MCP stability, mobile readiness, accessibility, performance, observability, and a final go / no-go recommendation.

Support tickets

POST /v1/support/ticket — submit a support ticket through the in-app channel.

Schema retrieval

Endpoint	Purpose
`GET /v1/schema/package/{version}`	JSON schema for the package format at a specific version.
`GET /v1/schema/intake/{version}`	JSON schema for the intake artifact format.

Both return application/schema+json on success and 404 when the version isn't recognized. Useful for validating generated content programmatically.

Miscellaneous mutations

PATCH /v1/generations/{id}/name — set the display name for a generation. A request body is required; the name field can be a string, or null / empty / whitespace to clear the override and fall back to the intake-derived name. Sending no body at all returns 400. Returns 204 No Content on success; 404 if not yours.
