
MCP server

Updated 2026-05-30

If your client speaks MCP natively (Claude Code, Claude Desktop, IDE extensions), skip to Connecting an MCP client — the client handles the protocol for you. The Manual JSON-RPC walkthrough below is for anyone implementing an MCP client by hand or adapting a custom agent runtime.

SpecStep's MCP (Model Context Protocol) server exposes the same generation engine as the REST API, but shaped as discrete tools your AI coding agent can call directly. If your agent is already MCP-capable — Claude Code, Claude Desktop, or a compatible IDE — you can point it at the SpecStep MCP server and let it request documentation without hand-crafting HTTP.

What MCP is

MCP is a protocol for connecting AI agents to external tools and data sources. SpecStep's server exchanges JSON-RPC 2.0 messages over HTTP. The agent calls initialize to negotiate the protocol version and capabilities, calls tools/list to discover the available tools, then invokes tools by name with structured arguments. The server returns structured results the agent can read and reason over.

SpecStep implements MCP over a single HTTP endpoint. There is no WebSocket or streaming transport — each tool call is a POST with a JSON-RPC envelope, and the response is returned in the same HTTP response.

Authentication

SpecStep supports two ways to authenticate MCP calls. Browser-based sign-in is the recommended default — your MCP client opens a browser, you sign in once, and the client receives a token without any key management on your part.

The MCP server advertises OAuth 2.1 with PKCE per the MCP spec. Compatible clients — Claude Desktop, Claude.ai, Cursor, Codex, GitHub Copilot, Continue, Cline, and any client that implements the MCP authorization extension — trigger the flow automatically:

  1. The first unauthenticated call to /mcp returns 401 Unauthorized with a WWW-Authenticate header pointing at the protected-resource metadata document.
  2. The client fetches the discovery document at /.well-known/oauth-protected-resource (and /.well-known/oauth-authorization-server) to learn the authorize and token endpoints.
  3. The client opens https://specstep.com/oauth/authorize?… in your browser.
  4. You sign in to SpecStep (via the existing Entra account) and click Allow on the consent screen.
  5. The browser 302s to a loopback URL the MCP client is listening on, carrying a one-time authorization code.
  6. The client exchanges the code at /oauth/token (PKCE-verified) and receives a Bearer oat_… access token valid for 90 days.
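For a hand-written client, the PKCE pair used in steps 3 and 6 can be produced roughly as follows. This is a minimal sketch, assuming the S256 challenge method from RFC 7636; the function names are illustrative, and the exact authorize and token parameters should be read from the discovery documents rather than from this example.

```python
import base64
import hashlib
import secrets


def challenge_for(verifier: str) -> str:
    """Derive the S256 code_challenge for a given code_verifier (RFC 7636)."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair() -> tuple:
    """Return (code_verifier, code_challenge).

    32 random bytes encode to a 43-character URL-safe verifier, which sits
    inside the 43..128 length range RFC 7636 allows.
    """
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode("ascii")
    return verifier, challenge_for(verifier)
```

The client keeps the verifier private, sends the challenge with the authorize request, and presents the verifier when exchanging the code at the token endpoint.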

You can review and revoke browser-based sign-ins from Settings → API keys → Connected MCP clients.

Dynamic Client Registration (RFC 7591)

Added 2026-05-15.

The discovery document at /.well-known/oauth-authorization-server advertises a registration_endpoint of https://specstep.com/oauth/register. Any MCP client that speaks RFC 7591 — Codex, Claude Desktop, Cursor, Continue, Cline, and any other client following the MCP authorization extension — registers itself on first connect without any pre-shared client_id:

  1. The client POSTs its metadata to /oauth/register:
    {
      "client_name": "Codex",
      "redirect_uris": ["http://127.0.0.1:54321/callback"]
    }
    
  2. The server validates each redirect_uri against the RFC 8252 loopback allowlist (http://127.0.0.1:<port>/… or http://localhost:<port>/…), mints a fresh client_id of the shape mcp_<32-hex>, and returns the RFC 7591 §3.2.1 envelope:
    {
      "client_id": "mcp_e0f4261b3ad3b5e8dd3ae4c5327a6fec",
      "client_name": "Codex",
      "redirect_uris": ["http://127.0.0.1:54321/callback"],
      "grant_types": ["authorization_code"],
      "response_types": ["code"],
      "token_endpoint_auth_method": "none",
      "client_id_issued_at": 1715800000
    }
    
  3. The client uses that client_id for the subsequent /oauth/authorize + /oauth/token handshake described above.

Registration is anonymous (no API key, no cookie) and rate-limited to 30 registrations per IP per hour. The legacy hardcoded client_id specstep-mcp-generic is still accepted for pre-RFC-7591 clients; new integrations should register their own.

Only the loopback redirect-URI shape is allowed. Public HTTPS redirects, non-HTTP schemes, host-substring tricks, and userinfo-form URIs are rejected with error: "invalid_redirect_uri". Only grant_type=authorization_code, response_type=code, and token_endpoint_auth_method=none (public clients with PKCE) are accepted in the registration request; anything else returns error: "invalid_client_metadata".
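A client can pre-check its redirect URI before calling /oauth/register. The sketch below is a client-side approximation of the rules described above, not SpecStep's server-side validator; the function name is hypothetical, and requiring an explicit port is an assumption drawn from the documented http://127.0.0.1:<port>/… shape.

```python
from urllib.parse import urlsplit

LOOPBACK_HOSTS = {"127.0.0.1", "localhost"}


def is_loopback_redirect(uri: str) -> bool:
    """Return True when the URI matches the loopback shape described above."""
    try:
        parts = urlsplit(uri)
        port = parts.port  # raises ValueError on a malformed port
    except ValueError:
        return False
    if parts.scheme != "http":
        return False  # public HTTPS and non-HTTP schemes are rejected
    if parts.username is not None or parts.password is not None:
        return False  # userinfo-form URIs are rejected
    if parts.hostname not in LOOPBACK_HOSTS:
        return False  # exact host match defeats host-substring tricks
    return port is not None  # assumption: an explicit port is required
```

The server remains the authority: a URI that passes this check can still be rejected with invalid_redirect_uri.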

API key (for CI / automation)

For headless or server-to-server flows where no browser is available, the existing API-key scheme works:

POST https://specstep.com/mcp
Content-Type: application/json
Authorization: Bearer sf_xxxxxxxxxxxx

Create one at Settings → API keys. The same rate limits apply to both auth schemes — API-key callers have an independent per-key counter; OAuth callers share a single per-user counter across all connected clients. See rate limits for the full scoping rules.

A key's scopes govern which tools it can reach. Most tools below work with any authenticated key, but the session-state and project tools — build sessions, the decision log, the backlog, and project management — are opt-in: a key sees them in tools/list only when it carries the matching scopes (session_state.read, session_state.write, projects.read, projects.write), and a project-scoped key is confined to its one project. See Session state and project tools for the scope reference and how to mint a project-scoped key.

Transport

All MCP traffic goes to:

POST https://specstep.com/mcp
Content-Type: application/json
Authorization: Bearer <oat_…  or  sf_…>

The body is a JSON-RPC 2.0 object. The server returns JSON-RPC results or errors.

A minimal tool call looks like:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "get_generation",
    "arguments": { "generation_id": "gen_01hx..." }
  }
}

Most MCP clients handle the JSON-RPC envelope for you. You configure the server URL; the client either negotiates OAuth automatically or, if you've supplied an API key, attaches the bearer.
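For a client without an MCP library, the envelope and headers above can be assembled with the Python standard library. This is a minimal sketch: build_tool_call and post_mcp are hypothetical helper names, and the 60-second timeout is an arbitrary choice rather than a documented limit.

```python
import itertools
import json
import urllib.request

MCP_URL = "https://specstep.com/mcp"
_request_ids = itertools.count(1)


def build_tool_call(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 tools/call envelope with a fresh request id."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def post_mcp(envelope: dict, token: str, timeout: float = 60.0) -> dict:
    """POST one envelope to the MCP endpoint and return the decoded response.

    token is either an oat_ access token or an sf_ API key.
    Non-2xx responses (for example the initial 401) raise urllib.error.HTTPError.
    """
    request = urllib.request.Request(
        MCP_URL,
        data=json.dumps(envelope).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Usage would look like post_mcp(build_tool_call("get_generation", {"generation_id": "gen_01hx..."}), token), with the token read from your own secret store.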

Manual JSON-RPC walkthrough

This section shows the exact wire shape for clients written by hand — no MCP library. Every example below is a single POST https://specstep.com/mcp with Authorization: Bearer sf_… (or oat_… from the OAuth flow) and Content-Type: application/json. The server returns the JSON-RPC response in the same HTTP response.

1. initialize

The handshake. The client announces its protocol version + capabilities; the server replies with its identity and what it supports.

Request:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "protocolVersion": "2025-03-26",
    "capabilities": {},
    "clientInfo": { "name": "my-agent", "version": "0.1.0" }
  }
}

Response:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "protocolVersion": "2025-03-26",
    "capabilities": {
      "tools": { "listChanged": false }
    },
    "serverInfo": {
      "name": "specstep",
      "version": "0.1.0"
    }
  }
}

protocolVersion is the MCP spec version SpecStep speaks; pin your client to it or treat anything matching 2025-* as compatible. capabilities.tools.listChanged: false means the server does not push tool-list updates — refetch tools/list explicitly if you suspect the manifest changed.
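The version rule above can be checked right after the handshake. A minimal sketch, assuming the response shape shown in this section; negotiated_version is a hypothetical helper name.

```python
def negotiated_version(initialize_response: dict) -> str:
    """Return the server's protocolVersion, or raise if it is outside 2025-*."""
    version = initialize_response["result"]["protocolVersion"]
    if not version.startswith("2025-"):
        raise ValueError(f"unsupported MCP protocol version: {version}")
    return version
```

A stricter client could compare against the pinned "2025-03-26" string instead of the prefix.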

2. notifications/initialized

Per the MCP spec, the client follows up with a one-way notification (no id field, no expected response). SpecStep treats initialize as the only required handshake and tolerates clients that skip the notification, but well-behaved clients send it:

{ "jsonrpc": "2.0", "method": "notifications/initialized" }

3. tools/list

Discover the tool catalog.

Request:

{ "jsonrpc": "2.0", "id": 2, "method": "tools/list", "params": {} }

Response (truncated — see Available tools below for the complete list):

{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "tools": [
      {
        "name": "start_interview",
        "description": "Starts a new interview. Returns the interview id and initial agent turn.",
        "inputSchema": {
          "type": "object",
          "properties": {},
          "additionalProperties": false
        }
      },
      {
        "name": "submit_interview_turn",
        "description": "Submits a user turn to an interview. Returns the agent's reply and updated state.",
        "inputSchema": {
          "type": "object",
          "properties": {
            "interview_id": { "type": "string", "format": "uuid" },
            "message":      { "type": "string", "minLength": 1 }
          },
          "required": ["interview_id", "message"],
          "additionalProperties": false
        }
      }
    ]
  }
}

Each entry has name, description, and a JSON Schema inputSchema. The schema is what your agent should hand to its LLM as the tool signature — names and types are authoritative.
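One way to use the manifest on the client side is to index it by tool name and run a cheap argument check before each call. The sketch below assumes the tools/list response shape shown above; index_tools and check_arguments are hypothetical helpers, and the check covers only required and additionalProperties, so it is not a full JSON Schema validator.

```python
def index_tools(tools_list_response: dict) -> dict:
    """Map each tool name to its inputSchema."""
    return {
        tool["name"]: tool["inputSchema"]
        for tool in tools_list_response["result"]["tools"]
    }


def check_arguments(schema: dict, arguments: dict) -> list:
    """Return a list of problems found by a shallow check of the arguments."""
    problems = [
        f"missing required argument: {name}"
        for name in schema.get("required", [])
        if name not in arguments
    ]
    if schema.get("additionalProperties") is False:
        allowed = schema.get("properties", {})
        problems += [
            f"unexpected argument: {name}"
            for name in arguments
            if name not in allowed
        ]
    return problems
```

An empty list means the shallow check passed; the server still applies the full schema.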

4. tools/call

Invoke a tool. Tool results are wrapped in MCP content blocks; v1 always emits a single text block carrying the tool's JSON payload as a string. Parse it on the client.

Request:

{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "tools/call",
  "params": {
    "name": "start_interview",
    "arguments": {}
  }
}

Response:

{
  "jsonrpc": "2.0",
  "id": 3,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "{\"id\":\"01952fcb-cd11-7c3e-9a2e-3b1d8f5e6a04\",\"status\":\"active\",\"transcript\":[{\"role\":\"agent\",\"content\":\"Tell me what you're building...\"}]}"
      }
    ]
  }
}

The result.content[0].text field is a JSON string — parse it again on your side to get the structured payload (interview id, status, transcript, etc.). MCP errors come back as standard JSON-RPC error envelopes (result absent, error: {code, message}); typed application errors (quota exceeded, ownership conflicts, paused-state guards) prefix the message with a stable error code like QUOTA_EXCEEDED: ... or RETRY_STATE_INVALID: ... so clients can branch on it.
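The double parse and the error-code prefix can be handled in one helper. This is a sketch under two assumptions: that v1 responses carry the payload in content[0].text as described above, and that application codes look like UPPER_SNAKE_CASE followed by a colon (inferred from the two examples, not a documented grammar). McpToolError and unwrap_tool_result are hypothetical names.

```python
import json
import re

# Assumption: application error codes are UPPER_SNAKE_CASE, as in QUOTA_EXCEEDED.
_APP_CODE = re.compile(r"[A-Z][A-Z0-9_]*")


class McpToolError(Exception):
    """A JSON-RPC error envelope, with the application code split out if present."""

    def __init__(self, rpc_code, app_code, message):
        super().__init__(message)
        self.rpc_code = rpc_code
        self.app_code = app_code


def unwrap_tool_result(response: dict) -> dict:
    """Return the parsed tool payload, or raise McpToolError on an error envelope."""
    if "error" in response:
        error = response["error"]
        message = error.get("message", "")
        prefix, separator, _ = message.partition(":")
        app_code = prefix if separator and _APP_CODE.fullmatch(prefix) else None
        raise McpToolError(error.get("code"), app_code, message)
    # The text block holds JSON encoded as a string, so it is parsed a second time.
    return json.loads(response["result"]["content"][0]["text"])
```

Callers can then branch on app_code (for example "QUOTA_EXCEEDED") and fall back to rpc_code when no prefix is present.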

Connecting an MCP client

The exact configuration depends on your client. There are two shapes — pick the one that matches whether your client supports OAuth.

OAuth-capable clients (recommended) — Claude Desktop, Claude.ai, Cursor, Codex, GitHub Copilot, Continue, Cline, and any client that implements the MCP authorization extension. Configure the server URL only; the client handles the browser sign-in flow on first connect:

{
  "mcpServers": {
    "specstep": {
      "url": "https://specstep.com/mcp"
    }
  }
}

On the first tool call, the client opens a browser to SpecStep, you sign in and click Allow, and the client receives a 90-day access token. You can revoke it from Settings → API keys → Connected MCP clients.

API-key clients — for clients without OAuth support, or headless / CI flows where no browser is available:

{
  "mcpServers": {
    "specstep": {
      "url": "https://specstep.com/mcp",
      "headers": {
        "Authorization": "Bearer ${SPECSTEP_API_KEY}"
      }
    }
  }
}

Replace ${SPECSTEP_API_KEY} with a key minted from Settings → API keys.

After connecting, run the initialize handshake and then call tools/list (or the equivalent in your client) to retrieve the tool manifest. The server returns the list of available tools with their argument schemas.

End-to-end flow via MCP

The same steps as the REST walkthrough, expressed as tool calls. An MCP-capable agent can drive this entire sequence autonomously.

1. Start an interview. Call start_interview, which takes no arguments. Store the id from the response; later calls take it as interview_id.

2. Submit turns. Call submit_interview_turn with the interview_id and your first message, which should describe what you're building. In the default async mode the call returns a job_id; poll get_interview_turn_status to read the agent's response. Continue submitting turns, answering the AI Team's questions about vision, users, requirements, constraints, and architecture, until the returned interview status is complete. A typical interview takes five to fifteen turns.

3. Start a generation. Call start_generation with the intake_id (the intake artifact identifier produced by completing the interview) and your chosen review profile. Store the returned generation_id. The generation is now Queued.

4. Poll for completion. Call wait_for_generation with the generation_id and respect the returned next_check_seconds hint between calls. The state will move from Queued to Drafting / Reviewing / FreshEyes as the generation runs, then to a terminal Complete / Failed / Cancelled. wait_for_generation is preferred over get_generation because it inlines the polling cadence + the pending-clarifications + the download URL, cutting most flows to a single tool call.

4a. Handle a paused clarification. If wait_for_generation returns state: "PausedAwaitingClarification", the response already includes pending_clarifications (get_pending_clarifications would return the same payload, so no extra round-trip needed). Surface the question text to your user, gather their answers, then call answer_clarifications with {question, answer} pairs that match the question text verbatim. The generation resumes on the next dispatcher tick. Skip this step entirely when no clarification fires.

5. Retrieve the package. When wait_for_generation reports state: "Complete", the response carries a short-lived package_url you can download from directly. If you want richer metadata, call get_package with the package_id; for history, list_packages.

6. Deliver (optional). Package delivery — committing to a GitHub repository and opening a pull request — is handled via the REST API (POST /v1/packages/{id}/deliver). MCP tools do not cover delivery in this version.
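Steps 4, 4a, and 5 can be sketched as a single loop. The example takes a call_tool(name, arguments) callable that returns the parsed tool payload, so it stays independent of any particular transport. Several details are assumptions rather than documented facts: the "answers" argument name for answer_clarifications, passing generation_id to it, the 5-second fallback delay, and the max_polls cap.

```python
import time

TERMINAL_STATES = {"Complete", "Failed", "Cancelled"}


def run_until_terminal(call_tool, generation_id, answer_questions,
                       sleep=time.sleep, max_polls=500) -> dict:
    """Poll wait_for_generation until a terminal state, answering clarifications.

    answer_questions receives pending_clarifications as returned by the server
    and must return a list of {"question", "answer"} pairs.
    """
    for _ in range(max_polls):
        status = call_tool("wait_for_generation", {"generation_id": generation_id})
        state = status["state"]
        if state in TERMINAL_STATES:
            return status
        if state == "PausedAwaitingClarification":
            answers = answer_questions(status["pending_clarifications"])
            # Assumption: the argument names here are illustrative only.
            call_tool("answer_clarifications",
                      {"generation_id": generation_id, "answers": answers})
        # Respect the server's polling hint; 5 seconds is an assumed fallback.
        sleep(status.get("next_check_seconds", 5))
    raise TimeoutError(f"generation {generation_id} did not finish in {max_polls} polls")
```

When the returned state is "Complete", the payload carries the package_url described in step 5. Check the answer_clarifications schema from tools/list before relying on the argument names used here.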

Recommended MCP workflows

Twelve short recipes covering the common reasons an agent calls SpecStep. Each names the tools in order — argument schemas live in the reference catalog below.

1. Create a new package from scratch

  1. start_interview — opens the interview, returns interview_id.
  2. submit_interview_turn — submit user messages until the interview status is complete.
  3. validate_generation_request — recommended pre-flight; returns {is_valid, blocking_errors, warnings} without enqueueing.
  4. start_generation — kick off the run. Returns generation_id.
  5. wait_for_generation — block on terminal state with built-in polling cadence.
  6. get_latest_package_for_generation — resolve the produced package.
  7. list_package_files / get_package_file — read individual files on demand.

2. Inspect a completed package

  1. list_packages (account-wide) or list_packages_for_generation (one generation).
  2. get_package — the package record + a fresh SAS download URL.
  3. list_package_files — the zip's central-directory listing.
  4. get_package_file — read individual files without downloading the zip.
  5. search_package — full-text search inside a single package.

3. Compare two or more generations

  1. compare_packages — high-level identity verdict + per-package build / quality confidence scores.
  2. diff_package_files — line-level unified diff across same-named files.
  3. get_generation_quality_report — structured reliability / accessibility / cost / risk findings per generation.
  4. get_security_findings — structured security-expert findings per generation.

4. Apply a small change to an existing package

  1. estimate_change_request_cost — check the rolling-30-day median cost before paying for the call.
  2. request_change — file the addendum (one LLM call; cheaper than a full re-gen).
  3. list_change_requests — the addendum history for a package.
  4. get_change_request — one addendum record + SAS download URL for the zip.
  5. list_change_request_files + get_change_request_file — read the addendum's five markdown files without unzipping.

5. Gate automation on quality and security

  1. wait_for_generation until state == "Complete".
  2. get_security_findings — branch on max_severity (Critical / Major / Minor / Info / None).
  3. get_generation_quality_report — reliability / accessibility / cost / risk severities for the same generation.
  4. Fail or warn based on the severity thresholds your gate enforces.
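The final step of this recipe can be a small pure function. The sketch assumes the five max_severity labels listed above are ordered from Critical (worst) to None (best); the default thresholds are illustrative, not SpecStep recommendations.

```python
# Assumption: severity increases from left to right.
SEVERITY_ORDER = ["None", "Info", "Minor", "Major", "Critical"]


def gate(max_severity: str, fail_at: str = "Critical", warn_at: str = "Major") -> str:
    """Return "fail", "warn", or "pass" for a max_severity label.

    Unknown labels raise ValueError, so a new severity cannot slip through silently.
    """
    level = SEVERITY_ORDER.index(max_severity)
    if level >= SEVERITY_ORDER.index(fail_at):
        return "fail"
    if level >= SEVERITY_ORDER.index(warn_at):
        return "warn"
    return "pass"
```

For example, a CI job might call gate(findings["max_severity"]) on the get_security_findings payload and exit non-zero on "fail".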

6. Attach external reference docs

  1. attach_external_folder — returns a one-time browser URL the user opens to complete OAuth + folder pick.
  2. User opens the URL in their browser; SpecStep handles provider OAuth and first sync server-side.
  3. get_attach_external_folder_session — poll until status == "Completed" (or a terminal failure).
  4. Continue the interview or generation flow; the folder's files are now available as reference documents.

7. Use webhooks instead of polling

  1. create_webhook — subscribe a target URL to one or more event types. The signing secret is returned once.
  2. test_webhook — fire a synthetic webhook.test event to verify the target is reachable.
  3. rotate_webhook_secret — issue a fresh signing secret and invalidate the old one.
  4. delete_webhook — retire the subscription.

wait_for_generation remains the canonical polling fallback when the webhook target is unavailable.

8. Inspect or resume an in-flight generation

  1. list_generations filtered by state — find your in-flight runs.
  2. get_generation — the full aggregate including progress_percent and cost-forecast fields.
  3. get_events — chronological telemetry (state transitions, agent activity).
  4. If state == "PausedAwaitingClarification": get_pending_clarifications then answer_clarifications — the generation resumes on the next dispatcher tick.
  5. wait_for_generation — block on the terminal state.

9. Retry or cancel a failed generation

  1. get_generation — read failure_category to decide whether retry is appropriate.
  2. retry_generation to re-fire from the original kickoff envelope — see errors §409 for the four typed retry-rejection codes (RETRY_STATE_INVALID, RETRY_RESEARCHER_CHILD, RETRY_ENVELOPE_UNAVAILABLE, RETRY_OWNER_MISMATCH).
  3. OR cancel_generation if abandoning the run.
  4. wait_for_generation after a successful retry.

10. Soft-delete and restore

Asymmetric by historical convention — Package's delete/restore are folded into update_package's flags; Generation has dedicated tools.

  • Packages. update_package with delete: true — soft-deletes the row. Later: update_package with restore: true.
  • Generations. delete_generation — soft-deletes. Later: restore_generation.

Soft-deleted rows drop out of the default list queries; they're still recoverable until the 30-day retention window auto-purges them.

11. File a bug report or quality feedback

Pick the type that fits.

Broken behavior (404, wrong output, crash):

  1. submit_bug_report — include diagnostic context (URL, generation id, error excerpt). Returns the bug_report_id.
  2. list_my_bug_reports — your filed reports and their current state.
  3. get_bug_report — one record, including any review notes and state transitions.

Quality evaluation (was the interview good, is the package coherent, what's the build confidence):

  1. list_feedback_templates — discover the available rubrics. Seven templates ship in v1: end-to-end-specstep-quality (whole-run), interview-quality (Otto behavior only), package-buildability (deliverable only), api-doc-quality (the /api-docs/* surface), tooling-experience (MCP / CLI / IDE ergonomics), website-quality (the public marketing/docs site), and launch-readiness (cross-cutting pre-launch review). Pick the one whose scope matches the feedback — narrower rubrics keep the signal cleaner than the all-in-one.
  2. get_feedback_template — fetch the full sections for the chosen template to see which section ids to fill.
  3. validate_feedback — (Added 2026-05-19.) Dry-run the submission shape before committing. Returns { valid, errors[] } where each error carries code (the canonical FEEDBACK_* error code), message, and param_name. Same input as submit_feedback minus the recommendation_token. Use this when you're guessing at the rubric's section ids or the cap on a free-text field — better to catch the mistake without burning a submit_feedback call.
  4. submit_feedback — include type, title, full_report, the linked GUIDs (interview_id / generation_id / package_id), and rubric_section_responses + rubric_scores if you used a template.
  5. list_my_feedback — your filed feedback and its current state.
  6. get_feedback — one record, including any review notes and state transitions.

12. Capability and subscription discovery

  1. get_capabilities — schema versions, accepted enum values (review_profile, project_type, mirror_selection). Call BEFORE start_generation so you can avoid hardcoding magic strings that change on deploy.
  2. get_subscription — the caller's tier (Free / Pro / Team) + quota snapshot. Branch on tier-allowed profiles before kicking off generations.

Tool selection guide

A quick mapping from common agent intent to the best first tool. When in doubt, start here, then read that tool's reference entry below for argument detail.

Intent Start with
"I need to know what values are valid" get_capabilities
"I want to know if a generation will succeed" validate_generation_request
"I need to know if my tier allows this review profile" get_subscription
"I need the latest package for a generation" get_latest_package_for_generation
"I need one file from a package" list_package_files → get_package_file
"I need to search across all my packages" search_my_packages
"I need to inspect an addendum" list_change_requests → get_change_request → get_change_request_file
"I need to compare packages" compare_packages + diff_package_files
"I need review findings as data, not prose" get_security_findings + get_generation_quality_report
"My generation is paused — what's the question?" get_pending_clarifications → answer_clarifications
"My generation failed — why?" get_generation (read failure_category) → get_events
"I want to retry a Failed generation" retry_generation
"I need account-wide cost over a period" get_usage
"I want to rate or evaluate a finished run" list_feedback_templates → get_feedback_template → submit_feedback
"I want to know if a feedback submission will be accepted" validate_feedback (dry-run) → submit_feedback
"I want to file a bug, not rate a run" submit_bug_report (broken behavior; use submit_feedback for quality evaluation)

Available tools

These are the SpecStep MCP tools available to standard authenticated callers. The nine categories below group tools by capability area.

Interview tools

start_interview

Creates a new interview. Takes no arguments — the opening agent turn arrives in the response's transcript. Describe what you're building in your first submit_interview_turn call (project type, vision, constraints); the interview's detected_type is inferred from that first turn.

submit_interview_turn

Submits a turn to an existing interview. Default mode is async (changed 2026-05-19): the call commits your user turn + enqueues a background job and returns a job_id you poll via get_interview_turn_status (or subscribe to the InterviewTurnJobStatusChanged SignalR push). Legacy inline-reply behavior is available via mode: "sync" but is subject to the ~60s Front Door ceiling and is scheduled for removal after one release cycle.

Arguments

Name Type Required Description
interview_id UUID yes The interview to append the turn to.
message string yes The user's turn. Empty / whitespace strings are rejected; cap is 16,384 characters.
client_request_id string no Optional idempotency token (1..128 chars of [A-Za-z0-9._:-]). A retry with the same value returns the cached result of the first call instead of re-invoking the LLM. Recommended for any caller that might retry on network failure.
mode string no Default "async" (changed 2026-05-19): returns a job_id you poll via get_interview_turn_status. Pass "sync" to opt into the legacy inline-reply path (subject to the ~60s Front Door ceiling — may 504 on long interviews; scheduled for removal).

Returns (async mode, default) — either:

  • {status: "queued", job_id, interview_id, submission_id?, user_turn_committed: true, snapshot: null} — your user turn committed; poll get_interview_turn_status(job_id) for the agent reply.
  • {status: "cached_replay", job_id: null, interview_id, submission_id, user_turn_committed: true, snapshot: <interview snapshot>} — you supplied a client_request_id whose original call already completed; here's the cached result.

Returns (sync mode, opt-in) — full interview snapshot: {id, status, detected_type, started_at, last_activity_at, completed_at, intake_artifact_id, transcript: [{role, content, occurred_at}, …], started_generation_id?, auto_start_failure?}. Read the last agent-role entry of transcript for the agent's reply. When the interview just transitioned to status: "complete", the response also carries the auto-handoff fields below.

Completion auto-handoff (added 2026-05-17). When the agent signals completion on a turn (the interview transitions to complete and an intake_artifact_id is produced), SpecStep auto-starts a generation with sensible defaults (review_profile: "Normal", mirror_selection: "ClaudeMd", has_ui derived from the detected project type) and surfaces the result on the same response:

  • started_generation_id — non-null on success; the generation id you can poll via wait_for_generation / get_generation.
  • auto_start_failure: {code, message} — non-null when auto-start failed (quota exceeded, validation error, transient provider failure, etc.). The interview turn still succeeded; call start_generation manually with the intake_artifact_id if you want to retry the kickoff with custom settings.

Both fields stay null when the turn didn't trigger completion. Auto-handoff is restricted to user-actor interviews; API-key actors receive auto_start_failure.code: "AUTO_START_NOT_SUPPORTED_FOR_ACTOR_TYPE" and call start_generation themselves.

The auto-handoff fields land on the snapshot returned via get_interview_turn_status when the async job's completion produced an intake artifact.

Errors — when an idempotency replay finds the original is still processing, you get INTERVIEW_TURN_IN_FLIGHT with data: {retryable: true, retry_after_seconds: 5, turn_committed: false, ...}. When the original failed, you get the cached error code with data: {retryable, turn_committed: false, original_error_code, replayed_from_cache: true, ...}. See errors.
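The argument constraints above can be enforced before the call leaves the client. A minimal sketch: the helper names are hypothetical, the "turn:" prefix plus UUID is just one convenient way to mint a token, and counting the 16,384-character cap with len() (Unicode code points) is an assumption about how the server measures it.

```python
import re
import uuid

_CLIENT_REQUEST_ID = re.compile(r"[A-Za-z0-9._:-]{1,128}")
MAX_MESSAGE_CHARS = 16_384


def is_valid_client_request_id(value: str) -> bool:
    """Check the documented shape: 1..128 characters of [A-Za-z0-9._:-]."""
    return _CLIENT_REQUEST_ID.fullmatch(value) is not None


def new_client_request_id() -> str:
    """Mint a token that satisfies the shape above (41 characters)."""
    return f"turn:{uuid.uuid4()}"


def build_turn_arguments(interview_id: str, message: str, client_request_id: str) -> dict:
    """Validate locally, then return the arguments for submit_interview_turn."""
    if not message.strip():
        raise ValueError("message must not be empty or whitespace")
    if len(message) > MAX_MESSAGE_CHARS:
        raise ValueError("message exceeds the 16,384-character cap")
    if not is_valid_client_request_id(client_request_id):
        raise ValueError("client_request_id has an invalid shape")
    return {
        "interview_id": interview_id,
        "message": message,
        "client_request_id": client_request_id,
    }
```

Generate the token once per logical turn and reuse it on every retry of that turn; a fresh token on each retry would defeat the idempotency cache.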

get_interview_turn_status

Status poll for an async submit_interview_turn job. Returns the job's current state plus (when completed) the canonical interview snapshot, or (when failed) structured error fields.

Arguments

Name Type Required Description
job_id UUID yes The job_id returned by an async submit_interview_turn call.

Returns {status, job_id, interview_id, snapshot?, error_code?, error_message?, is_retryable?, created_at, completed_at?} where status is one of queued, running, completed, failed, or (after cancel_interview_turn) cancelled. When completed, snapshot carries the full interview state in the same shape sync submit_interview_turn returns. When failed, the error_code is one of the standard interview-turn codes (INTERVIEW_TURN_TIMEOUT, INTERVIEW_TURN_TRANSPORT_ERROR, INTERVIEW_TURN_STUCK_RUNNING, INTERVIEW_TURN_INTERNAL_ERROR, …) and is_retryable tells you whether re-submitting with the same client_request_id is safe.

Foreign job ids return a "not found" error (same info-hiding convention as get_interview).
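Polling an async turn can be sketched as below, again using a call_tool(name, arguments) callable that returns the parsed payload. The 2-second interval and the max_polls cap are assumptions, since no polling cadence is documented for this tool, and TurnNotCompleted is a hypothetical exception type.

```python
import time
from typing import Optional


class TurnNotCompleted(Exception):
    """Raised when the job ends in any terminal status other than completed."""

    def __init__(self, status, error_code, is_retryable):
        super().__init__(f"interview turn ended as {status}: {error_code}")
        self.status = status
        self.error_code = error_code
        self.is_retryable = is_retryable


def wait_for_turn(call_tool, job_id, interval_seconds=2.0,
                  max_polls=150, sleep=time.sleep) -> dict:
    """Poll get_interview_turn_status and return the snapshot once completed."""
    for _ in range(max_polls):
        job = call_tool("get_interview_turn_status", {"job_id": job_id})
        status = job["status"]
        if status == "completed":
            return job["snapshot"]
        if status not in ("queued", "running"):
            # failed, or cancelled by cancel_interview_turn
            raise TurnNotCompleted(status, job.get("error_code"),
                                   bool(job.get("is_retryable")))
        sleep(interval_seconds)
    raise TimeoutError(f"job {job_id} still pending after {max_polls} polls")


def last_agent_reply(snapshot: dict) -> Optional[str]:
    """Return the content of the last agent-role transcript entry, if any."""
    for entry in reversed(snapshot.get("transcript", [])):
        if entry.get("role") == "agent":
            return entry.get("content")
    return None
```

When TurnNotCompleted carries is_retryable=True, re-submitting with the same client_request_id is the documented safe path.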

cancel_interview_turn

Added 2026-05-18.

Cancels a background submit_interview_turn(mode: 'async') job by id. Useful when the user's submitted turn was wrong, when an LLM call is dragging on, or when the caller wants to abandon a half-finished turn rather than wait for it (or its stuck-job timeout). Queued jobs cancel cleanly; running jobs cancel best-effort — the job's terminal status will be cancelled, but the agent reply MAY still appear in the interview transcript if a mid-pipeline SaveChanges committed before the cancel landed. Idempotent on already-Cancelled jobs.

Arguments

Name Type Required Description
job_id UUID yes The job_id returned by an async submit_interview_turn call.

Returns {status, job_id, interview_id, created_at, completed_at?} where status is cancelled on the happy path. Mirrors the shape get_interview_turn_status returns, minus the snapshot field, because the work was abandoned.

Returns an INTERVIEW_TURN_NOT_CANCELLABLE conflict when the job is already completed or failed (the work landed; the result is at get_interview_turn_status). Foreign job ids return a "not found" error (same info-hiding convention as get_interview_turn_status).

list_interviews

Lists the caller's interviews, newest first. Empty conversations (< 2 turns) are filtered out so abandoned-at-first-contact rows don't clutter the list.

Arguments

Name Type Required Description
status string no Comma-separated. One or more of active, paused, abandoned, complete, awaiting_clarification.
limit int no Default 20, max 100.

Returns {interviews: [...]} where each item has:

Field Type Description
id UUID Interview id.
status string One of the lowercase status values above.
detected_type string | null Project type inferred from the first user turn.
display_title string Short human-readable label for the interview.
turn_count int Total turns recorded so far.
started_at ISO-8601 When the interview was created.
last_activity_at ISO-8601 Timestamp of the most recent turn or state change.

get_interview

Returns the full state and transcript of an interview by id. Takes interview_id. Same auth boundary as REST: foreign callers get "not found" rather than a 403, so probing foreign ids is impossible.

The response carries a transcript_size introspection block (added in v0.18, 2026-05-22) — byte-identical to the REST shape — so MCP clients can observe how full a transcript is before queuing the next turn: { chars, tokens_estimate, max_chars, max_tokens, percent_used }. chars sums the UTF-16 length of every user + agent turn (system prompts and reference documents are excluded); tokens_estimate is chars / 4 (conservative upper bound). max_chars and max_tokens report the current platform ceiling but are not enforced in v0.18 — a later release will reject submit-turn calls that would exceed them with a structured error envelope.
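To make the arithmetic concrete, the sketch below recomputes the documented fields from a transcript. It is illustrative only and the server's transcript_size block remains authoritative: integer division for tokens_estimate, the percent_used formula (chars relative to max_chars), and one-decimal rounding are all assumptions, and the function names are hypothetical.

```python
def utf16_length(text: str) -> int:
    """Count UTF-16 code units, so characters outside the BMP count as two."""
    return len(text.encode("utf-16-le")) // 2


def estimate_transcript_size(transcript: list, max_chars: int) -> dict:
    """Approximate chars, tokens_estimate, and percent_used for a transcript.

    Only user and agent turns are counted, matching the description above.
    """
    chars = sum(
        utf16_length(turn["content"])
        for turn in transcript
        if turn["role"] in ("user", "agent")
    )
    return {
        "chars": chars,
        "tokens_estimate": chars // 4,  # assumption: integer division
        "percent_used": round(100 * chars / max_chars, 1),  # assumption: chars-based
    }
```

A client could use percent_used from the real response to warn the user before a transcript approaches the ceiling.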

delete_interview

Soft-deletes an interview by id. Takes interview_id. Allowed in any status (Active, Paused, Complete, Abandoned, AwaitingClarification, ClarificationResolved) — soft-delete is a "remove from my workspace" affordance, not a state-machine transition. Idempotent on already-deleted rows. The interview row stays in the database for audit + recovery; the conversation drops out of list_interviews and the workspace UI. Foreign callers get "not found" so foreign ids can't be probed. Returns {interview_id, action: "deleted"}. Sister tool: restore_interview.

restore_interview

Restores a soft-deleted interview by id. Takes interview_id. Idempotent on already-live rows. No state guard. Sister to delete_interview. Returns {interview_id, action: "restored"}.

list_intake_artifacts

Added 2026-05-08.

Lists the caller's intake artifacts (the structured output of a completed Interview, the sole input to start_generation). Sibling-shape to list_interviews; agents pick a ready-to-generate artifact without filtering interview status inline. Optional status filter ("ready" is the only meaningful value today; null/blank = same as "ready"; unknown labels return an empty list). Optional limit (default 50, max 200) and offset for pagination. Returns {artifacts: [{id, interview_id, project_name, schema_version, completed_at}, ...]}, newest first. Mirrors REST GET /v1/intake-artifacts. Use the returned id as the intake_id argument to start_generation.

get_intake_artifact

Added 2026-05-12.

Fetches a single intake artifact by id. Takes intake_artifact_id. Returns {id, interview_id, content_warning, payload_content_type, payload, project_attributes} — the full structured intake JSON the orchestrator feeds into start_generation, plus the project-attribute flags set by the post-interview attribute-detection pass (has_ui, has_persisted_data, has_ai_features, has_backend, requires_i18n, requires_compliance, compliance_frameworks).

The payload is user-authored JSON; it ships inside an untrusted_text envelope with a content_warning so MCP clients don't treat the strings as agent instructions. Owner-scoped — foreign and unknown ids surface as "not found" rather than 403, so probing is impossible. Use this when an agent wants to inspect or debug what an interview produced before calling start_generation, or to investigate a "why did this generation produce that" question after the fact.

External-connector tools

Added 2026-05-15.

MCP-driven flow for attaching a OneDrive / SharePoint / Google Drive folder to one of the caller's interviews. The MCP client itself is a CLI — it can't render a folder picker — so the kickoff tool returns a one-time launch URL the user opens in their default browser. The browser handles the existing provider-pick + OAuth + folder-pick + first-sync flow; the MCP client polls a sibling tool for terminal state. Same pattern as start_generation + get_generation. The synced files materialize as reference documents on the interview, identical to what the Web UI's "Connect a folder" affordance produces.

attach_external_folder

Creates an attach session and returns a launch URL. The MCP client opens the URL (or prints it for the user) and then polls get_attach_external_folder_session until the session reaches a terminal state.

Arguments

Name Type Required Description
interview_id UUID yes The interview the resulting connector's files will sync into. Caller must own the interview; foreign ids surface as "not found" rather than 403.

Returns

Field Type Description
attach_session_id UUID Session id. Pass to get_attach_external_folder_session to poll for status.
launch_url string Absolute URL the user opens in their browser (e.g., https://specstep.com/external-connectors/attach/<id>).
status string Initial status — always awaiting_provider_pick on a fresh kickoff.
expires_at ISO-8601 UTC timestamp at which the session expires (30 minutes after creation).
message string Human-readable prompt the MCP client surfaces to the user.

get_attach_external_folder_session

Polls the state of an attach session. Same auth boundary as the kickoff tool — cross-user reads surface as "not found".

Arguments

Name Type Required Description
attach_session_id UUID yes The session id returned by attach_external_folder.

Returns

Field Type Description
status string One of awaiting_provider_pick, awaiting_oauth, awaiting_folder_pick, syncing, completed, expired, cancelled, failed.
connector_id UUID | null Populated when status = completed. The new (or reused) ExternalConnector id.
provider string | null Populated once the user picks a provider in the browser. One of onedrive, sharepoint, googledrive, dropbox.
folder_name string | null Populated on or after commit. The folder the user selected.
files_synced int | null Populated when status = completed. Count of files materialized as reference documents on the interview.
error_code string | null Populated when status = failed (e.g., commit_failed, authorize_failed).
error_description string | null Human description of the failure when status = failed.
expires_at ISO-8601 | null UTC timestamp at which the session is set to expire.

Terminal states are completed, failed, expired, and cancelled. Unknown / expired session ids return a synthetic {"status": "expired"} response — the client can re-run attach_external_folder to start over.
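
The kickoff-then-poll pattern can be sketched as a small loop. This is an illustration under assumptions: call_tool is a hypothetical callable that sends one tool call and returns the parsed result, and the 2-second interval is an arbitrary choice (900 polls at that pace roughly covers the 30-minute session lifetime).

```python
import time

TERMINAL_STATES = {"completed", "failed", "expired", "cancelled"}


def wait_for_attach(call_tool, attach_session_id, interval_seconds=2.0,
                    max_polls=900, sleep=time.sleep):
    """Poll get_attach_external_folder_session until a terminal state."""
    for _ in range(max_polls):
        session = call_tool(
            "get_attach_external_folder_session",
            {"attach_session_id": attach_session_id},
        )
        if session["status"] in TERMINAL_STATES:
            return session
        sleep(interval_seconds)
    raise TimeoutError("attach session did not reach a terminal state")
```

On an expired result the caller can simply re-run attach_external_folder and poll the new session id.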

Generation tools

start_generation

Starts a generation from a completed interview's intake. Takes the intake_id and (optionally) the review profile, project type, and version pins. Returns the generation id and initial state Queued. Subject to the same 5-kickoffs-per-minute rate limit as POST /v1/generations.

Many callers won't need to call this directly — when the agent signals completion on a submit_interview_turn call, SpecStep auto-starts a generation with sensible defaults and surfaces the new generation id as started_generation_id on the response snapshot. Call start_generation explicitly when you want non-default settings (custom review_profile, mirror_selection, etc.) or when the auto-start surfaced an auto_start_failure you need to retry past.

Arguments

Name Type Required Description
intake_id UUID yes The intake artifact produced by completing an interview.
review_profile string no One of Fast, Normal, Extensive, Researcher. Defaults to Normal.
project_type string no One of WebApp, MobileApp, MobileGame, DesktopApp, BrowserExtension, AiAgent, AiTool. Defaults to WebApp.
has_ui bool no Whether the project has a user interface. Defaults to false.
schema_version string no Pins the manifest schema version. Defaults to 1.0.0.
rubric_version string no Pins the review rubric version. Defaults to 1.0.0.
quality_rubric_version string no Pins the quality rubric version. Defaults to quality-1.0.
mirror_selection string no One of None, ClaudeMd, CursorRules, Copilot, All. Defaults to None.

Returns

Field Type Description
id UUID The new generation's id.
state string Initial state, normally Queued.
download_url string | null Populated only if the package is synchronously ready (rare).
package_id UUID | null Populated only if the package is synchronously ready.
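
A client may want to catch a mistyped enum before spending one of the five kickoffs allowed per minute. The sketch below hardcodes the values from the Arguments table for illustration; the helper name is hypothetical.

```python
REVIEW_PROFILES = {"Fast", "Normal", "Extensive", "Researcher"}
PROJECT_TYPES = {"WebApp", "MobileApp", "MobileGame", "DesktopApp",
                 "BrowserExtension", "AiAgent", "AiTool"}
MIRROR_SELECTIONS = {"None", "ClaudeMd", "CursorRules", "Copilot", "All"}

_ENUMS = {
    "review_profile": REVIEW_PROFILES,
    "project_type": PROJECT_TYPES,
    "mirror_selection": MIRROR_SELECTIONS,
}


def start_generation_args(intake_id: str, **options) -> dict:
    """Build start_generation arguments, rejecting unknown enum values early."""
    for key, valid in _ENUMS.items():
        if key in options and options[key] not in valid:
            raise ValueError(f"{key} must be one of {sorted(valid)}")
    # Omitted options fall back to the server-side defaults listed above.
    return {"intake_id": intake_id, **options}
```

Leaving an option out, rather than sending the default explicitly, keeps the request aligned with whatever the server's current default is.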

get_generation

Breaking change in v0.9.5 (2026-05-06). This tool was previously called get_status. Callers using the old name must switch — the dispatcher rejects get_status with a MethodNotFound-style error.

Returns the current state of a generation. Takes a generation ID. Returns the state (one of Queued, Drafting, SpecialistReview, Reviewing, FreshEyes, RiskReview, SecurityReview, Assembling, Refining, Delivering, Paused, PausedAwaitingClarification, Complete, Failed, Cancelled, AddendumRunning), the current round, the running cost, the computed progress_percent, and the typed failure_category when the generation failed.

When the historical sample is large enough, the response also carries estimated_total_usd plus estimated_total_p25_usd / estimated_total_p75_usd / estimated_total_sample_size — the same forecast envelope the Generation Details page renders.

The response also carries project_name, description, kind ("specification"), and kind_label so the agent knows what the generation is about and can disambiguate the deliverable from runnable code.

Poll this until state is terminal — or use wait_for_generation instead, which returns a polling-cadence hint.

Arguments

Name Type Required Description
generation_id UUID yes The generation to inspect.

Returns (shared with wait_for_generation)

Field Type Description
id / generation_id UUID The generation's id.
state string One of Queued, Drafting, SpecialistReview, Reviewing, FreshEyes, RiskReview, SecurityReview, Assembling, Refining, Delivering, Paused, PausedAwaitingClarification, Complete, Failed, Cancelled, AddendumRunning.
current_round int Current review-loop round number.
progress_percent int Computed 0–100 progress signal.
running_cost_usd decimal Live cost so far. Settles to the package's total_cost_usd on Complete.
estimated_total_usd decimal | null Historical-median forecast; null when the sample is too small. From 2026-05-27, on a run that has auto-resumed after host restarts the forecast is widened by host_restart_resume_count (each resume re-runs work), so a resume-prone run's estimate reflects the extra cost instead of reading wildly low against the actual.
estimated_total_p25_usd / estimated_total_p75_usd decimal | null Percentile bounds; null when the forecast is null. Widened on resumed runs alongside estimated_total_usd.
estimated_total_sample_size int | null Number of historical generations behind the forecast.
project_name string | null Display name (override or auto-extracted).
description string | null Short intake-derived description, truncated to 280 chars.
kind / kind_label string "specification" + the canonical disambiguation copy.
failure_category string | null Typed failure category on Failed rows; null otherwise. See REST errors page.
failure_reason string | null Sanitized human-readable hint on Failed rows.
billing_state string | null Added 2026-05-18. One of NotStarted / Active / PausedRetrying / Complete / PausedAwaitingInput (the last added 2026-06-01 — a human-input pause, e.g. answering a clarification: your turn, no cost climbing, nothing stuck; distinct from PausedRetrying, a transient-error backoff). Customer-facing billing posture written atomically with every state transition. When billing_state is Active while running_cost_usd climbs, the caller knows their cost isn't being wasted — the platform is actively working. Null on pre-2026-05-18 generations (no projection row yet).
started_work_at ISO-8601 | null Added 2026-05-18. When the dispatcher first claimed the generation (distinct from started_at which is the queued-at time). Null on pre-2026-05-18 generations and while the row is still in pre-work states.
phase_detail string | null Added 2026-05-18. Human-readable phase label derived pure-function from state + current_round (examples: "Drafting", "Specialist review (round 2)", "Awaiting your clarification"). Present on every projection row. Null on pre-2026-05-18 generations.
progress_explanation string | null Added 2026-05-18. One-sentence explanation of what's happening at the current progress_percent (e.g., "Specialists are reviewing the draft in parallel"). Closes the same understanding gap as billing_state — the customer sees WHY the progress bar is where it is, not just the number. Null on pre-2026-05-18 generations.
estimated_duration_seconds number | null Added 2026-05-18. Historical-median forecast of the run's eventual total wall-clock duration (seconds), keyed by review_profile. Null when the historical sample is too small for a confident forecast (the floor is 5 completed generations in the rolling 30-day window) or on pre-2026-05-18 generations.
estimated_time_remaining_seconds number | null Added 2026-05-18. Best-effort "expected remaining" computed as estimated_duration_seconds - elapsed_since_started_work_at, floored at 0. Null while the generation is queued, terminal, when the forecast is unavailable, or when a still-running generation has already outrun its forecast (the ETA resets to estimating…).
estimated_completion_at ISO-8601 | null Added 2026-05-18. Best-effort wall-clock expected completion: started_work_at + estimated_duration_seconds. Null while queued, terminal, when the forecast is unavailable, or when a still-running generation has already outrun its forecast (the ETA resets to estimating…).
active_specialist string | null Added 2026-05-18. During SpecialistReview only — slug of the most-recently-completed specialist in the current round (codd / halo / tally / vera / trip / merlin / polo). A pragmatic single-value summary of a parallel fan-out. Null outside SpecialistReview, when no specialists have completed yet, or on pre-2026-05-18 generations.
retry_count int Added 2026-05-19. Number of recoverable LLM-provider retries fired during this run (rate-limit / transient 5xx / timeout backoffs). Starts at 0 and only increments within a dispatch attempt; the one exception is a host-restart rewind, which resets it to 0 because the counter belongs to a single dispatch attempt. Lets callers tell a "healthy first attempt" (0) apart from "currently riding out a transient hiccup" (>0). Always present (defaults to 0 on pre-rollout generations).
last_retry_at ISO-8601 | null Added 2026-05-19. UTC timestamp of the most recent retry attempt. Null until the first retry fires.
next_retry_at ISO-8601 | null Added 2026-05-19. UTC timestamp the retry policy is currently waiting for before the next attempt (last_retry_at + backoff_delay). Null between retries. Lets callers display "next retry in X seconds" without guessing the backoff curve.
recoverable_error_category string | null Added 2026-05-19. Typed classifier for the recoverable failure that triggered the most recent retry. One of rate_limit / provider_timeout / provider_server_error / schema_violation / other. Distinct from terminal failure_category — that's set when the run fails for good; this is set when an LLM call temporarily failed but the retry policy is still covering it. Null when no retry has fired yet.
host_restart_resume_count int Added 2026-05-27. How many times this run was automatically resumed after a host restart (capped at 5). Distinct from retry_count: that one is provider-level and resets to 0 on a host-restart rewind, so a run that recovered from several restarts still reads retry_count: 0; this counter spans the run's whole life and only climbs. A non-zero value is the honest reason a run's running_cost_usd or estimated_total_usd runs higher than the clean-run forecast — each resume re-runs work: the full-rewind path re-runs Drafting from scratch, while cheaper in-place resumes pick up from a saved checkpoint. Always present (defaults to 0).
refinement_summary object | null Added 2026-05-29. Outcome of the pre-delivery refinement pass that fills referenced-but-missing docs before a package ships. null when the pass didn't run, made no change, and left no gap. When present, an object with: rounds_used (int — how many detect → refine → re-validate rounds ran); generated_count / dropped_count / residual_count (int); generated and dropped (arrays of {path, referenced_by[]} — docs filled with real content vs. dangling references removed); residual (array of {path, referenced_by[], reason} — references that ship as deferred stubs, i.e. the package's known gaps); and summary (a ready-to-render string). Mirrors the "Pre-delivery refinements" section in handoff.md.
reconciliation_summary object | null Added 2026-05-29. Outcome of the pre-delivery contradiction-reconciliation pass that resolves cross-document architecture contradictions (e.g. one doc says PostgreSQL, another DynamoDB) before a package ships. null when the pass found nothing to reconcile and left no residual. When present, an object with: rounds_used (int — how many detect → reconcile → re-validate rounds ran); reconciled_count / unresolved_count (int); reconciled (array of {category, summary, affected_locations[]} — contradictions resolved by redrafting the affected docs to agree); unresolved (array of {category, summary, affected_locations[], reason} — contradictions that ship as known gaps, with the reason); and summary (a ready-to-render string). A reconciled contradiction also disappears from consistency_findings. Mirrors the "Pre-delivery reconciliation" section in handoff.md.
blocker_resolution_summary object | null Added 2026-05-29. Outcome of the pre-delivery blocker resolve-or-clarify pass that acts on residual Critic-flagged blockers before a package ships. null when there were no residual blockers to act on. When present, an object with: resolved_count / clarified_count / residual_count (int); resolved (array of {target_section, summary} — blockers cleared by redrafting); clarified (array of {target_section, summary, question} — blockers escalated into a clarification question); residual (array of {target_section, summary, reason} — blockers that ship as known gaps); and summary (a ready-to-render string). Mirrors the "Pre-delivery blocker resolution" section in handoff.md.
refinement_audit object | null Added 2026-05-31. Consolidated audit of the whole pre-delivery refinement pipeline: one flat view of what it auto-fixed versus escalated, aggregated from the three fields above (refinement_summary / reconciliation_summary / blocker_resolution_summary) so you don't have to union three differently-shaped objects. null on a clean run where every refinement pass was a no-op. When present, an object with: auto_fixed_count / escalated_count (int); auto_fixed (the pipeline changed the package) and escalated (the pipeline surfaced an unresolved gap), each an array of {pass, action, target, detail}, where pass is one of stub-fill / reconciliation / blocker-resolution, action is one of generated / dropped / reconciled / resolved (auto-fixed) or residual-gap / unresolved-contradiction / clarified / residual-blocker (escalated), target is the doc path / section / contradiction category, and detail is a human-readable summary / reason / clarification question (may be empty); and summary (a ready-to-render string). Mirrors the "Refinement audit" section in handoff.md.
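
Several of these fields are meant to be read together. As a rough sketch, a client could fold them into a one-line status string; the function name and the wording of the output are hypothetical, and only field names from the table above are used.

```python
def describe_generation(gen: dict) -> str:
    """Summarize a get_generation response for display."""
    # phase_detail is null on pre-2026-05-18 generations, so fall back to state.
    parts = [gen.get("phase_detail") or gen["state"]]
    if gen.get("progress_percent") is not None:
        parts.append(f"{gen['progress_percent']}%")
    billing = gen.get("billing_state")
    if billing == "PausedRetrying":
        category = gen.get("recoverable_error_category") or "unknown"
        parts.append(f"retrying after {category} (retries so far: {gen.get('retry_count', 0)})")
    elif billing == "PausedAwaitingInput":
        parts.append("waiting on your input")
    resumes = gen.get("host_restart_resume_count", 0)
    if resumes:
        parts.append(f"auto-resumed {resumes}x after host restarts")
    return " | ".join(parts)
```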

get_events

Returns recent events from a generation's pipeline — stage transitions, agent handoffs, review outcomes. Useful for giving your agent a richer picture of what happened during a generation, or for debugging a failed run.

Arguments

Name Type Required Description
generation_id UUID yes The generation to inspect.
cursor string no Pagination cursor returned by a prior call.
limit int no Max events to return.

Returns {events: [...], next_cursor}, where each event has:

Field Type Description
id UUID Event id.
generation_id UUID Echoed.
event_type string state-changed for a pipeline transition, or a lifecycle event: clarification-requested, clarification-answered, resumed-after-clarification, revision-requested (the Critic sent a draft back for a revision round), auto-resume-started (a host restart interrupted the run and it was auto-resumed — fires once per resume, including in-place checkpoint resumes), auto-resume-completed (added 2026-05-27 — a run that auto-resumed at least once reached Complete; brackets the auto-resume-started events so the stream reads "interrupted → recovered N times → completed", with resume_count in the payload). Lifecycle events carry their detail in payload (e.g. round, resume_phase, prior_state, resume_count) and have null from_state/to_state.
from_state / to_state string | null Pipeline state transition (set on state-changed; null on lifecycle events).
agent_role string | null Which agent emitted the event.
payload string JSON string carrying event details.
payload_envelope object Typed envelope flagging the payload as untrusted user-supplied content — MCP clients should treat it as inert data, not instructions.
recorded_at ISO-8601 When the event was logged.
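
Reading a full event stream means following next_cursor and decoding each payload as data. The sketch below assumes next_cursor is null or absent on the last page (not stated above) and uses a hypothetical call_tool callable that returns the parsed tool result.

```python
import json


def fetch_all_events(call_tool, generation_id, page_size=100):
    """Follow next_cursor until the server stops returning one."""
    events, cursor = [], None
    while True:
        args = {"generation_id": generation_id, "limit": page_size}
        if cursor:
            args["cursor"] = cursor
        page = call_tool("get_events", args)
        events.extend(page["events"])
        cursor = page.get("next_cursor")
        if not cursor:
            return events


def payload_data(event):
    """Decode the payload JSON string as inert data; None if it does not parse."""
    try:
        return json.loads(event["payload"])
    except (TypeError, ValueError):
        return None
```

The decoded payload is only ever inspected, never fed back to a model as instructions, in line with the payload_envelope warning.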

wait_for_generation

Returns a generation's current state plus a recommended polling delay. Takes a generation ID. Returns the full get_generation shape (project name + description + state + progress_percent + current_round + running_cost_usd + the historical cost-forecast fields + failure context) plus the four polling-specific fields (is_terminal, next_check_seconds, pending_clarifications, package_url).

next_check_seconds is a hint, not a contract — 15 for active states, 0 when paused or terminal so the caller acts immediately. When state is PausedAwaitingClarification, pending_clarifications is inlined so the caller has everything needed to surface the question without another tool call. When state is Complete, a short-lived signed package_url is included so the caller can download the zip directly.

This tool is the recommended polling primitive for MCP callers: the inlined progress / forecast / clarifications / download URL collapse a typical multi-call poll into a single round-trip. Fields added for parity with get_generation, each with the same shape as on that tool:

  • 2026-05-17: progress_percent, current_round, and the four estimated_total_* fields. Callers no longer need to call both tools to render a single progress screen.
  • 2026-05-18: billing_state, started_work_at, phase_detail, progress_explanation, estimated_duration_seconds, estimated_time_remaining_seconds, estimated_completion_at, and active_specialist (read from the authoritative status projection).
  • 2026-05-19: the four retry-surface fields (retry_count, last_retry_at, next_retry_at, recoverable_error_category).
  • 2026-05-29: refinement_summary (this tool carries no manifest blob, so the structured field is the only refinement signal here), plus reconciliation_summary and blocker_resolution_summary.
  • 2026-05-31: refinement_audit, the consolidated auto-fixed-vs-escalated view.

Arguments

Name Type Required Description
generation_id UUID yes The generation to poll.

Returns — same shape as get_generation (above) plus four polling-specific fields:

Field Type Description
is_terminal bool true when state is Complete, Failed, or Cancelled.
next_check_seconds int Hint, not contract. 15 for active states; 0 when paused or terminal.
pending_clarifications array | null Inlined when state is PausedAwaitingClarification — same shape as get_pending_clarifications.
package_url string | null Short-lived signed URL when state is Complete.
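
A polling loop built on this tool can sleep for whatever next_check_seconds suggests and stop as soon as the caller has something to do. A minimal sketch, assuming a hypothetical call_tool callable that returns the parsed tool result; the one-second floor and the poll cap are illustrative safeguards, not documented behavior.

```python
import time

PAUSED_STATES = ("Paused", "PausedAwaitingClarification")


def wait_until_actionable(call_tool, generation_id, sleep=time.sleep, max_polls=1000):
    """Poll wait_for_generation until the run is terminal or paused."""
    for _ in range(max_polls):
        snapshot = call_tool("wait_for_generation", {"generation_id": generation_id})
        if snapshot["is_terminal"] or snapshot["state"] in PAUSED_STATES:
            return snapshot
        # The hint is advisory; never spin faster than once per second.
        sleep(max(1, snapshot.get("next_check_seconds", 15)))
    raise TimeoutError("generation still running after max_polls checks")
```

The returned snapshot already carries pending_clarifications or package_url when relevant, so no follow-up call is needed to act on it.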

estimate_generation_cost

Added 2026-05-12.

Forecasts what a generation will cost (USD) before calling start_generation. Takes profile — one of Fast, Normal, Extensive, or Researcher. Returns {profile, has_forecast, estimated_total_usd, p25_usd, p75_usd, sample_size, note} — the rolling 30-day median across completed generations for the requested profile, with p25 / p75 confidence bounds and the sample size behind the estimate.

The forecaster is profile-keyed only — it doesn't yet take an intake_id, so the estimate reflects "what this profile usually costs" rather than a per-intake projection. Per-intake variance can be substantial; the p25 / p75 bounds capture that envelope. When the historical sample is below the forecaster's floor, has_forecast is false and the response carries a "not enough data" note rather than a low-confidence number. Useful for sanity-checking cost before kicking off Normal or Extensive runs.
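
One way to use the forecast is as a budget gate. The sketch below compares the p75 bound against a caller-supplied budget, which is a deliberately cautious choice of this example rather than anything the tool prescribes; the function name is hypothetical.

```python
def within_budget(estimate: dict, budget_usd: float):
    """True/False against the p75 bound; None when there is no forecast."""
    if not estimate.get("has_forecast"):
        return None  # "not enough data": let the caller decide
    return estimate["p75_usd"] <= budget_usd
```

Returning None instead of False keeps "no forecast" distinct from "over budget", mirroring how has_forecast separates the two cases.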

validate_generation_request

Added 2026-05-16.

Dry-run companion to start_generation. Takes the same arguments (intake_id required; project_type, has_ui, review_profile, schema_version, rubric_version, quality_rubric_version, mirror_selection optional with the same defaults). Runs the side-effect-free pre-flight checks the live tool does (intake-existence + ownership, account-approval gate, monthly quota + Extra Usage fallback, review-profile-vs-tier, External Connectors tier gate) WITHOUT enqueueing a generation. Returns {is_valid, blocking_errors: [{code, message}], warnings: [{code, message}]}.

Each error's code matches the exception code start_generation would throw on the live path, so callers can branch on stable identifiers:

  • INTAKE_NOT_FOUND — intake doesn't exist or caller lacks access
  • USER_PENDING_APPROVAL — account hasn't been approved yet
  • QUOTA_EXCEEDED — monthly quota reached + no Extra Usage rescue
  • EXTRA_USAGE_INSUFFICIENT — monthly quota reached + Extra Usage balance below the p75 forecast charge
  • PROFILE_NOT_ALLOWED — requested review_profile isn't available on the caller's tier
  • FEATURE_NOT_ALLOWED — intake uses External Connector data but the caller's tier doesn't allow it

Warnings are informational and don't fail validation:

  • EXTRA_USAGE_WILL_BE_RESERVED — monthly quota exhausted but Extra Usage covers the next call
  • CONCURRENCY_AT_CAP / CONCURRENCY_HIGH — concurrency slots heavily in use; a live call right now could race to CONCURRENCY_CAP_REACHED

The concurrency-race caveat: a dry-run that returns is_valid: true can still 409 on a real call if another kickoff lands first. Concurrency state is informational only.
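
Because the codes are stable, a caller can branch on them directly. The sketch below is one possible triage; the decision labels are hypothetical, and the concurrency branch only flags the race described above rather than preventing it.

```python
def preflight_decision(result: dict):
    """Turn a validate_generation_request result into (decision, codes)."""
    if not result["is_valid"]:
        return "blocked", [error["code"] for error in result["blocking_errors"]]
    warnings = [warning["code"] for warning in result.get("warnings", [])]
    if any(code.startswith("CONCURRENCY_") for code in warnings):
        # Valid now, but a live start_generation call could still lose the race.
        return "proceed-may-race", warnings
    return "proceed", warnings
```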

get_security_findings

Added 2026-05-16.

Returns the structured security-review findings for a generation. Takes generation_id. Returns {generation_id, has_review, finding_count, max_severity, findings: [{severity, surface, topic, title}, ...]}.

Severity values: Critical, Major, Minor, Info, None. Surface values: Spec, ReferenceCode, GeneratedPackage, PromptInjection. has_review is false when the generation has no manifest yet (still in flight) or the review profile didn't include the Security Expert. Use this to gate automation on a generation's security posture without parsing the markdown report — e.g., max_severity == "Critical" → block. The full report markdown stays in the package zip; this tool exposes only the compact structured projection that already lives in the manifest.
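
A gate along those lines might look like the sketch below. It assumes the severities are ordered as listed (Critical highest, None lowest), which the text implies but does not state; the function name and return labels are hypothetical.

```python
# Assumed ordering, taken from the order the values are listed in.
SEVERITY_RANK = {"None": 0, "Info": 1, "Minor": 2, "Major": 3, "Critical": 4}


def security_gate(result: dict, block_at: str = "Critical") -> str:
    """Return "no-review", "block", or "pass" for a get_security_findings result."""
    if not result["has_review"]:
        return "no-review"  # still in flight, or profile had no Security Expert
    severity = result.get("max_severity") or "None"
    return "block" if SEVERITY_RANK[severity] >= SEVERITY_RANK[block_at] else "pass"
```

Keeping "no-review" separate from "pass" prevents a missing review from being mistaken for a clean one.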

get_generation_quality_report

Added 2026-05-16.

Aggregates the four non-security review sections from the generation's manifest into a single structured payload: reliability (Atlas), accessibility (Halo), cost (Tally), risk (Hazard). Takes generation_id. Returns {generation_id, reliability, accessibility, cost, risk} where each sub-section is {has_review, finding_count, max_severity, findings: [{severity, topic, title}, ...]}.

Severity values match get_security_findings. has_review: false on a sub-section means the reviewer wasn't part of the generation's review profile (e.g., the Fast profile skips Cost + Risk). Callers can distinguish "no findings + reviewer ran" from "reviewer didn't run" — useful for PR-gate automation that wants to know whether a quality signal is missing vs known-clean. Pair with get_security_findings for the security gate.
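
The missing-versus-clean distinction can be made explicit with a small classifier. This is a sketch with hypothetical labels, using only the fields described above.

```python
SECTIONS = ("reliability", "accessibility", "cost", "risk")


def classify_sections(report: dict) -> dict:
    """Label each sub-section as "missing", "clean", or "findings"."""
    labels = {}
    for name in SECTIONS:
        section = report[name]
        if not section["has_review"]:
            labels[name] = "missing"   # reviewer was not in the profile
        elif section["finding_count"] == 0:
            labels[name] = "clean"     # reviewer ran and found nothing
        else:
            labels[name] = "findings"
    return labels
```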

Clarification tools

get_pending_clarifications

Returns the structured clarifications a paused generation is waiting on. Takes a generation ID. Returns {state, clarifications} where each clarification has agent, section (may be null), question, why, and proposed_default. Empty array when the generation isn't paused.

The chat-driven web flow asks these questions through the user's interview chat; this tool exposes the same structured surface to MCP callers so an agent can prompt its user without parsing free-text agent turns.

answer_clarifications

Submits answers to a paused generation's clarifications and resumes the run. Takes the generation ID and an array of {question, answer} pairs. Match each question exactly to the verbatim text from get_pending_clarifications — pairing is by question text. Answers must cover every pending clarification (all-or-nothing for v1).

Returns {generation_id, accepted, message}. The orchestrator picks the run back up on the next dispatcher tick and threads the answers into the next agent call.

Generation control

list_generations

Lists the caller's generations regardless of whether they have a Package row yet, so callers can see in-progress / failed / paused / cancelled runs alongside completed ones.

Arguments

Name Type Required Description
status string no Comma-separated. Roll-up tokens (in_progress, complete, failed, cancelled, paused) or exact state names (Drafting, Reviewing, etc.). Case-insensitive.
limit int no Default 50, max 200.
offset int no Default 0.
order string no desc (newest-first, default) or asc.

Returns {rows: [...]}, where each row has: id, short_id, project_name, state, review_profile, cost_usd, started_at, completed_at, failed_at, current_round, failure_reason, failure_category, source_channel, interview_id, progress_percent (the row's live 0–100 progress; null on pre-rollout generations that have no projection yet).

delete_generation

Soft-deletes a generation by id. Takes generation_id. Only allowed on terminal-state rows (Complete, Failed, Cancelled); attempting to delete an in-flight generation throws an error — cancel the run first. Idempotent on already-deleted rows. The generation drops out of list_generations and the workspace; the row stays in the database for audit. Returns {generation_id, action: "deleted"}. Sister tool: restore_generation.

restore_generation

Restores a soft-deleted generation by id. Takes generation_id. No state guard — even if the generation was Failed or Cancelled at delete time, restore returns it to your workspace in the same state. Idempotent on already-live rows. Sister to delete_generation. Returns {generation_id, action: "restored"}.

cancel_generation

Added 2026-05-08.

Cancels an in-flight generation. Takes generation_id and an optional reason string. Marks the row Cancelled (a distinct terminal state from Failed) and signals the orchestrator's CancellationToken so any in-flight LLM call halts instead of running to completion (avoiding cost on a run you no longer want). Already-terminal rows (Complete, Failed, or already Cancelled) are left unchanged and return an error. Use this when an agent observes a stuck or runaway generation and wants to bail out cleanly. Returns {generation_id, state: "Cancelled"}.

retry_generation

Added 2026-05-08.

Retries a Failed generation. Takes generation_id. Replays the original kickoff command verbatim — same intake artifact, same review profile, same multimodal context (images + reference docs are re-hydrated from blob storage) — as a brand-new generation row. The original Failed row stays in the database for audit. Only the original owner can retry; cross-user retry is rejected. Returns {original_generation_id, new_generation_id, state, package_id}. Common error codes: RETRY_STATE_INVALID (only Failed rows are retryable), RETRY_RESEARCHER_CHILD (re-fire the parent Researcher run from the original interview), RETRY_ENVELOPE_UNAVAILABLE (legacy row predating the persisted-command feature), RETRY_OWNER_MISMATCH. Quota and approval errors (QUOTA_EXCEEDED, USER_PENDING_APPROVAL) propagate from the underlying handler.

pause_generation

Added 2026-05-08.

Pauses a running generation. Takes generation_id. User-initiated pause, distinct from the orchestrator's automatic PausedAwaitingClarification state (which fires when an agent needs more input — use get_pending_clarifications + answer_clarifications for that flow). The aggregate records the pre-pause state in the event log so a subsequent resume_generation can restore it. Already-terminal rows return a 409-equivalent error. Returns {generation_id, state: "Paused"}. Sister to resume_generation.

resume_generation

Added 2026-05-08.

Resumes a Paused generation back to its pre-pause state. Takes generation_id. Reads the most recent non-Paused to_state from the event log and restores it; the orchestrator picks up where it left off. State must currently be Paused (use get_generation to check); any other state returns a 409-equivalent error. If the event log has no pre-pause state recorded (corrupt history), surfaces the same error. Returns {generation_id, state} where state is the restored pre-pause state. Sister to pause_generation.
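
The restore rule can be mirrored client-side against get_events output, for instance to show the user which state a resume would return to. This is an illustration of the rule as described, not the server's code; it assumes the events are in chronological order.

```python
def pre_pause_state(events: list):
    """Most recent non-Paused to_state, or None if the log has none."""
    for event in reversed(events):
        to_state = event.get("to_state")  # null on lifecycle events
        if to_state and to_state != "Paused":
            return to_state
    return None
```

A None result corresponds to the corrupt-history case in which resume_generation returns an error.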

update_generation_name

Added 2026-05-08.

Sets or clears the user-facing display name on a generation. Takes generation_id and an optional name. Useful for correcting placeholder / null project names on completed generations (e.g., when the auto-extractor returned (unnamed) because the intake JSON was missing a project_name). Pass an empty/whitespace name (or omit it) to clear the override and let the auto-extractor's best guess take over. Returns {generation_id, display_name}. Mirrors REST PATCH /v1/generations/{id}/name.

Capabilities & metadata

get_capabilities

Added 2026-05-08.

Discover schema versions and the enumerable inputs the API accepts so callers can avoid hardcoding magic strings. Takes no arguments. Returns {schema_version, rubric_version, quality_rubric_version, review_profiles, project_types, mirror_selections}. Anonymous-shaped (the values describe the public contract and don't depend on the caller). Use this BEFORE start_generation / start_interview to discover valid review_profile and project_type values; values change only on deploy. Mirrors REST GET /v1/capabilities.

Account & usage

get_subscription

Returns the calling user's subscription tier and current calendar-month quota snapshot. Takes no arguments. Returns {tier, status, current_period_end, quota: {monthly_limit, monthly_used, concurrency_limit, period_reset_at}}. tier is one of Free, Pro, Team; status reflects Stripe's subscription state. Useful before kicking off a generation so the agent can warn the user if they're at or near their monthly quota. Mirrors the subscription field on REST GET /v1/me plus the standalone REST GET /v1/billing/subscription endpoint.
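
A pre-kickoff quota check might look like the sketch below. The 80% threshold and the message wording are arbitrary choices of this example, and treating a zero or null monthly_limit as "nothing to warn about" is an assumption.

```python
def quota_warning(subscription: dict, threshold: float = 0.8):
    """Return a warning string when usage is at or near the monthly limit."""
    quota = subscription["quota"]
    limit, used = quota["monthly_limit"], quota["monthly_used"]
    if not limit:
        return None  # assumption: no finite limit reported
    if used >= limit:
        return f"Monthly quota reached ({used} of {limit})"
    if used / limit >= threshold:
        return f"Near monthly quota ({used} of {limit})"
    return None
```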

get_usage

Aggregates the caller's LLM cost and token usage over a time window. Mirrors REST GET /v1/usage.

Arguments

Name Type Required Description
from ISO-8601 timestamp no Window start. Default: 30 days ago.
to ISO-8601 timestamp no Window end. Default: now. Max window 366 days.
group_by string no One of provider, model, role, day, week, month, key, user. Defaults to model.

Returns

Field Type Description
from / to ISO-8601 Echoed window bounds.
group_by string Echoed grouping key.
rows array Each entry has {group, input_tokens, output_tokens, cached_tokens, cost_usd, invocation_count}.
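
For anyone building the call by hand, here is a minimal sketch of the JSON-RPC envelope for a seven-day get_usage window grouped by day. The tools/call method and its {name, arguments} params follow the MCP specification; the helper name build_get_usage_call, the request id, and the client-side checks are illustrative assumptions, not part of SpecStep.

```python
import json
from datetime import datetime, timedelta, timezone

# Grouping keys and window limit as documented for get_usage.
ALLOWED_GROUPS = {"provider", "model", "role", "day", "week", "month", "key", "user"}
MAX_WINDOW = timedelta(days=366)


def build_get_usage_call(start, end, group_by="model", request_id=1):
    """Build a JSON-RPC tools/call envelope for get_usage (hypothetical helper)."""
    if group_by not in ALLOWED_GROUPS:
        raise ValueError(f"unsupported group_by: {group_by}")
    if end - start > MAX_WINDOW:
        raise ValueError("window exceeds the documented 366-day maximum")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "get_usage",
            "arguments": {
                "from": start.isoformat(),
                "to": end.isoformat(),
                "group_by": group_by,
            },
        },
    }


end = datetime(2026, 5, 30, tzinfo=timezone.utc)
payload = build_get_usage_call(end - timedelta(days=7), end, group_by="day")
print(json.dumps(payload, indent=2))
```

Checking group_by and the window locally only saves a round trip; the server remains the authority on what it accepts.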

Package tools

get_latest_package_for_generation

Added 2026-05-08.

Get the current package metadata + a time-limited download URL for a generation by generation_id (rather than by package_id). Use this when an agent has just completed a generation and wants the package without re-querying list_packages. 404-equivalent error when the generation has no package yet (still in flight) or when the package was soft-deleted. Mirrors REST GET /v1/generations/{id}/package. Future-proofs for the package-update flow (multiple package versions per generation): when that lands, this tool returns the CURRENT (latest) package without callers having to filter list_packages.

Arguments

Name Type Required Description
generation_id UUID yes The generation whose latest package to return.

Returns

Field Type Description
id UUID Package id.
generation_id UUID Echoed.
version string Package version (currently always 1.0.0).
download_url string SAS-tokened blob URL for the package zip.
download_url_expires_at ISO-8601 When the SAS URL expires. Refetch this tool to get a fresh URL.
total_cost_usd decimal What the generation's LLM calls cost.
retention_until ISO-8601 | null When the package will be auto-deleted; null means indefinite.
deleted_at ISO-8601 | null Set if the package was soft-deleted.
project_name string | null Display name (override or auto-extracted).
description string | null Short description, truncated to 280 chars.
kind / kind_label string "specification" + canonical disambiguation copy.
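
Because the download URL is time-limited, a client may want to check download_url_expires_at before reusing a cached result. The sketch below assumes the timestamp carries a UTC offset or a trailing Z; the function name needs_fresh_url and the two-minute safety margin are illustrative choices.

```python
from datetime import datetime, timedelta, timezone


def needs_fresh_url(package, now=None, margin=timedelta(minutes=2)):
    """Return True when the cached download_url is expired or about to expire."""
    now = now or datetime.now(timezone.utc)
    # Older Python versions do not parse a trailing "Z", so normalize it first.
    raw = package["download_url_expires_at"].replace("Z", "+00:00")
    expires_at = datetime.fromisoformat(raw)
    return now + margin >= expires_at


cached = {"download_url_expires_at": "2026-05-30T12:00:00Z"}
print(needs_fresh_url(cached, now=datetime(2026, 5, 30, 11, 0, tzinfo=timezone.utc)))
```

When the check returns True, call get_latest_package_for_generation again rather than retrying the stale URL.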

list_package_files

Added 2026-05-08.

Lists every file inside a completed package zip, with the uncompressed size of each entry. Takes package_id. Returns {package_id, files: [{path, size_bytes}, ...]} sorted lexicographically by path. Streams the zip's central directory from blob storage via Azure SDK range requests — the full archive is never materialized on the server. Pair with get_package_file to read individual files without the zip download dance. Useful when a coding agent wants to inspect package structure (architecture docs, requirements, ADRs, etc.) and pick which files to read.

get_package_file

Added 2026-05-08.

Returns the bytes of a single file from a package zip. Takes package_id and path (use list_package_files to discover available paths). The response shape depends on the file type:

  • Text entries (markdown, YAML, JSON, plain text, CSV, SVG): {package_id, path, content_type, content} where content is the raw UTF-8 string.
  • Binary entries (PNG, unknown extensions): {package_id, path, content_type, content_base64} where content_base64 is the base64-encoded payload (the JSON envelope can't carry malformed UTF-8).

Files larger than 256 KB return an error directing the caller at the bulk zip download URL (use get_package). Path-traversal segments (..) are rejected at the application layer. Streams the requested zip entry from blob storage; the full archive is never materialized.
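
Since the response carries either content or content_base64, a caller has to branch on which key is present. A minimal sketch, assuming the tool result has already been parsed into a dict; the helper name package_file_bytes is hypothetical.

```python
import base64


def package_file_bytes(result):
    """Return the raw bytes of a get_package_file result, whatever its shape."""
    if "content" in result:
        # Text entries arrive as a UTF-8 string.
        return result["content"].encode("utf-8")
    if "content_base64" in result:
        # Binary entries arrive base64-encoded.
        return base64.b64decode(result["content_base64"])
    raise ValueError("result carries neither content nor content_base64")


text_result = {"path": "README.md", "content_type": "text/markdown", "content": "# Title\n"}
binary_result = {
    "path": "logo.png",
    "content_type": "image/png",
    "content_base64": base64.b64encode(b"\x89PNG").decode("ascii"),
}
print(package_file_bytes(text_result), package_file_bytes(binary_result))
```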

search_package

Added 2026-05-08.

Full-text search across a package's indexed file contents (markdown, YAML, JSON, plain text, CSV, SVG entries — binary files are skipped during indexing). Takes package_id, query, and an optional limit (default 20, max 50). Returns {package_id, query, results: [{file_path, snippet, rank}, ...]} ranked by relevance, newest match first within rank ties. Snippets are HTML-highlighted with <mark>...</mark> markers around match terms; agents can render them directly or strip the tags as preferred.

Query syntax follows Postgres websearch_to_tsquery: quoted phrases ("agent topology"), OR for alternation (auth OR session), -term for exclusion (auth -test). Case-insensitive; English stemming is applied (so searching matches search). An empty query returns an empty result set rather than every row.

Results are scoped to a single package. For cross-package search across every package the caller owns in one round trip, use search_my_packages (below). The index is built at package completion; SpecStep staff can re-trigger indexing on request if it falls out of sync. Mirrors REST GET /v1/packages/{id}/search?q=...&limit=....

Arguments

Name Type Required Description
package_id UUID yes The package to search.
query string yes websearch_to_tsquery syntax (quoted phrases, OR, -term).
limit int no Default 20, max 50.

Returns {package_id, query, results: [{file_path, snippet, rank}, …]}. snippet contains <mark>...</mark> highlights around match terms.
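
An agent that wants plain text rather than HTML can strip the highlight markers itself. The sketch below assumes the only markup in a snippet is the <mark> pair described above; the helper names are illustrative.

```python
import re

MARK_TAG = re.compile(r"</?mark>")
MARKED_TERM = re.compile(r"<mark>(.*?)</mark>")


def plain_snippet(snippet):
    """Remove <mark> and </mark> while keeping the highlighted text."""
    return MARK_TAG.sub("", snippet)


def highlighted_terms(snippet):
    """Return the matched terms in the order they appear in the snippet."""
    return MARKED_TERM.findall(snippet)


sample = "the <mark>agent</mark> <mark>topology</mark> section"
print(plain_snippet(sample))
print(highlighted_terms(sample))
```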

search_my_packages

Added 2026-05-08.

Cross-package full-text search across every non-deleted package the caller owns. Takes query and an optional limit (default 10, max 25). Returns {query, results: [{package_id, project_name, version, total_hit_count, files: [{file_path, snippet, rank}, ...]}, ...]} — matched packages ordered by their best per-file rank, with up to 5 file hits embedded in each entry. total_hit_count carries the per-package true count so callers can render "showing N of M" or follow up with search_package for a deep look at any single package.

Same query syntax as search_package (Postgres websearch_to_tsquery — quoted phrases, OR, -term). Empty query returns an empty result set.

Replaces the prior N+1 fan-out pattern (call list_packages, then search_package per package). Use this tool whenever you don't already know which package to search. Mirrors REST GET /v1/packages/search?q=...&limit=....

Arguments

Name Type Required Description
query string yes Same websearch_to_tsquery syntax as search_package.
limit int no Default 10, max 25.

Returns {query, results: [{package_id, project_name, version, total_hit_count, files: [{file_path, snippet, rank}, …]}, …]}. Packages are ordered by their best per-file rank; up to 5 file hits are embedded per package; total_hit_count is the per-package true count.
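
The "showing N of M" rendering mentioned above can be sketched as follows. The helper name summarize_hits is hypothetical, and the fallback to package_id when project_name is empty is an assumption about how a client might label unnamed packages.

```python
def summarize_hits(response):
    """Return (label, has_more) pairs, one per matched package."""
    summary = []
    for package in response["results"]:
        shown = len(package["files"])
        total = package["total_hit_count"]
        name = package.get("project_name") or package["package_id"]
        # has_more signals that search_package would surface additional hits.
        summary.append((f"{name}: showing {shown} of {total}", total > shown))
    return summary


response = {
    "query": "auth",
    "results": [
        {"package_id": "pkg-1", "project_name": "Billing", "version": "1.0.0",
         "total_hit_count": 12, "files": [{"file_path": "a.md"}] * 5},
        {"package_id": "pkg-2", "project_name": None, "version": "1.0.0",
         "total_hit_count": 2, "files": [{"file_path": "b.md"}] * 2},
    ],
}
for label, has_more in summarize_hits(response):
    print(label, "(more available)" if has_more else "")
```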

get_package

Returns the documentation package metadata. Takes a package_id. Read the package_id from start_generation (returned alongside the new generation) or from get_generation once the generation reaches Complete. Includes project_name, description, kind, and kind_label so the deliverable is identifiable + clearly labeled as a specification package, not application code. generation_id is null for packages created by migrating existing documentation rather than by a generation run (Migrate Existing Docs, 2026-05-27); present for generated packages.

preview_doc_migration

Classifies an uploaded documentation archive onto the canonical SpecStep package layout and returns the proposed mapping — no persistence. Takes archive_base64 (a base64-encoded .zip; inline cap ~4 MB — use the REST endpoint POST /v1/doc-migrations/preview for larger) and optional source_archive_name. Returns {source_archive_name, source_byte_count, total_file_count, classified_count, unclassified_count, classifier_version, mapping: [{source_path, doc_type, target_path, layer, confidence}, ...], conflicting_target_paths: [...]}. Run this first; a non-empty conflicting_target_paths means two files claim the same canonical slot — resolve with target_path_overrides on commit.
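
Preparing archive_base64 might look like the sketch below, which zips files in memory and refuses to go inline when the archive is too large. The exact cap is an assumption: the text says "~4 MB", and this sketch applies 4 MiB to the raw zip bytes. The helper name build_preview_arguments is hypothetical.

```python
import base64
import io
import zipfile

# Assumption: "~4 MB" is treated as 4 MiB measured on the raw zip bytes.
INLINE_CAP_BYTES = 4 * 1024 * 1024


def build_preview_arguments(files, archive_name=None):
    """Zip {path: text} in memory and return preview_doc_migration arguments."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path, data in files.items():
            archive.writestr(path, data)
    raw = buffer.getvalue()
    if len(raw) > INLINE_CAP_BYTES:
        raise ValueError("archive too large for inline upload; use POST /v1/doc-migrations/preview")
    arguments = {"archive_base64": base64.b64encode(raw).decode("ascii")}
    if archive_name:
        arguments["source_archive_name"] = archive_name
    return arguments


arguments = build_preview_arguments({"docs/readme.md": "# Legacy docs\n"}, archive_name="legacy.zip")
print(sorted(arguments))
```

The same arguments can be reused for commit_doc_migration once the preview mapping has been reviewed.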

commit_doc_migration

Normalizes an uploaded documentation archive into a migrated package and persists it (canonical layout + _source/ for unplaceable files + a source: migrated manifest), linking it to a project. Takes archive_base64 (base64 .zip, ~4 MB inline cap), optional source_archive_name, optional project_id (defaults to your default project), optional version (default 1.0.0), and optional target_path_overrides (a map of source-path → target-path corrections from the reviewed preview). Returns {migration_id, package_id, project_id, version, classified_count, unclassified_count}. The resulting package appears in list_packages / get_package with a null generation_id. Errors when two sources still claim one canonical slot — supply target_path_overrides to resolve.

list_packages

Lists documentation packages on your account, with project_name + description + kind annotations on every row so the caller can identify each package without a per-row follow-up. Each row also carries generation_state so callers can tell which packages came from runs that finished cleanly versus runs that failed mid-flight.

Arguments

Name Type Required Description
limit int no Default 50, max 200.
offset int no Default 0.
order string no desc (newest-first, default) or asc.

Returns {packages: [...], next_cursor}, where each packages entry has:

Field Type Description
id UUID Package id.
generation_id UUID | null Source generation. null for packages created by migrating existing documentation — those have no originating run.
version string Package version.
total_cost_usd decimal What the generation cost.
retention_until ISO-8601 | null When the package will be auto-deleted.
deleted_at ISO-8601 | null Set if soft-deleted (filtered out by default).
project_name string | null Display name.
description string | null Short description, truncated to 280 chars.
kind / kind_label string "specification" + canonical disambiguation copy.
generation_state string Final state of the source generation (Complete, Failed, etc.).
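
Walking the full list with limit and offset could be sketched like this. call_tool is a hypothetical callable standing in for whatever your MCP client uses to invoke a tool and return its parsed result. The sketch pages by offset only and ignores next_cursor, whose semantics are not described here.

```python
def iter_packages(call_tool, page_size=50):
    """Yield every package row by paging through list_packages with limit/offset."""
    offset = 0
    while True:
        page = call_tool("list_packages", {"limit": page_size, "offset": offset, "order": "desc"})
        rows = page["packages"]
        yield from rows
        # A short page means there is nothing left to fetch.
        if len(rows) < page_size:
            return
        offset += page_size


# Stand-in for a real MCP client, used only to demonstrate the loop.
_fake_rows = [{"id": n, "generation_state": "Complete"} for n in range(120)]


def fake_call_tool(name, arguments):
    start = arguments["offset"]
    return {"packages": _fake_rows[start:start + arguments["limit"]], "next_cursor": None}


print(len(list(iter_packages(fake_call_tool))))
```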

request_change

Added 2026-05-09.

Files a change-management addendum against a completed package. Single-LLM-call flow (~30 seconds, ~$0.40-0.50) that produces a 5-file markdown bundle (background.md, change-requirement.md, implementation-guide.md, test-plan.md, decision-log-entry.md) attached as a sibling artifact to the existing package — no version bump.

Use this tool when an agent has a focused single-change request against a completed package — "Add Apple ID OAuth", "Localize French", "Switch session storage from cookies to JWT". For structural rewrites that warrant a fresh package version (~$2.50, multi-agent pipeline), call start_generation off the original interview's intake instead.

The addendum row also writes a bell-dropdown notification under the new AddendumComplete kind so the user sees the change land on their next page load. Mirrors REST POST /v1/packages/{id}/addenda.

Arguments

Name Type Required Description
package_id UUID yes The completed package to file the addendum against.
title string yes ≤ 200 chars. Short label for the change.
description string yes ≤ 4000 chars. Free-text description of the change requested.

Returns

Field Type Description
addendum_id UUID The new addendum's id.
package_id UUID The parent package id (echoed).
download_url string SAS-tokened blob URL for the 5-file markdown zip; valid for one hour.
cost_usd decimal What the LLM call cost (typically ~$0.40–0.50).

list_audiences

Added 2026-05-18.

Public catalog of audiences understood by explain_package. No arguments. Returns {audiences: [{slug, display_name, description}, ...]} — the V1 set is executive, product-manager, engineering-manager, new-engineer, investor, security. Mirrors REST GET /v1/explain/audiences. Use this to populate a picker before calling explain_package, or to validate a slug before submitting.

explain_package

Added 2026-05-18.

Rewrites a completed package as a short audience-tailored markdown explanation. One LLM round-trip (~10 seconds, ~$0.05) for a cold call; subsequent calls for the same (package, audience) pair return the cached row instantly and at zero cost.

Use this when an agent needs to summarize a package for a specific reader — e.g., "give me the executive cut" or "explain this to a new engineer" — instead of streaming the full bundle.

Arguments

Name Type Required Description
package_id UUID yes The package to explain.
audience string yes One of the slugs returned by list_audiences.

Returns

Field Type Description
markdown string Audience-tailored explanation, ≤ 8192 chars.
audience string Echoed slug.
model string LLM model id used for generation.
cost_usd decimal Cost of the LLM call (0 on a cache hit).
cached bool true when the result was served from a previously-generated row.

Errors: EXPLAIN_AUDIENCE_UNKNOWN if the slug isn't in the catalog; QUOTA_EXPLAIN_EXCEEDED if the monthly explanation quota is reached for the caller's tier; "not found" if the package isn't owned by the caller. Mirrors REST POST /v1/packages/{id}/explain.

list_packages_for_generation

Added 2026-05-12.

Lists every package produced by a generation. Takes generation_id. Returns {generation_id, packages: [{id, generation_id, version, total_cost_usd, retention_until, deleted_at, addendum_count, addendum_total_cost_usd}, ...]}. Today there is at most one package per generation, but the array shape is forward-compatible with the multi-version-package flow.

Each row carries addendum_count + addendum_total_cost_usd so an agent gets the full package and change-request picture in one call, with no need to chain get_latest_package_for_generation → list_change_requests → manual cost sum. Owner-scoped: foreign and unknown generation ids surface as "not found." When the generation has no package yet (still in flight or never reached Complete), returns an empty packages array rather than a 404, which distinguishes "in flight" from "permission denied."

list_change_requests

Added 2026-05-12.

Lists every change-request addendum filed against a package, newest-first. Takes package_id. Returns {package_id, content_warning, addenda: [{id, title, description, cost_usd, created_at, download_url}, ...]}. Each download_url is a freshly issued SAS-tokened blob URL valid for one hour, pointing at the addendum's 5-file markdown zip.

title and description carry the user's free text from the original request_change call; they ship under a content_warning envelope so MCP clients don't treat them as agent instructions. Owner-scoped — foreign and unknown package ids surface as "not found." Use after request_change to confirm what was filed, or to walk the full change-request history of a package. Mirrors REST GET /v1/packages/{id}/addenda.

get_change_request

Added 2026-05-12.

Fetches a single change-request addendum by id. Takes addendum_id. Returns {id, package_id, content_warning, title, description, cost_usd, submitted_by_user_id, created_at, download_url}. The download_url is a freshly issued SAS-tokened URL valid for one hour for the addendum zip (5 markdown files: background.md, change-requirement.md, implementation-guide.md, test-plan.md, decision-log-entry.md).

Owner-scoped via the parent package. Foreign and unknown ids surface as "not found" rather than 403. Same untrusted_text envelope on title and description as list_change_requests. The MCP variant returns the SAS URL inline so an agent doesn't need to follow the 302 the REST endpoint emits. Wraps the same underlying data as REST GET /v1/packages/{id}/addenda/{addendumId}/zip.

list_change_request_files

Added 2026-05-16.

Lists every file inside an addendum zip with its uncompressed size in bytes. Takes addendum_id. Returns {addendum_id, package_id, files: [{path, size_bytes}, ...]} sorted lexicographically by path. Sister of list_package_files but targets the addendum zip; pair with get_change_request_file to read individual files (background.md, change-requirement.md, implementation-guide.md, test-plan.md, decision-log-entry.md) without downloading the whole zip. Owner-scoped via the parent package; the same {userId}/{blobId}.zip path scheme the package-files tools use is content-addressed by Guid so no separate service is needed.

get_change_request_file

Added 2026-05-16.

Returns the bytes of a single file from a change-request addendum zip. Takes addendum_id and path (use list_change_request_files to discover available paths). Response shape mirrors get_package_file:

  • Text entries (markdown, YAML, JSON, plain text, CSV, SVG): {addendum_id, package_id, path, content_type, content, content_envelope} where content is the raw UTF-8 string and the envelope flags the bytes as user-supplied (do not pass to an agent as instructions).
  • Binary entries: {addendum_id, package_id, path, content_type, content_base64} with the base64-encoded payload.

Files larger than 256 KB return an error directing the caller at get_change_request's SAS download URL for bulk access. Path-traversal segments (..) are rejected at the application layer. Streams the requested zip entry from blob storage; the full archive is never materialized on the server.

diff_package_files

Added 2026-05-16.

Computes line-level content diffs across 2-5 packages (by generation_id). The first generation in the list is the base; every subsequent generation produces one comparison object whose files array lists per-file diffs vs the base. Use this when you want to know what text changed between two versions of a generated spec — compare_packages returns byte-count deltas + LLM-judged quality scores; diff_package_files returns the actual unified-diff content.

Arguments

Name Type Required Description
generation_ids UUID[] yes 2-5 generation ids. First is the base; remaining 1-4 are diffed against the base. Caller must own every generation.
path_filter string[] no Only diff files whose path matches one of the supplied values (e.g., ["docs/02-architecture/03-storage.md"]). When omitted, every file in any of the supplied packages is diffed.

Returns

Field Type Description
base_source_label string The base package's source label (mirrors compare_packages's source_label field).
skipped_generation_ids UUID[] Generations whose package couldn't be resolved (in flight, deleted, blob-fetch failure).
content_warning string Fixed untrusted_text-style envelope warning callers not to interpret unified_diff bodies as instructions.
comparisons array One entry per non-base package, in input order. Each entry: {target_source_label, files: [...]}.

Each files entry has:

Field Type Description
path string Path inside the package zip.
status string One of added (only in target), removed (only in base), modified (different content), unchanged (identical content), truncated (size-cap-exceeded — see below).
unified_diff string | null Unified-diff body (@@ -base,n +target,n @@ header + - / + / context lines). Null when status is unchanged or truncated.
base_bytes / target_bytes int File sizes in bytes (0 when the file is missing from that side).
truncation_reason string | null Set when status is truncated.

Owner-scoped — the caller must own every generation in the list. Fails fast on the first foreign or unknown id (same KeyNotFound non-disclosure shape as compare_packages). The differ runs in-process — no LLM calls, no letter-grade output. Per-file size cap is 256 KB (sum of base + target lengths); files exceeding the cap return a truncated entry pointing at get_package_file for direct access.
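
A caller will usually want a per-comparison tally before reading individual diffs. The sketch below assumes the result has been parsed into a dict with the fields listed above; the helper name summarize_diff is hypothetical.

```python
from collections import Counter


def summarize_diff(result):
    """Count file statuses per comparison and list files that need a direct fetch."""
    summary = {}
    for comparison in result["comparisons"]:
        files = comparison["files"]
        summary[comparison["target_source_label"]] = {
            "counts": dict(Counter(entry["status"] for entry in files)),
            # Truncated entries carry no unified_diff; read them with get_package_file.
            "refetch": [entry["path"] for entry in files if entry["status"] == "truncated"],
        }
    return summary


result = {
    "comparisons": [
        {
            "target_source_label": "run-2",
            "files": [
                {"path": "docs/a.md", "status": "modified"},
                {"path": "docs/b.md", "status": "unchanged"},
                {"path": "docs/big.md", "status": "truncated"},
            ],
        }
    ]
}
print(summarize_diff(result))
```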

compare_packages

Added 2026-05-12.

Compares up to five packages you own. Takes generation_ids (an array of 1–5 generation ids: a single id returns a rating summary only; 2–5 ids return the full cross-package comparison). Returns {skipped_generation_ids, identity_verdict, per_package, comparison}:

  • identity_verdict answers "are these the same project?" with a confidence score, a list of conflicting fields, and an explanation.
  • per_package carries each package's build-confidence score (with per-signal contributions) and an LLM-judged quality-confidence score with justification.
  • comparison carries the cross-package markdown body plus a structural diff of file lengths per package, gated under a content_warning envelope (the markdown is LLM-authored prose).

Owner-scoped — the caller must own every generation in the list. Fails fast on the first foreign or unknown id so a caller can't burn an LLM-judge call on packages they don't own. The 5-generation cap matches the REST limit and bounds LLM-judge cost. Generations whose package can't be resolved (still in flight, deleted, or a blob-fetch failure) are returned in skipped_generation_ids rather than failing the whole call. Useful when an agent wants to evaluate "which of my packages is best" or "how does my latest run compare to the previous one."

Async is the default (changed 2026-05-19). A real 2–5 package compare runs an LLM-judge pass that typically takes 30–80s — longer than most MCP clients' request timeout. So compare_packages defaults to mode: "async": it enqueues a background job and returns {status: "queued", job_id} within milliseconds. Poll get_compare_packages_status with that job_id for the canonical result. Pass mode: "sync" only when you know the compare fits inside your client's timeout (a single-package rating summary, or two small packages).

Arguments

Name Type Required Description
generation_ids UUID[] yes 1–5 generation ids. One id returns a rating summary only; 2–5 returns the full cross-package comparison.
mode string no async (default) — enqueue a job + return job_id to poll; sync — run inline and return the full result (small compares only, else the MCP client times out).

Returns

In async mode: {status, job_id} — poll get_compare_packages_status(job_id). In sync mode (and as the result payload of a completed async job):

Field Type Description
skipped_generation_ids UUID[] Generations whose package couldn't be resolved (in flight, deleted, blob-fetch failure).
identity_verdict object {same_project, confidence, conflicting_fields, explanation} — answers "are these the same project?".
per_package array One entry per resolved package with {generation_id, build_confidence: {score, signals: [...]}, quality_confidence: {score, justification}}.
comparison object | null When ≥ 2 packages resolve: {content_warning, markdown_body, file_length_diff}. The markdown is LLM-authored prose under an untrusted_text envelope.

get_compare_packages_status

Added 2026-05-19 — the poller for compare_packages(mode: "async").

Fetches the status of a background compare job. Takes the job_id returned by an async compare_packages call. Owner-scoped — only the user who enqueued the job can poll it.

Arguments

Name Type Required Description
job_id UUID yes The job_id from compare_packages(mode: "async").

Returns

Field Type Description
status string queued, running, completed, or failed.
result object | null Present when status is completed — the same shape as a sync compare_packages result (above).
error_code / error_message / is_retryable string / string / bool Present when status is failed. is_retryable tells you whether to re-enqueue.

Poll on a gentle cadence (2–5s) until status is completed or failed. A 2–5 package compare usually resolves in 30–80s.
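
The polling loop described above might be sketched as follows. call_tool is a hypothetical callable that invokes an MCP tool and returns its parsed result; the three-second interval and three-minute timeout are illustrative defaults chosen to fit the documented 2–5s cadence and 30–80s typical duration.

```python
import time


def wait_for_compare(call_tool, job_id, interval=3.0, timeout=180.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll get_compare_packages_status until the job completes or fails."""
    deadline = clock() + timeout
    while True:
        status = call_tool("get_compare_packages_status", {"job_id": job_id})
        if status["status"] == "completed":
            return status["result"]
        if status["status"] == "failed":
            raise RuntimeError(
                f'{status["error_code"]}: {status["error_message"]} '
                f'(retryable={status["is_retryable"]})'
            )
        if clock() >= deadline:
            raise TimeoutError(f"compare job {job_id} still pending after {timeout}s")
        sleep(interval)


# Stand-in client that walks queued -> running -> completed.
_states = iter([
    {"status": "queued"},
    {"status": "running"},
    {"status": "completed", "result": {"per_package": []}},
])
print(wait_for_compare(lambda name, args: next(_states), "job-1", sleep=lambda s: None))
```

On a failed job, is_retryable in the raised message tells the caller whether re-enqueueing with compare_packages is worthwhile.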

estimate_change_request_cost

Added 2026-05-12.

Forecasts what a single request_change addendum will cost (USD). Takes no arguments. Returns {has_forecast, estimated_total_usd, p25_usd, p75_usd, sample_size, note} — the rolling 30-day median across completed addenda with p25 / p75 confidence bounds, or a "not enough data" envelope when the sample is below the forecaster's floor.

No profile dimension — every addendum uses the same prompt and model today, so the forecast is a single global median. The p25 / p75 bounds capture per-addendum variance (driven mostly by description length and change complexity). Symmetric with estimate_generation_cost; useful before calling request_change when cost matters.

update_package

The all-in-one mutation tool for packages. Folds three operations into one call (the MCP transport doesn't have a natural HTTP-verb equivalent of DELETE or PATCH, so the operation is encoded as a flag).

Takes package_id plus exactly one of:

  • retention_until: <date-time | null> — set or clear the package's retention deadline. Pass an ISO-8601 timestamp to extend retention; pass null to make retention indefinite.
  • delete: true — soft-delete the package. Idempotent. The package row drops out of list_packages but stays in the database for audit + recovery.
  • restore: true — restore a soft-deleted package. Idempotent on already-live rows. Sister operation to delete: true.

Returns {package_id, action: "deleted" | "restored" | "retention_updated"}. Passing both delete: true and restore: true returns an error.
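
Because retention_until: null is itself a meaningful value, a client-side builder needs a sentinel to tell "not supplied" apart from "clear the deadline". A minimal sketch under that assumption; the helper name build_update_package_args is hypothetical.

```python
_UNSET = object()  # distinguishes "argument omitted" from an explicit None


def build_update_package_args(package_id, *, retention_until=_UNSET, delete=False, restore=False):
    """Build update_package arguments, enforcing exactly one operation."""
    chosen = [retention_until is not _UNSET, bool(delete), bool(restore)]
    if sum(chosen) != 1:
        raise ValueError("pass exactly one of retention_until, delete, restore")
    arguments = {"package_id": package_id}
    if retention_until is not _UNSET:
        # None serializes to JSON null, which makes retention indefinite.
        arguments["retention_until"] = retention_until
    elif delete:
        arguments["delete"] = True
    else:
        arguments["restore"] = True
    return arguments


print(build_update_package_args("pkg-1", retention_until=None))
print(build_update_package_args("pkg-1", delete=True))
```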

Why bundled instead of separate delete_package / restore_package tools? Package was the first entity to expose multi-operation mutations through MCP, and bundling them into one tool kept the manifest small. Newer entities (Interview, Generation) use dedicated delete_* / restore_* tools; both styles work.

Support tools

submit_bug_report

Submits a bug report tied to the calling user. Takes title, description, optional severity, optional related_generation_id, optional current_route, optional caller-supplied user_agent. Persisted to the caller's report history. Server-side enrichment stamps account name/email/plan, build version, and the AI-tool heuristic.

list_my_bug_reports

Lists the caller's bug reports, newest first.

Arguments

Name Type Required Description
limit int no Default 20, max 200.

Returns {bug_reports: [...]}, where each entry has:

Field Type Description
id UUID Bug report id.
title string Short title supplied at submit time.
severity string Caller-supplied severity (or default if omitted).
status string Triage state (e.g. Open, Triaged, Closed).
created_at ISO-8601 When the report was submitted.
updated_at ISO-8601 Most recent state change.
related_generation_id UUID | null Generation the report was filed against (if any).
triage_notes string | null Server-side notes added during review.

get_bug_report

Returns a bug report by id. Takes bug_report_id. Visible to the submitter; foreign callers get "not found".

Agent-driven feedback acquisition

Added 2026-05-17.

SpecStep tools may include a feedback_recommendation field in their response when the server detects a known quality issue (Otto produced malformed JSON, review loop stalled, etc.). MCP clients (Claude Code, Codex, Cursor, Windsurf) are expected to act on the field by calling submit_feedback with the supplied prefill and the recommendation_token. SpecStep deduplicates on the token: calling submit_feedback twice with the same token bumps an occurrence counter on the existing row instead of creating a new one.

Response shape when the field is present (today only on get_generation for Failed generations with a quality-signal category — LlmContract, ReviewBudgetExhausted, RedraftNoProgress, ReviewLoopStalled):

{
  // ... tool's normal output
  "feedback_recommendation": {
    "action": "submit_feedback",
    "reason": "<one-line rationale>",
    "dedup_token": "<sha256 hex>",
    "prefill": {
      "type": "InterviewQuality | PackageQuality | EndToEndRun | ...",
      "title": "<concise summary>",
      "full_report": "<server-composed markdown body>",
      "severity": "Info | Low | Medium | High | Critical",
      "generation_id": "<uuid or null>",
      "interview_id": "<uuid or null>"
    }
  }
}

To file, normalize the enum casing (PascalCase → snake_case for type; lowercase for severity), then call submit_feedback with the remaining prefill fields unchanged plus recommendation_token.

The field is omitted when the user has disabled this behavior in Settings → Notifications → Agent integrations (default on for new users). Absence-of-field means "do nothing" — never prompt the user to file feedback manually based on this signal.
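
The filing steps above can be sketched as a small transformation from a tool result to submit_feedback arguments. One loud assumption: the sample response names the token dedup_token while submit_feedback takes recommendation_token, and this sketch treats them as the same value. The helper names are hypothetical.

```python
import re


def to_snake_case(name):
    """Convert PascalCase to snake_case, e.g. EndToEndRun -> end_to_end_run."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def build_submit_feedback_args(tool_result):
    """Return submit_feedback arguments, or None when no recommendation is present."""
    recommendation = tool_result.get("feedback_recommendation")
    if recommendation is None or recommendation.get("action") != "submit_feedback":
        return None  # absence of the field means "do nothing"
    arguments = dict(recommendation["prefill"])
    arguments["type"] = to_snake_case(arguments["type"])
    arguments["severity"] = arguments["severity"].lower()
    # Assumption: the response's dedup_token is what submit_feedback expects
    # as recommendation_token.
    arguments["recommendation_token"] = recommendation["dedup_token"]
    return arguments


tool_result = {
    "state": "Failed",
    "feedback_recommendation": {
        "action": "submit_feedback",
        "reason": "review loop stalled",
        "dedup_token": "ab12",
        "prefill": {
            "type": "EndToEndRun",
            "title": "Review loop stalled",
            "full_report": "...",
            "severity": "High",
            "generation_id": None,
            "interview_id": None,
        },
    },
}
print(build_submit_feedback_args(tool_result))
```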

submit_feedback

Added 2026-05-16. Distinct from submit_bug_report — feedback evaluates quality (was the interview good, is the package coherent, what's the build confidence). Bug reports are for broken behavior.

Submits structured quality feedback. Required: type (interview_quality, package_quality, end_to_end_run, tooling_experience, api_doc_quality, website_quality, launch_readiness, other), title, full_report (markdown). Optional: target GUIDs (interview_id, intake_artifact_id, generation_id, package_id) — required for run-bound types (interview_quality, package_quality, end_to_end_run). Scalar scores: interview_quality_score, package_quality_score, build_confidence_percent (0-100), letter_grade (A-F). Optional template_id + rubric_version link to a template from list_feedback_templates; pass rubric_section_responses (section-id → free-text) + rubric_scores (section-id → 0-100) to fill the rubric.

Additional optional submitter context (added 2026-05-16): estimated_output_quality (≤50 char qualitative label, distinct from the numeric build_confidence_percent), project_type and review_profile (≤50 chars each — denormalize the run's project type and review profile at submission time), transcript_evidence and package_evidence (arrays of quoted snippets, each ≤2000 chars, supporting the findings).

Each entry in structured_findings accepts three richer fields (each ≤2000 chars): evidence (quoted text from the transcript or package supporting the finding), expected_behavior (what the caller expected to happen), suggested_fix (caller's proposed remediation). Mirrors the specialist-reviewer finding shape so feedback findings + reviewer findings can be aggregated.

Typed evidence (added 2026-05-21): each finding also accepts an optional typed_evidence array (up to 20 items) for machine-readable signal you'd otherwise flatten into prose. Each item is { "kind": <string>, "payload_json": <string ≤4000 chars> }. The kind is one of free, http_response, route, console_error, mcp_tool_call, transcript_turn, screenshot, json_payload, and payload_json must be a well-formed JSON document. Required keys depend on the kind: http_response needs a numeric status; route needs a string url; console_error needs a string message; mcp_tool_call needs a string tool; transcript_turn needs a numeric turnIndex; screenshot needs a string path; free and json_payload accept any well-formed JSON. The prose evidence string and typed_evidence can coexist on the same finding. Read responses echo typed_evidence back in the same shape.
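
A client can pre-check typed_evidence items against the rules above before spending a call. This is a local sketch of those documented rules only; the function name typed_evidence_errors and the error wording are hypothetical, and validate_feedback remains the authoritative check.

```python
import json
from numbers import Real

# kind -> (required key, expected type); None means any well-formed JSON is fine.
REQUIRED_KEY = {
    "http_response": ("status", Real),
    "route": ("url", str),
    "console_error": ("message", str),
    "mcp_tool_call": ("tool", str),
    "transcript_turn": ("turnIndex", Real),
    "screenshot": ("path", str),
    "free": None,
    "json_payload": None,
}


def typed_evidence_errors(items):
    """Return a list of human-readable problems; an empty list means the items look valid."""
    errors = []
    if len(items) > 20:
        errors.append("more than 20 typed_evidence items")
    for index, item in enumerate(items):
        kind = item.get("kind")
        payload_json = item.get("payload_json", "")
        if kind not in REQUIRED_KEY:
            errors.append(f"item {index}: unknown kind {kind!r}")
            continue
        if len(payload_json) > 4000:
            errors.append(f"item {index}: payload_json exceeds 4000 chars")
        try:
            payload = json.loads(payload_json)
        except json.JSONDecodeError:
            errors.append(f"item {index}: payload_json is not well-formed JSON")
            continue
        rule = REQUIRED_KEY[kind]
        if rule is None:
            continue
        key, expected = rule
        value = payload.get(key) if isinstance(payload, dict) else None
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"item {index}: {kind} needs a {expected.__name__} {key!r}")
    return errors


items = [
    {"kind": "http_response", "payload_json": json.dumps({"status": 502})},
    {"kind": "route", "payload_json": json.dumps({"url": 42})},
]
print(typed_evidence_errors(items))
```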

Recurrence threading (added 2026-05-17): pass at most one of recurrence_of_feedback_id or recurrence_of_bug_report_id when filing a row because an earlier feedback or bug report was resolved but the issue came back. Both ids cannot be set on the same submission — the system rejects the call.

Agent-driven dedup (added 2026-05-17): pass recommendation_token when filing in response to a server-emitted feedback_recommendation field (see "Agent-driven feedback acquisition" above). The token (an sha256 hex string) is used to dedup against a 30-day window of open auto-filed rows — a dedup hit bumps an occurrence counter on the existing row instead of creating a new one.

Returns id, type, status, created_at. To avoid spending a submit_feedback call on a validation error, dry-run the shape first with validate_feedback.

validate_feedback

Added 2026-05-19. Pre-flight for submit_feedback.

Validates a feedback submission shape without persisting anything. Takes the same input as submit_feedback (the recommendation_token is the only field it drops — dedup is a write-time concern), and the same validation rules apply: template, cap, and section-id violations all fail here exactly as they would at submit time. Returns { valid, errors[] }, where each error is { code, message, param_name } carrying the canonical FEEDBACK_* code (FEEDBACK_TITLE_REQUIRED, FEEDBACK_FULL_REPORT_REQUIRED, FEEDBACK_INVALID, FEEDBACK_TEMPLATE_VERSION_REQUIRED, FEEDBACK_TEMPLATE_UNKNOWN, FEEDBACK_TEMPLATE_TYPE_MISMATCH, FEEDBACK_TEMPLATE_SECTION_UNKNOWN, FEEDBACK_TEMPLATE_SCORE_UNKNOWN — see errors).

Run this first when you're uncertain about template section ids or free-text caps — it catches the error without consuming a submit_feedback call.

amend_feedback

Added 2026-05-21.

Submitter self-correction. While your feedback row is still Open and within the amend window (10 minutes after submission), you can fix free-form content in place. Takes feedback_id (required) plus any of title, summary, full_report, transcript_evidence, package_evidence, tags. Omitted fields are left unchanged. Not amendable: type, severity, target ids, template_id / rubric_version, and structured_findings. Returns the updated id / title / status / updated_at. Errors (surfaced as the tool error message): the row isn't yours, it has already left Open (FEEDBACK_AMEND_NOT_OPEN), or the window has expired (FEEDBACK_AMEND_WINDOW_EXPIRED). Use this to catch a typo right after submit_feedback while the window is still open.

list_my_feedback

Lists the caller's feedback rows newest-first. Takes optional limit (1-200, default 20). Returns id / type / title / severity / status / linked GUIDs / template id + version / triage notes plus checked_at and reviewed_at so a submitter can tell whether the row has been looked at or reviewed yet.

get_feedback

Returns a feedback row by id. Takes feedback_id. Visible to the submitter; foreign callers get "not found".

The output includes the full record: every field set at submit time (including the 2026-05-16 additions — estimated_output_quality, project_type, review_profile, transcript_evidence, package_evidence, plus the richer per-finding evidence / expected_behavior / suggested_fix) and the server-managed lifecycle stamps (checked_at, reviewed_at).

list_feedback_templates

Lists the available code-defined feedback templates (rubrics) so a client can pick one before submitting. Returns id / version / title / description / section_count.

Seven templates ship in v1, each pairing with a FeedbackType:

Template id | Pairs with type | Scope
end-to-end-specstep-quality v1.0.0 | end_to_end_run | One full SpecStep run (interview through generated package). 13 sections covering interview quality, package coherence, build confidence, letter grade, top blockers, recommended fixes.
interview-quality v1.0.0 | interview_quality | Otto's performance during a single Interview. 7 sections covering pacing, follow-up quality, coverage breadth, rapport, gaps, recommended follow-ups.
package-buildability v1.0.0 | package_quality | Whether a generated package is buildable as-is by an AI coder. 8 sections covering coherence, completeness, AI-coder clarity, edge-case coverage, data-shape ambiguities, effort-estimate accuracy, top risks.
api-doc-quality v1.0.0 | api_doc_quality | The public /api-docs/* surface. 8 sections covering endpoint coverage, completeness, example clarity, error-handling docs, schema clarity, missing sections, recommended improvements.
tooling-experience v1.0.0 | tooling_experience | The SpecStep tooling surfaces. 9 sections covering MCP ergonomics, CLI / IDE integration, error-message clarity, performance, friction points, recommended improvements.
website-quality v1.0.0 | website_quality | The public marketing/docs site at specstep.com. 11 sections covering visual polish, copy quality, SEO + sitemap correctness, route correctness, mobile experience, console cleanliness, content sanitization.
launch-readiness v1.0.0 | launch_readiness | Cross-cutting pre-launch review. 12 sections covering Priority-0 blockers, public content sanitization, trust posture, API + MCP stability, mobile readiness, accessibility, performance, observability, and a final go / no-go recommendation.

get_feedback_template

Returns one template's full content (all sections + prompts + optional score scales). Takes template_id + version.

Webhook subscription tools

Added 2026-05-12.

The five tools below mirror the REST webhook-management surface (/v1/api-keys/{apiKeyId}/webhooks). They let a cookie-authenticated agent register, rotate, smoke-test, and revoke webhook subscriptions on its own API keys. The mutating tools (create_webhook, rotate_webhook_secret, test_webhook) refuse API-key principals by design — a compromised key must not be able to redirect, silently re-sign, or spam-fire event payloads. list_my_webhooks and delete_webhook are safe from any context (read-only and revocation, respectively). Programmatic callers that have explicitly accepted the redirect risk can use the REST endpoints directly — see REST Step 7.5 for the bearer-callable surface.

list_my_webhooks

Added 2026-05-12.

Lists every webhook subscription attached to a caller-owned API key. Takes api_key_id. Returns {api_key_id, webhooks: [{id, url, events, created_at, updated_at, last_delivery_at, last_delivery_status, last_delivery_http_status, needs_rotation}, ...]}. The signing secret is never returned by list — the plaintext is shown only once, at create or rotate time. needs_rotation flags subscriptions whose secret was issued under a deprecated scheme and should be rotated. Foreign and unknown API-key ids surface as "not found." Mirrors REST GET /v1/api-keys/{apiKeyId}/webhooks.

create_webhook

Added 2026-05-12.

Registers a new webhook subscription against a caller-owned API key.

The signing_secret is returned once in this response — store it before the response is discarded; list_my_webhooks will not return it. If lost, rotate via rotate_webhook_secret. The URL must point to an externally routable host: loopback, link-local, and internal addresses are rejected to prevent SpecStep from being used as a proxy to probe networks on the receiver's side. Unknown event types are rejected with the offending names listed.

Refuses API-key principals — a compromised key must not be able to redirect future event payloads to an attacker-controlled URL. Cookie-authenticated humans register webhooks for their own keys via this tool; programmatic callers can use the REST endpoint with explicit risk acceptance. Mirrors REST POST /v1/api-keys/{apiKeyId}/webhooks.

Arguments

Name | Type | Required | Description
api_key_id | UUID | yes | The caller-owned API key to attach the subscription to.
url | string | yes | Absolute https:// URL. Loopback / link-local / internal addresses are rejected.
events | string[] | yes | At least one event type, e.g. generation.completed, generation.failed.
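A client can reject obviously unroutable URLs before calling create_webhook. The sketch below is a hypothetical pre-check that approximates the rule with Python's ipaddress module; it is not the server's validator. Hostnames that are not IP literals pass the check, because only the server can judge what a DNS name resolves to.

```python
import ipaddress
from urllib.parse import urlsplit


def looks_externally_routable(url: str) -> bool:
    """Best-effort local check of a webhook URL; the server has the final say."""
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        return False
    host = parts.hostname
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: a DNS name, which cannot be judged locally.
        return True
    # Mirror the documented rejections: loopback, link-local, internal.
    return not (ip.is_loopback or ip.is_link_local or ip.is_private)
```

For example, https://127.0.0.1/hook and https://10.0.0.5/hook fail the check, while an https:// URL on a public hostname passes.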

Returns

Field | Type | Description
id | UUID | The new subscription's id.
api_key_id | UUID | Echoed.
url / events | string / string[] | Echoed.
created_at | ISO-8601 | Creation timestamp.
signing_secret | string | Returned once. Use to validate HMAC-SHA256 signatures on delivered payloads.
signing_secret_note | string | Reminder: this is the only time the plaintext is returned.

rotate_webhook_secret

Added 2026-05-12.

Issues a fresh signing secret for an existing webhook subscription. Takes api_key_id and webhook_id. Returns {id, api_key_id, updated_at, signing_secret, signing_secret_note}. The new plaintext is returned once — update every consumer that validates payloads against this subscription's signature before discarding the response.

The old secret is invalidated immediately on the dispatcher side. In-flight deliveries already signed with the old secret may still arrive at the receiver for a brief window — if you can, bracket rotations with a tolerance window on the receiver (accept either signature for a short period after rotation).
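One way to implement the receiver-side tolerance window is to keep the previous secret for a short grace period after rotation. The sketch below is hypothetical: RotatingVerifier is not a SpecStep class, the 300-second default grace period is arbitrary, and it assumes the signature is the hex-encoded HMAC-SHA256 of the raw request body, an encoding this page does not specify.

```python
import hashlib
import hmac
import time


class RotatingVerifier:
    """Accept the current secret, plus the previous one for a grace period."""

    def __init__(self, secret: str, grace_seconds: float = 300.0):
        self.current = secret
        self.previous = None
        self.previous_expires = 0.0
        self.grace_seconds = grace_seconds

    def rotate(self, new_secret: str, now: float = None) -> None:
        """Call with the signing_secret returned by rotate_webhook_secret."""
        now = time.time() if now is None else now
        self.previous = self.current
        self.previous_expires = now + self.grace_seconds
        self.current = new_secret

    def verify(self, body: bytes, signature_hex: str, now: float = None) -> bool:
        now = time.time() if now is None else now
        candidates = [self.current]
        if self.previous is not None and now < self.previous_expires:
            candidates.append(self.previous)
        for secret in candidates:
            # Assumed scheme: hex digest of HMAC-SHA256 over the raw body.
            expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
            if hmac.compare_digest(expected.encode(), signature_hex.encode()):
                return True
        return False
```

Once the grace period has elapsed, deliveries signed with the old secret are rejected again, so the tolerance window stays bounded.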

Refuses API-key principals — a compromised key rotating the signing secret could silently lock the legitimate owner out of validating subsequent payloads. Cookie-authenticated humans rotate via this tool; programmatic callers go through REST with explicit risk acceptance. Foreign and unknown ids surface as "not found." Mirrors REST POST /v1/api-keys/{apiKeyId}/webhooks/{webhookId}/rotate-secret.

test_webhook

Added 2026-05-12.

Fires a synthetic webhook.test event against a registered subscription and returns the live delivery outcome. Takes api_key_id and webhook_id. Returns {success, http_status, failure_reason, latency_ms, delivery_id} — lets the owner verify reachability and signature validation without waiting for a real generation event. Useful right after create_webhook or rotate_webhook_secret to confirm the receiver is healthy.

Refuses API-key principals — the dispatcher already enforces externally-routable and DNS-rebinding guards, but a compromised key shouldn't be able to spam owner-initiated POSTs at attacker-controlled URLs. Cookie-authenticated humans test from the management UI or via this tool; programmatic callers go through REST with explicit risk acceptance. Mirrors REST POST /v1/api-keys/{apiKeyId}/webhooks/{webhookId}/test.

delete_webhook

Added 2026-05-12.

Removes a webhook subscription from a caller-owned API key. Takes api_key_id and webhook_id. Returns {api_key_id, webhook_id, deleted: true}. Idempotent — unknown, foreign, and already-removed webhooks surface as "not found" (the subscription is gone either way).

Allowed for both cookie and API-key callers — revocation is always safe. The worst case is an API key disabling its own webhook, which is the legitimate use case for self-managed scriptable infrastructure. Contrast create_webhook, rotate_webhook_secret, and test_webhook, which refuse API-key callers because those operations could redirect or silence event delivery. Mirrors REST DELETE /v1/api-keys/{apiKeyId}/webhooks/{webhookId}.

Webhooks instead of polling

For long-running automations or external systems where polling is awkward, register a webhook subscription on your API key and let SpecStep POST state changes to you. Subscriptions can be managed with the webhook subscription tools above (cookie-authenticated callers) or through the REST API (see the REST Step 7.5 walkthrough). The webhook body carries the same JSON projection that get_generation / wait_for_generation return, with an HMAC-SHA256 signature (X-SpecStep-Webhook-Signature) and a delivery id (X-SpecStep-Webhook-Delivery) for dedup. In v1, delivery is best-effort with bounded retry, so treat the result of wait_for_generation as the canonical state.
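Because delivery is retried, a receiver may see the same delivery more than once. A minimal sketch of deduplication keyed on the delivery id header follows. DeliveryDeduper and handle_delivery are hypothetical names, the bounded in-memory store is an assumption (a real receiver would likely persist seen ids), and the code assumes the signature has already been verified and that header names arrive in the casing shown above.

```python
from collections import OrderedDict


class DeliveryDeduper:
    """Remember recently seen delivery ids, evicting the oldest when full."""

    def __init__(self, max_ids: int = 10_000):
        self._seen = OrderedDict()
        self._max_ids = max_ids

    def first_time(self, delivery_id: str) -> bool:
        if delivery_id in self._seen:
            return False
        self._seen[delivery_id] = True
        if len(self._seen) > self._max_ids:
            self._seen.popitem(last=False)  # drop the oldest id
        return True


def handle_delivery(headers: dict, payload: dict, deduper, apply) -> str:
    """Apply a webhook payload once per X-SpecStep-Webhook-Delivery id."""
    delivery_id = headers["X-SpecStep-Webhook-Delivery"]
    if not deduper.first_time(delivery_id):
        return "duplicate"  # a retry of something already processed
    apply(payload)
    return "applied"
```

Returning a success response for duplicates as well is usually sensible, so that the sender stops retrying a delivery the receiver has already processed.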