MCP guide · API docs

If your client speaks MCP natively (Claude Code, Claude Desktop, IDE extensions), skip to Connecting an MCP client — the client handles the protocol for you. The Manual JSON-RPC walkthrough below is for anyone implementing an MCP client by hand or adapting a custom agent runtime.

SpecStep's MCP (Model Context Protocol) server exposes the same generation engine as the REST API, but shaped as discrete tools your AI coding agent can call directly. If your agent is already MCP-capable — Claude Code, Claude Desktop, or a compatible IDE — you can point it at the SpecStep MCP server and let it request documentation without hand-crafting HTTP.

What MCP is

MCP is a protocol for connecting AI agents to external tools and data sources. It uses JSON-RPC 2.0 messages, which SpecStep carries over HTTP. The agent calls initialize to negotiate the protocol version and capabilities, calls tools/list to discover what tools are available, then invokes tools by name with structured arguments. The server returns structured results the agent can read and reason over.

SpecStep implements MCP over a single HTTP endpoint. There is no WebSocket or streaming transport — each tool call is a POST with a JSON-RPC envelope, and the response is returned in the same HTTP response.

Authentication

SpecStep supports two ways to authenticate MCP calls. Browser-based sign-in is the recommended default — your MCP client opens a browser, you sign in once, and the client receives a token without any key management on your part.

The MCP server advertises OAuth 2.1 with PKCE per the MCP spec. Compatible clients — Claude Desktop, Claude.ai, Cursor, Codex, GitHub Copilot, Continue, Cline, and any client that implements the MCP authorization extension — trigger the flow automatically:

The first unauthenticated call to /mcp returns 401 Unauthorized with a WWW-Authenticate header pointing at the protected-resource metadata document.
The client fetches the discovery document at /.well-known/oauth-protected-resource (and /.well-known/oauth-authorization-server) to learn the authorize and token endpoints.
The client opens https://specstep.com/oauth/authorize?… in your browser.
You sign in to SpecStep (via the existing Entra account) and click Allow on the consent screen.
The browser 302s to a loopback URL the MCP client is listening on, carrying a one-time authorization code.
The client exchanges the code at /oauth/token (PKCE-verified) and receives a Bearer oat_… access token valid for 90 days.

You can review and revoke browser-based sign-ins from Settings → API keys → Connected MCP clients.
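If you are writing the client side of the sign-in flow above yourself, the PKCE pair is the one piece you compute locally. The sketch below assumes the S256 challenge method from RFC 7636; the function names are hypothetical and this is not SpecStep's client code.

```python
import base64
import hashlib
import secrets


def make_code_verifier(num_bytes: int = 32) -> str:
    """Return a random URL-safe code verifier (43 characters for 32 bytes)."""
    raw = secrets.token_bytes(num_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_code_challenge(verifier: str) -> str:
    """Derive the S256 challenge: base64url(sha256(verifier)) without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


verifier = make_code_verifier()
challenge = make_code_challenge(verifier)
# The client sends `challenge` with code_challenge_method=S256 on the authorize
# request, keeps `verifier` private, and sends it to the token endpoint later.
```

Keep the verifier in memory only for the duration of one sign-in; a fresh pair per authorization attempt is the usual practice.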

Dynamic Client Registration (RFC 7591)

Added 2026-05-15.

The discovery document at /.well-known/oauth-authorization-server advertises a registration_endpoint of https://specstep.com/oauth/register. Any MCP client that speaks RFC 7591 — Codex, Claude Desktop, Cursor, Continue, Cline, and any other client following the MCP authorization extension — registers itself on first connect without any pre-shared client_id:

The client POSTs its metadata to /oauth/register:

{
  "client_name": "Codex",
  "redirect_uris": ["http://127.0.0.1:54321/callback"]
}

The server validates each redirect_uri against the RFC 8252 loopback allowlist (http://127.0.0.1:<port>/… or http://localhost:<port>/…), mints a fresh client_id of the shape mcp_<32-hex>, and returns the RFC 7591 §3.2.1 envelope:

{
  "client_id": "mcp_e0f4261b3ad3b5e8dd3ae4c5327a6fec",
  "client_name": "Codex",
  "redirect_uris": ["http://127.0.0.1:54321/callback"],
  "grant_types": ["authorization_code"],
  "response_types": ["code"],
  "token_endpoint_auth_method": "none",
  "client_id_issued_at": 1715800000
}

The client uses that client_id for the subsequent /oauth/authorize + /oauth/token handshake described above.

Registration is anonymous (no API key, no cookie) and rate-limited to 30 registrations per IP per hour. The legacy hardcoded client_id specstep-mcp-generic is still accepted for pre-RFC-7591 clients; new integrations should register their own.

Only the loopback redirect-URI shape is allowed. Public HTTPS redirects, non-HTTP schemes, host-substring tricks, and userinfo-form URIs are rejected with error: "invalid_redirect_uri". Only grant_type=authorization_code, response_type=code, and token_endpoint_auth_method=none (public clients with PKCE) are accepted in the registration request; anything else returns error: "invalid_client_metadata".
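A client can pre-check its redirect URI before registering, which avoids burning a rate-limited registration attempt on a URI the server will reject. The check below is a client-side approximation of the rules described above, not the server's actual validator; the function name is hypothetical.

```python
from urllib.parse import urlsplit

LOOPBACK_HOSTS = {"127.0.0.1", "localhost"}


def looks_like_loopback_redirect(uri: str) -> bool:
    """Approximate the loopback redirect-URI shape: http, loopback host, explicit port."""
    try:
        parts = urlsplit(uri)
        port = parts.port  # raises ValueError on a malformed port
    except ValueError:
        return False
    if parts.scheme != "http":
        return False  # rules out public HTTPS redirects and non-HTTP schemes
    if parts.username is not None or parts.password is not None:
        return False  # rules out userinfo-form URIs
    if parts.hostname not in LOOPBACK_HOSTS:
        return False  # exact host match, so host-substring tricks fail
    return port is not None


print(looks_like_loopback_redirect("http://127.0.0.1:54321/callback"))
```

Passing this check does not guarantee acceptance; the server's response is still the authority.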

API key (for CI / automation)

For headless or server-to-server flows where no browser is available, the existing API-key scheme works:

POST https://specstep.com/mcp
Content-Type: application/json
Authorization: Bearer sf_xxxxxxxxxxxx

Create one at Settings → API keys. The same rate limits apply to both auth schemes — API-key callers have an independent per-key counter; OAuth callers share a single per-user counter across all connected clients. See rate limits for the full scoping rules.

A key's scopes govern which tools it can reach. Most tools below work with any authenticated key, but the session-state and project tools — build sessions, the decision log, the backlog, and project management — are opt-in: a key sees them in tools/list only when it carries the matching scopes (session_state.read, session_state.write, projects.read, projects.write), and a project-scoped key is confined to its one project. See Session state and project tools for the scope reference and how to mint a project-scoped key.

Transport

All MCP traffic goes to:

POST https://specstep.com/mcp
Content-Type: application/json
Authorization: Bearer <oat_…  or  sf_…>

The body is a JSON-RPC 2.0 object. The server returns JSON-RPC results or errors.

A minimal tool call looks like:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "get_generation",
    "arguments": { "generation_id": "gen_01hx..." }
  }
}

Most MCP clients handle the JSON-RPC envelope for you. You configure the server URL; the client either negotiates OAuth automatically or, if you've supplied an API key, attaches it as a bearer token.
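For a hand-rolled client, the request above reduces to one POST with two headers. The sketch below builds that request with the Python standard library without sending it; build_tool_call is a hypothetical helper name, and the token is the placeholder from this page.

```python
import json
import urllib.request

MCP_URL = "https://specstep.com/mcp"


def build_tool_call(token: str, name: str, arguments: dict, request_id: int = 1) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC 2.0 tools/call POST."""
    envelope = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(envelope).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_tool_call("sf_xxxxxxxxxxxx", "get_generation", {"generation_id": "gen_01hx..."})
# Sending is one more step: urllib.request.urlopen(req) returns the HTTP
# response whose body is the JSON-RPC result or error.
```

Keeping construction separate from sending makes the envelope easy to unit-test and lets you swap in any HTTP library.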

Manual JSON-RPC walkthrough

This section shows the exact wire shape for clients written by hand — no MCP library. Every example below is a single POST https://specstep.com/mcp with Authorization: Bearer sf_… (or oat_… from the OAuth flow) and Content-Type: application/json. The server returns the JSON-RPC response in the same HTTP response.

1. `initialize`

The handshake. The client announces its protocol version + capabilities; the server replies with its identity and what it supports.

Request:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "protocolVersion": "2025-03-26",
    "capabilities": {},
    "clientInfo": { "name": "my-agent", "version": "0.1.0" }
  }
}

Response:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "protocolVersion": "2025-03-26",
    "capabilities": {
      "tools": { "listChanged": false }
    },
    "serverInfo": {
      "name": "specstep",
      "version": "0.1.0"
    }
  }
}

protocolVersion is the MCP spec version SpecStep speaks; pin your client to it or treat anything matching 2025-* as compatible. capabilities.tools.listChanged: false means the server does not push tool-list updates — refetch tools/list explicitly if you suspect the manifest changed.
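A client might apply the version rule and read the capability flag like this. It is a minimal sketch that takes the looser of the two options above (accept anything matching 2025-*); check_initialize_result is a hypothetical name.

```python
def check_initialize_result(response: dict) -> dict:
    """Validate an initialize response and return the server's capabilities."""
    if "error" in response:
        err = response["error"]
        raise RuntimeError(f"initialize failed: {err.get('code')} {err.get('message')}")
    result = response["result"]
    version = result.get("protocolVersion", "")
    if not version.startswith("2025-"):
        raise RuntimeError(f"unsupported MCP protocol version: {version!r}")
    return result.get("capabilities", {})


response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "protocolVersion": "2025-03-26",
        "capabilities": {"tools": {"listChanged": False}},
        "serverInfo": {"name": "specstep", "version": "0.1.0"},
    },
}
capabilities = check_initialize_result(response)
# listChanged is false, so the client has to refetch tools/list itself.
must_refetch_tools = not capabilities.get("tools", {}).get("listChanged", False)
```

Pinning to the exact version string instead is a one-line change if you prefer the stricter option.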

2. `notifications/initialized`

Per the MCP spec, the client follows up with a one-way notification (no id field, no expected response). SpecStep treats initialize as the only required handshake and tolerates clients that skip the notification, but well-behaved clients send it:

{ "jsonrpc": "2.0", "method": "notifications/initialized" }

3. `tools/list`

Discover the tool catalog.

Request:

{ "jsonrpc": "2.0", "id": 2, "method": "tools/list", "params": {} }

Response (truncated — see Available tools below for the complete list):

{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "tools": [
      {
        "name": "start_interview",
        "description": "Starts a new interview. Returns the interview id and initial agent turn.",
        "inputSchema": {
          "type": "object",
          "properties": {},
          "additionalProperties": false
        }
      },
      {
        "name": "submit_interview_turn",
        "description": "Submits a user turn to an interview. Returns the agent's reply and updated state.",
        "inputSchema": {
          "type": "object",
          "properties": {
            "interview_id": { "type": "string", "format": "uuid" },
            "message":      { "type": "string", "minLength": 1 }
          },
          "required": ["interview_id", "message"],
          "additionalProperties": false
        }
      }
    ]
  }
}

Each entry has name, description, and a JSON Schema inputSchema. The schema is what your agent should hand to its LLM as the tool signature — names and types are authoritative.
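Because the schema is authoritative, a client can reject obviously bad arguments before spending a round-trip. The checker below is a deliberately small sketch that covers only the keywords used in the sample above (required, additionalProperties, string type, minLength); a full JSON Schema validator is the better choice in production. check_arguments is a hypothetical name.

```python
def check_arguments(input_schema: dict, arguments: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments look acceptable."""
    problems = []
    properties = input_schema.get("properties", {})
    for name in input_schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    if input_schema.get("additionalProperties") is False:
        for name in arguments:
            if name not in properties:
                problems.append(f"unexpected argument: {name}")
    for name, value in arguments.items():
        spec = properties.get(name, {})
        if spec.get("type") == "string":
            if not isinstance(value, str):
                problems.append(f"{name} must be a string")
            elif len(value) < spec.get("minLength", 0):
                problems.append(f"{name} is shorter than minLength")
    return problems


# inputSchema of submit_interview_turn, copied from the tools/list response above.
schema = {
    "type": "object",
    "properties": {
        "interview_id": {"type": "string", "format": "uuid"},
        "message": {"type": "string", "minLength": 1},
    },
    "required": ["interview_id", "message"],
    "additionalProperties": False,
}
print(check_arguments(schema, {"message": "hi", "extra": 1}))
```

The sketch ignores format (such as uuid) on purpose; the server remains the final judge of validity.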

4. `tools/call`

Invoke a tool. Tool results are wrapped in MCP content blocks; v1 always emits a single text block carrying the tool's JSON payload as a string. Parse it on the client.

Request:

{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "tools/call",
  "params": {
    "name": "start_interview",
    "arguments": {}
  }
}

Response:

{
  "jsonrpc": "2.0",
  "id": 3,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "{\"id\":\"01952fcb-cd11-7c3e-9a2e-3b1d8f5e6a04\",\"status\":\"active\",\"transcript\":[{\"role\":\"agent\",\"content\":\"Tell me what you're building...\"}]}"
      }
    ]
  }
}

The result.content[0].text field is a JSON string — parse it again on your side to get the structured payload (interview id, status, transcript, etc.). MCP errors come back as standard JSON-RPC error envelopes (result absent, error: {code, message}); typed application errors (quota exceeded, ownership conflicts, paused-state guards) prefix the message with a stable error code like QUOTA_EXCEEDED: ... or RETRY_STATE_INVALID: ... so clients can branch on it.
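Both halves of that paragraph, the double parse and the error-code prefix, fit in one small helper. This is a sketch under stated assumptions: the regular expression guesses that stable codes are upper-case identifiers followed by a colon and a space, and ToolError, unwrap_tool_result, the numeric code -32000, and the sample message text are all illustrative.

```python
import json
import re

# Assumption: stable codes look like QUOTA_EXCEEDED, followed by ": ".
_ERROR_CODE = re.compile(r"^([A-Z][A-Z0-9_]*): ")


class ToolError(Exception):
    def __init__(self, rpc_code, app_code, message):
        super().__init__(message)
        self.rpc_code = rpc_code  # numeric JSON-RPC error code
        self.app_code = app_code  # e.g. "QUOTA_EXCEEDED", or None if no prefix


def unwrap_tool_result(response: dict) -> dict:
    """Return the parsed payload of a tools/call response, or raise ToolError."""
    if "error" in response:
        err = response["error"]
        message = err.get("message", "")
        match = _ERROR_CODE.match(message)
        raise ToolError(err.get("code"), match.group(1) if match else None, message)
    block = response["result"]["content"][0]
    if block.get("type") != "text":
        raise ValueError(f"unexpected content block type: {block.get('type')!r}")
    return json.loads(block["text"])  # the second parse: text is itself JSON


ok = {
    "jsonrpc": "2.0",
    "id": 3,
    "result": {"content": [{"type": "text", "text": json.dumps({"id": "01952fcb-cd11-7c3e-9a2e-3b1d8f5e6a04", "status": "active"})}]},
}
payload = unwrap_tool_result(ok)

# Illustrative error envelope; the numeric code and message text are made up.
failed = {"jsonrpc": "2.0", "id": 4, "error": {"code": -32000, "message": "QUOTA_EXCEEDED: monthly limit reached"}}
try:
    unwrap_tool_result(failed)
except ToolError as exc:
    error_code = exc.app_code
```

Callers can then branch on app_code and fall back to the raw message when it is None.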

Connecting an MCP client

The exact configuration depends on your client. There are two shapes — pick the one that matches whether your client supports OAuth.

OAuth-capable clients (recommended) — Claude Desktop, Claude.ai, Cursor, Codex, GitHub Copilot, Continue, Cline, and any client that implements the MCP authorization extension. Configure the server URL only; the client handles the browser sign-in flow on first connect:

{
  "mcpServers": {
    "specstep": {
      "url": "https://specstep.com/mcp"
    }
  }
}

On the first tool call, the client opens a browser to SpecStep, you sign in and click Allow, and the client receives a 90-day access token. You can revoke it from Settings → API keys → Connected MCP clients.

API-key clients — for clients without OAuth support, or headless / CI flows where no browser is available:

{
  "mcpServers": {
    "specstep": {
      "url": "https://specstep.com/mcp",
      "headers": {
        "Authorization": "Bearer ${SPECSTEP_API_KEY}"
      }
    }
  }
}

Replace ${SPECSTEP_API_KEY} with a key minted from Settings → API keys.

After connecting, call initialize and then tools/list (or the equivalent in your client) to retrieve the tool manifest. The server returns the list of available tools with their argument schemas.

Recommended next steps after connecting

Connecting the server is step one. If an agent is setting SpecStep up for you, it should offer these follow-ups — they're what turn a bare tool connection into a disciplined workflow:

Adopt the session start/end protocols. The session-state kit gives your AI coder a /start-session + /end-session protocol so it records how the project gets built — build sessions, a decision log, a backlog — instead of losing it in a chat log. Worth adopting on any project you'll build with an agent.
Claude Code users — install the plugin (protocols and cost-to-build in one install). claude plugin marketplace add No-Compromise-AI/specstep-plugins && claude plugin install specstep-session@specstep registers both skills and wires the SessionEnd token-usage reporter automatically — there is no separate "usage plugin," it ships in specstep-session. Set SPECSTEP_API_KEY (with the session_state.write scope) so the per-session cost-to-build lands. See Hooking it up.
Already have a project underway? Migrate it. If you're adding SpecStep to an existing codebase with its own handoff notes, decision log, or backlog, don't start from scratch — import them first. See Migrating an existing project.

End-to-end flow via MCP

The same steps as the REST walkthrough, expressed as tool calls. An MCP-capable agent can drive this entire sequence autonomously.

1. Start an interview. Call start_interview — it takes no arguments. It returns the interview_id and the opening agent turn; you describe what you're building in your first turn (step 2), and the interview's detected_type is inferred from it.

2. Submit turns. Call submit_interview_turn with the interview_id and your first message describing the project. Turns are async by default (changed 2026-05-19): the call commits your turn and returns a job_id — poll get_interview_turn_status until status is completed, then read the agent's reply + updated state from the returned snapshot. Continue turn by turn — answering the AI Team's questions about vision, users, requirements, constraints, and architecture — until the interview state is complete. A typical interview takes five to fifteen turns. (Pass mode: "sync" for the legacy inline-reply path — short turns only; it's subject to the ~60s gateway ceiling and is scheduled for removal.)

3. Start a generation. Call start_generation with the intake_id (the intake artifact identifier produced by completing the interview) and your chosen review profile. Store the returned generation_id. The generation is now Queued.

4. Poll for completion. Call wait_for_generation with the generation_id and respect the returned next_check_seconds hint between calls. The state will move from Queued to Drafting / Reviewing / FreshEyes as the generation runs, then to a terminal Complete / Failed / Cancelled. wait_for_generation is preferred over get_generation because it inlines the polling cadence + the pending-clarifications + the download URL, cutting most flows to a single tool call.

4a. Handle a paused clarification. If wait_for_generation returns state: "PausedAwaitingClarification", the response already includes pending_clarifications (get_pending_clarifications would return the same payload, so no extra round-trip is needed). Surface the question text to your user, gather their answers, then call answer_clarifications with {question, answer} pairs that match the question text verbatim. The generation resumes on the next dispatcher tick. Skip this step entirely when no clarification fires.

5. Retrieve the package. When wait_for_generation reports state: "Complete", the response carries a short-lived package_url you can download from directly. If you want richer metadata, call get_package with the package_id; for history, list_packages.

6. Deliver (optional). Package delivery — committing to a GitHub repository and opening a pull request — is handled via the REST API (POST /v1/packages/{id}/deliver). MCP tools do not cover delivery in this version.
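Steps 4 and 4a above amount to one loop. The sketch below takes the transport as an injected call_tool(name, arguments) function that returns the parsed tool payload, so it runs against any client. Several details are assumptions rather than documented facts: that each pending clarification carries its text under "question", that answer_clarifications takes "generation_id" and "answers", and the 5-second fallback wait. Check tools/list for the real argument schemas.

```python
import time
from typing import Callable

TERMINAL_STATES = {"Complete", "Failed", "Cancelled"}


def drive_generation(
    call_tool: Callable[[str, dict], dict],
    generation_id: str,
    ask_user: Callable[[str], str],
    sleep: Callable[[float], None] = time.sleep,
    default_wait: float = 5.0,
) -> dict:
    """Poll wait_for_generation until a terminal state, answering clarifications."""
    while True:
        status = call_tool("wait_for_generation", {"generation_id": generation_id})
        state = status["state"]
        if state in TERMINAL_STATES:
            return status
        if state == "PausedAwaitingClarification":
            # Assumption: each pending clarification exposes its text as "question".
            answers = [
                {"question": item["question"], "answer": ask_user(item["question"])}
                for item in status["pending_clarifications"]
            ]
            # Assumption: these argument names; the question text is echoed verbatim.
            call_tool("answer_clarifications", {"generation_id": generation_id, "answers": answers})
        # Respect the server's pacing hint between calls.
        sleep(status.get("next_check_seconds", default_wait))
```

On return, a status with state "Complete" carries the package_url from step 5; the other terminal states are left for the caller to handle.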

Session-state kit — disciplined build sessions

Beyond generating packages, SpecStep's MCP server doubles as a session-state backend: a place for your AI coding agent to record how a project gets built. Three aggregates — build sessions, a decision log, and a backlog — plus per-session token-usage rollups, all reachable with your own key and confined to your own data (see Session state and project tools for the scopes). The tools are inert without a protocol that tells an agent when to call them; the session-state kit supplies that protocol as two skills your agent runs at the start and end of each work session.

What the skills do

start-session — run it before any material work. It reads your project's context files, queries the server for the active build session plus the recent decision-log entries and backlog counts to recover where the last session left off, drift-checks that recorded state against your main branch, then starts (or resumes) one build session that everything the session ships links back to (start_build_session is idempotent on machine + branch + actor, so re-running it resumes rather than duplicates). If you're picking up a backlog item, it moves that item to in progress and confirms the resume position with you.
end-session — run it as the session closes. It appends a decision-log entry for each material decision (append_decision_log), files a backlog item for anything deferred (file_backlog_item), resolves or carries forward the item you picked up, and closes the build session (end_build_session) with a history entry plus the PRs and commits it shipped — written before the code lands, so the record stays honest.

The discipline is the point: no silent state changes, decisions logged as they're made, deferred work captured the moment it's deferred, drift surfaced rather than hidden, and one build session per work session so the history links up. Query it back any time with query_decisions, query_backlog, and get_build_session_cross_aggregate — the last returns a build session together with every decision and backlog item it touched, plus its token-usage rollup, in a single call.

Cost to build

When a session ends, a SessionEnd hook bundled with the kit sums the session's token usage — the main agent transcript plus any sub-agent transcripts — and reports it to that build session. Over a project's life, those per-session totals add up to a queryable cost to build, surfaced in the get_build_session_cross_aggregate rollup. Tokens only; no dollar amounts leave your machine. The reporter is idempotent — a re-fired report overwrites rather than double-counts — and fail-open: a reporting failure logs and exits without ever blocking your session from ending.

Hooking it up

Connect the MCP server. Point your agent at SpecStep's MCP server — the same setup as Connecting an MCP client.
Carry the scopes. Mint a key with session_state.write (add projects.write if the agent should create projects) — see Session state and project tools. If you don't have a project to write into yet, create one with create_project, or mark a default with set_default_project.
Set the API key. Put SPECSTEP_API_KEY in the agent's environment so the cost-to-build reporter can authenticate; it needs the session_state.write scope. Set SPECSTEP_API_BASE only if you point at a non-default host. Without SPECSTEP_API_KEY the skills still run (they authenticate over your configured MCP connection) — only the automatic usage reporting no-ops.
Add the skills. For Claude Code, install the kit as a plugin from SpecStep's public marketplace:
```
claude plugin marketplace add No-Compromise-AI/specstep-plugins
claude plugin install specstep-session@specstep
```
That registers the /start-session and /end-session skills and wires the SessionEnd usage reporter automatically — nothing else to configure. For agents without plugin support, wire the pieces by hand instead: the two skills as agent skills and the reporter as a SessionEnd hook. If the hook can't run in your setup, call record_build_session_usage yourself at the end of a session to record the totals.

The kit drives only the self-service session-state and project tools, so a key carrying the scopes above runs the whole protocol; nothing in it needs elevated access.

Backfilling past sessions

Have build sessions whose AI-coder runs finished before you wired the reporter? You can record their usage after the fact. POST /v1/build-sessions/{id}/usage (and the record_build_session_usage tool) accept writes to Active or closed sessions, and the upsert is idempotent on (build_session, claude_session) — so backfilling is always safe to re-run.

The reporter has a one-step mode for this. Point it at a past Claude Code transcript (~/.claude/projects/<encoded-cwd>/<session-id>.jsonl) and the build session it belongs to:

# preview the computed payload without sending
node run-reporter.js --backfill <transcript.jsonl> --build-session <id> --dry-run

# record it (needs SPECSTEP_API_KEY with the session_state.write scope)
node run-reporter.js --backfill <transcript.jsonl> --build-session <id>

run-reporter.js is the kit's cross-platform launcher — it resolves your platform's Python (py/python on Windows, python3 on macOS/Linux); you can also call session-end-usage-reporter.py directly with your own Python.

The claude_session_id is taken from the transcript filename — the same key the live hook uses — so a backfill writes the same row a live report would, and a later live run overwrites it instead of duplicating. Run it once per transcript to attribute several past AI-coder sessions to one build session. Backfill needs the transcript still on disk: the token counts are read from it and can't be reconstructed once it's gone. Prefer to do it by hand? Call record_build_session_usage (or POST the endpoint) with the token counts directly — same idempotency, same closed-sessions-allowed rule, same confinement to your own projects.
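The idempotency rule is easier to see in a toy model. The code below is an in-memory illustration of upsert-by-key semantics, not the server's implementation: UsageStore, the token field names, and the example path and ids are all hypothetical.

```python
from pathlib import Path


def claude_session_id_from(transcript_path: str) -> str:
    """Take the session id from the transcript filename (minus .jsonl)."""
    return Path(transcript_path).stem


class UsageStore:
    """In-memory stand-in for per-session usage rows."""

    def __init__(self):
        self.rows = {}

    def upsert(self, build_session_id: str, claude_session_id: str, tokens: dict) -> None:
        # Same (build_session, claude_session) key overwrites the existing row,
        # so a re-run backfill or a later live report never double-counts.
        self.rows[(build_session_id, claude_session_id)] = dict(tokens)

    def rollup(self, build_session_id: str) -> int:
        """Total tokens across every AI-coder session attributed to one build session."""
        return sum(
            sum(tokens.values())
            for (session, _), tokens in self.rows.items()
            if session == build_session_id
        )


store = UsageStore()
sid = claude_session_id_from("/home/me/.claude/projects/-home-me-app/3f2a.jsonl")
store.upsert("bs_1", sid, {"input_tokens": 1000, "output_tokens": 200})
store.upsert("bs_1", sid, {"input_tokens": 1000, "output_tokens": 200})  # re-run: no change
store.upsert("bs_1", "other-session", {"input_tokens": 50, "output_tokens": 5})
total = store.rollup("bs_1")
```

Two transcripts attributed to one build session add up; the same transcript reported twice does not.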

Session-state & project tools

Beyond the generation pipeline, SpecStep's MCP server exposes a self-service family of tools for recording how a project gets built — the surface the session-state kit drives. Unlike the zero-setup generation tools, these are gated on the session_state.* / projects.* scopes: mint a key carrying them (see Authentication) and every tool below appears in your tools/list, confined to your own projects by the tenant filter. These are self-service — any authenticated user can use them for their own data; no elevated access is required.

Live schemas are authoritative. Once your key carries the scope, your MCP client's tools/list returns each tool below with its full, current argument schema — that's the contract to code against. The map here is for discovery; call tools/list for exact arguments.

Projects

The top-level scoping primitive — everything else files into a project. Scopes: projects.read / projects.write.

create_project — create a project.
list_projects — list your projects.
get_project — fetch one project.
update_project — rename / edit a project.
set_default_project — mark the project new records file into by default.
archive_project / unarchive_project — archive or restore a project.
reassign_record_to_project — move a misfiled backlog item / build session / decision-log entry to another of your projects.

Build sessions

One work unit per developer per branch — the spine everything a session ships links back to. Scopes: session_state.read / session_state.write.

start_build_session — open (or idempotently resume) a session.
end_build_session — close a session with a history entry.
session_ping — liveness heartbeat for long sessions.
update_session_current_state / prepend_session_current_state_lead — maintain the rolling "current state" note.
link_session_continuation — link a session that continues another (e.g. across machines).
record_build_session_usage — record a session's token usage (the cost-to-build rollup).
get_build_session / list_build_sessions / query_build_sessions — read / list / full-text-search sessions.

Decision log

An append-only record of material decisions. Scopes: session_state.read / session_state.write.

append_decision_log — record a decision.
correct_decision_log — supersede an earlier entry (append-only correction).
amend_decision_log_entry_source — attach or fix the source PR / commit on an entry.
get_decision_log_entry / get_decision_log_entry_summary — fetch one entry (full / summary).
list_decision_log — list entries by date range.
query_decisions — full-text search decisions.
count_decisions_by_period / summarize_decisions_by_author — aggregate views.

Backlog

Deferred work, captured the moment it's deferred. Scopes: session_state.read / session_state.write.

file_backlog_item — file a deferred item.
triage_backlog_item — move status (Open → InProgress → Resolved / Dismissed).
assign_backlog_item — assign an owner.
reprioritize_backlog_item — change an item's priority any time (no filer / 24 h-window limit — the hygiene path).
acknowledge_backlog_item_staleness — reset the staleness clock without changing status.
amend_backlog_item / amend_backlog_closing_notes — edit the body / closing notes.
get_backlog_item / list_backlog_items / query_backlog — read / filter / full-text-search.
count_backlog_items_by_status / list_stale_backlog_items — aggregate + staleness views.

Imports

Bring an existing project's history in — see Migrating an existing project. Scopes: session_state.read / session_state.write.

import_session_state_from_markdown — parse handoff / decision-log / backlog markdown into records (dry-run by default).
list_session_state_import_parsers — list the parsers available per aggregate type.
get_session_state_import / list_session_state_imports — inspect past import runs.

Cross-aggregate

Joined reads across the families. Scope: session_state.read.

get_build_session_cross_aggregate — a build session plus every decision + backlog item it touched + its token-usage rollup, in one call.
get_session_state_import_rows — the rows a given import produced.

Lessons & rules

Capture recurring development patterns and turn the human-promoted ones into just-in-time guidance. Per-project customer data — you promote your own lessons. Scopes: session_state.lessons.* / session_state.rules.*.

file_lesson — capture a recurring pattern (lands Documented).
append_lesson_observation — add a sighting to a lesson.
update_lesson_status — move a lesson through its lifecycle, including the human-gated promotion to Enforced (for your own lessons).
delete_lesson — hard-delete a lesson filed by mistake, freeing its per-project slug so you can re-file cleanly (refused once a lesson is Enforced — archive those instead).
query_lessons / get_lesson — search / fetch lessons.
list_lesson_candidates — auto-detected pattern proposals awaiting triage.
promote_lesson_candidate / reject_lesson_candidate / supersede_lesson_candidate — triage candidates.
query_applicable_rules — retrieve the rules (derived from your Enforced lessons) that apply to a specific change.
list_rules / get_rule — browse / fetch rules.
reenrich_rule — requeue a Failed rule's trigger extraction.

Migrating an existing project

Adopting SpecStep on a project that already has history (handoff notes, a decision log, a backlog, or a folder of spec/design docs) doesn't mean starting over. Bring the existing record in first, then run the session-state protocols forward from there. The tools below need only self-service scopes (session_state.*, plus projects.write if you create a project in step 1), so a key carrying those scopes is enough; no elevated access is required.

1. Pick a target project. Import lands in your default project unless you pass project_id. Create one with create_project (or set_default_project) if you don't have one yet.

2. Import existing session-state markdown — handoff / decision-log / backlog notes. import_session_state_from_markdown parses a markdown file into SessionState rows:

aggregate_type — DecisionLog, Backlog, or BuildSession.
parser_strategy_id — the parser to apply. Call list_session_state_import_parsers first to see what's available for each aggregate type (formats differ — a decision log written as ## <date> — <title> sections parses differently from a bulleted backlog).
dry_run — defaults to true. The first call is a preview: it returns the row counts + parsed titles it would create, so you can eyeball the mapping before committing. Re-call with dry_run: false to write.
Re-imports are idempotent — a per-entry fingerprint dedups, so fixing the parser choice and re-running won't double-file. Inspect a past run with get_session_state_import / list_session_state_imports.

Inline markdown_content is capped at 1 MB (it travels through the agent's conversation context). For a larger corpus, upload via REST instead — POST /v1/session-state-imports/upload (multipart) or the in-app Session state → Imports → Upload form — which bypasses the tool-call channel entirely.

3. Migrate existing spec/design docs into a package (optional). If the project already has written specs or design docs you want SpecStep to treat as a delivered package, that's a different path from session-state import: preview_doc_migration classifies a .zip of docs into a proposed source→canonical mapping (review the low-confidence guesses + conflicts it flags), then commit_doc_migration normalizes them into a migrated package linked to your project. The in-app equivalent is the Migrate existing docs card on the project page.

4. Then run the protocols forward. With the history imported, /start-session recovers it like any other resume — query_decisions / query_backlog / get_build_session_cross_aggregate read back the migrated record alongside everything you ship from here.
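The preview-then-commit pattern from step 2 can be captured in a small argument builder. The argument names come from this page, but the live schema from tools/list is authoritative. Assumptions: "1 MB" is read conservatively as 1,000,000 bytes, and the parser id "dated-sections" is a placeholder, not a real parser name.

```python
MAX_INLINE_BYTES = 1_000_000  # Assumption: "1 MB" read conservatively as 10**6 bytes.
AGGREGATE_TYPES = {"DecisionLog", "Backlog", "BuildSession"}


def build_import_arguments(markdown, aggregate_type, parser_strategy_id, dry_run=True, project_id=None):
    """Build arguments for import_session_state_from_markdown, previewing by default."""
    if aggregate_type not in AGGREGATE_TYPES:
        raise ValueError(f"unknown aggregate_type: {aggregate_type!r}")
    if len(markdown.encode("utf-8")) > MAX_INLINE_BYTES:
        raise ValueError("too large for inline markdown_content; use the REST upload instead")
    arguments = {
        "aggregate_type": aggregate_type,
        "parser_strategy_id": parser_strategy_id,
        "markdown_content": markdown,
        "dry_run": dry_run,
    }
    if project_id is not None:
        arguments["project_id"] = project_id  # omit to target the default project
    return arguments


notes = "## 2025-01-10 Use Postgres\nWe chose Postgres for the primary store.\n"
# "dated-sections" is a placeholder; pick a real id from list_session_state_import_parsers.
preview_args = build_import_arguments(notes, "DecisionLog", "dated-sections")
commit_args = {**preview_args, "dry_run": False}  # send only after reviewing the preview
```

Building the commit call from the preview's arguments guarantees the write uses exactly the input you reviewed.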

Recommended MCP workflows

Twelve short recipes covering the common reasons an agent calls SpecStep. Each names the tools in order — argument schemas live in the reference catalog below.

1. Create a new package from scratch

start_interview — opens the interview (no arguments), returns interview_id.
submit_interview_turn — async by default: returns a job_id; poll get_interview_turn_status until completed. Submit user turns until the interview reports complete.
validate_generation_request — recommended pre-flight; returns {is_valid, blocking_errors, warnings} without enqueueing.
start_generation — kick off the run. Returns generation_id.
wait_for_generation — block on terminal state with built-in polling cadence.
get_latest_package_for_generation — resolve the produced package.
list_package_files / get_package_file — read individual files on demand.

2. Inspect a completed package

list_packages (account-wide) or list_packages_for_generation (one generation).
get_package — the package record + a fresh SAS download URL.
list_package_files — the zip's central-directory listing.
get_package_file — read individual files without downloading the zip.
search_package — full-text search inside a single package.

3. Compare two or more generations

compare_packages — high-level identity verdict + per-package build / quality confidence scores.
diff_package_files — line-level unified diff across same-named files.
get_generation_quality_report — structured reliability / accessibility / cost / risk findings per generation.
get_security_findings — structured security-expert findings per generation.

4. Apply a small change to an existing package

estimate_change_request_cost — check the rolling-30-day median cost before paying for the call.
request_change — file the addendum (one LLM call; cheaper than a full re-gen).
list_change_requests — the addendum history for a package.
get_change_request — one addendum record + SAS download URL for the zip.
list_change_request_files + get_change_request_file — read the addendum's five markdown files without unzipping.

5. Gate automation on quality and security

wait_for_generation until state == "Complete".
get_security_findings — branch on max_severity (Critical / Major / Minor / Info / None).
get_generation_quality_report — reliability / accessibility / cost / risk severities for the same generation.
Fail or warn based on the severity thresholds your gate enforces.
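The last step of that recipe is a comparison against an ordered severity scale. The sketch below uses the five max_severity values listed above; the default thresholds (fail at Critical, warn at Major) are hypothetical and should be replaced with whatever your gate enforces.

```python
# Ordered from least to most severe, using the values max_severity can take.
SEVERITY_ORDER = ["None", "Info", "Minor", "Major", "Critical"]


def gate(max_severity: str, fail_at: str = "Critical", warn_at: str = "Major") -> str:
    """Map a max_severity value to "fail", "warn", or "pass"."""
    rank = SEVERITY_ORDER.index(max_severity)  # ValueError on an unknown value
    if rank >= SEVERITY_ORDER.index(fail_at):
        return "fail"
    if rank >= SEVERITY_ORDER.index(warn_at):
        return "warn"
    return "pass"


print(gate("Major"))
```

Raising on an unknown severity is deliberate: a gate that silently passes values it does not recognize defeats its purpose.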

6. Attach external reference docs

attach_external_folder — returns a one-time browser URL the user opens to complete OAuth + folder pick.
User opens the URL in their browser; SpecStep handles provider OAuth and first sync server-side.
get_attach_external_folder_session — poll until status == "Completed" (or a terminal failure).
Continue the interview or generation flow; the folder's files are now available as reference documents.

7. Use webhooks instead of polling

create_webhook — subscribe a target URL to one or more event types. The signing secret is returned once.
test_webhook — fire a synthetic webhook.test event to verify the target is reachable.
rotate_webhook_secret — issue a fresh signing secret and invalidate the old one.
delete_webhook — retire the subscription.

wait_for_generation remains the canonical polling fallback when the webhook target is unavailable.

8. Inspect or resume an in-flight generation

list_generations filtered by state — find your in-flight runs.
get_generation — the full aggregate including progress_percent and cost-forecast fields.
get_events — chronological telemetry (state transitions, agent activity).
If state == "PausedAwaitingClarification": get_pending_clarifications then answer_clarifications — the generation resumes on the next dispatcher tick.
wait_for_generation — block on the terminal state.
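
The branching in the steps above can be sketched as a lookup from the current state to the next tool to call. This is a rough sketch only: the function name is hypothetical, and treating Complete and Cancelled as needing no follow-up call is an assumption rather than something the catalog states.

```python
from typing import Optional

# Assumption: nothing further needs to be called once a run is Complete or Cancelled.
NO_FOLLOW_UP_STATES = {"Complete", "Cancelled"}


def next_tool(state: str) -> Optional[str]:
    """Suggest the next MCP tool name for a generation in the given state."""
    if state == "PausedAwaitingClarification":
        # The run is waiting on the caller; fetch the questions before answering.
        return "get_pending_clarifications"
    if state == "Failed":
        # Re-read the aggregate to inspect failure_category before retrying.
        return "get_generation"
    if state in NO_FOLLOW_UP_STATES:
        return None
    # Every other state is still in flight, so block on the terminal state.
    return "wait_for_generation"
```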

9. Retry or cancel a failed generation

get_generation — read failure_category to decide whether retry is appropriate.
retry_generation to re-fire from the original kickoff envelope — see errors §409 for the four typed retry-rejection codes (RETRY_STATE_INVALID, RETRY_RESEARCHER_CHILD, RETRY_ENVELOPE_UNAVAILABLE, RETRY_OWNER_MISMATCH).
OR cancel_generation if abandoning the run.
wait_for_generation after a successful retry.

10. Soft-delete and restore

The two resources are asymmetric by historical convention: package delete and restore are flags on update_package, while generations have dedicated tools.

Packages. update_package with delete: true — soft-deletes the row. Later: update_package with restore: true.
Generations. delete_generation — soft-deletes. Later: restore_generation.

Soft-deleted rows drop out of the default list queries; they're still recoverable until the 30-day retention window auto-purges them.

11. File a bug report or quality feedback

Pick the type that fits.

Broken behavior (404, wrong output, crash):

submit_bug_report — include diagnostic context (URL, generation id, error excerpt). Returns the bug_report_id.
list_my_bug_reports — your filed reports and their current state.
get_bug_report — one record, including any review notes and state transitions.

Quality evaluation (was the interview good, is the package coherent, what's the build confidence):

list_feedback_templates — discover the available rubrics. Seven templates ship in v1: end-to-end-specstep-quality (whole-run), interview-quality (Otto behavior only), package-buildability (deliverable only), api-doc-quality (the /api-docs/* surface), tooling-experience (MCP / CLI / IDE ergonomics), website-quality (the public marketing/docs site), and launch-readiness (cross-cutting pre-launch review). Pick the one whose scope matches the feedback — narrower rubrics keep the signal cleaner than the all-in-one.
get_feedback_template — fetch the full sections for the chosen template to see which section ids to fill.
validate_feedback — (Added 2026-05-19.) Dry-run the submission shape before committing. Returns { valid, errors[] } where each error carries code (the canonical FEEDBACK_* error code), message, and param_name. Same input as submit_feedback minus the recommendation_token. Use this when you're guessing at the rubric's section ids or the cap on a free-text field — better to catch the mistake without burning a submit_feedback call.
submit_feedback — include type, title, full_report, the linked GUIDs (interview_id / generation_id / package_id), and rubric_section_responses + rubric_scores if you used a template.
list_my_feedback — your filed feedback and its current state.
get_feedback — one record, including any review notes and state transitions.

12. Capability and subscription discovery

get_capabilities — schema versions, accepted enum values (review_profile, project_type, mirror_selection). Call BEFORE start_generation so you can avoid hardcoding magic strings that change on deploy.
get_subscription — the caller's tier (Free / Pro / Team) + credit-quota snapshot. Review depth is economic — any tier runs Fast/Normal/Extensive it can afford (only Researcher is tier-gated) — so branch on remaining credits, not on the tier name, before kicking off generations.

Tool selection guide

A quick mapping from common agent intent to the best first tool. When in doubt, start here, then read that tool's reference entry below for argument detail.

Intent	Start with
"I need to know what values are valid"	`get_capabilities`
"I want to know if a generation will succeed"	`validate_generation_request`
"How many credits do I have / does my tier grant `Researcher`?"	`get_subscription`
"I need the latest package for a generation"	`get_latest_package_for_generation`
"I need one file from a package"	`list_package_files` → `get_package_file`
"I need to search across all my packages"	`search_my_packages`
"I need to inspect an addendum"	`list_change_requests` → `get_change_request` → `get_change_request_file`
"I need to compare packages"	`compare_packages` + `diff_package_files`
"I need review findings as data, not prose"	`get_security_findings` + `get_generation_quality_report`
"My generation is paused — what's the question?"	`get_pending_clarifications` → `answer_clarifications`
"My generation failed — why?"	`get_generation` (read `failure_category`) → `get_events`
"I want to retry a Failed generation"	`retry_generation`
"I need account-wide cost over a period"	`get_usage`
"I want to rate or evaluate a finished run"	`list_feedback_templates` → `get_feedback_template` → `submit_feedback`
"I want to know if a feedback submission will be accepted"	`validate_feedback` (dry-run) → `submit_feedback`
"I want to file a bug, not rate a run"	`submit_bug_report` (broken behavior; use `submit_feedback` for quality evaluation)

Available tools

These are the SpecStep MCP tools available to standard authenticated callers. The nine categories below group tools by capability area.

Your tools/list is scoped to your key's permissions. A standard customer key sees exactly the catalog documented here. Operator/admin tools — user administration, support/feedback/bug-report triage queues, billing and profit summaries, alert and security-finding management — carry a required permission and are filtered out of both tools/list and tools/call for any key that doesn't hold it: a non-operator key can't even tell they exist (calling one by name returns the same Unknown tool error as a typo). So if you're an operator and notice tools in your manifest that aren't listed below, that's expected — they're gated to your account, not part of the customer surface. The session-state and project tools (build sessions, decision log, backlog, project management, lessons & rules) are a related but self-service case: opt in by minting a key that carries the matching scopes (session_state.* / projects.*) — see Authentication. They're documented in full under Session-state & project tools, not in the reference catalog below.

Interview tools

`start_interview`

Creates a new interview. Takes no arguments — the opening agent turn arrives in the response's transcript. Describe what you're building in your first submit_interview_turn call (project type, vision, constraints); the interview's detected_type is inferred from that first turn.

`submit_interview_turn`

Submits a turn to an existing interview. Default mode is async (changed 2026-05-19): the call commits your user turn + enqueues a background job and returns a job_id you poll via get_interview_turn_status (or subscribe to the InterviewTurnJobStatusChanged SignalR push). Legacy inline-reply behavior is available via mode: "sync" but is subject to the ~60s Front Door ceiling and is scheduled for removal after one release cycle.

Arguments

Name	Type	Required	Description
`interview_id`	UUID	yes	The interview to append the turn to.
`message`	string	yes	The user's turn. Empty / whitespace strings are rejected; cap is 16,384 characters.
`client_request_id`	string	no	Optional idempotency token (1..128 chars of `[A-Za-z0-9._:-]`). A retry with the same value returns the cached result of the first call instead of re-invoking the LLM. Recommended for any caller that might retry on network failure.
`mode`	string	no	Default `"async"` (changed 2026-05-19): returns a `job_id` you poll via `get_interview_turn_status`. Pass `"sync"` to opt into the legacy inline-reply path (subject to the ~60s Front Door ceiling — may 504 on long interviews; scheduled for removal).
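
The limits in the table can be checked client-side before spending a call. The following is a minimal sketch, assuming the 16,384-character cap is counted the way Python's len counts characters (the server may count differently); the function name is hypothetical.

```python
import re
from typing import Optional

# Limits taken from the argument table above.
CLIENT_REQUEST_ID_RE = re.compile(r"[A-Za-z0-9._:-]{1,128}")
MAX_MESSAGE_CHARS = 16_384


def build_turn_arguments(interview_id: str, message: str,
                         client_request_id: Optional[str] = None) -> dict:
    """Validate locally, then return the arguments object for submit_interview_turn."""
    if not message.strip():
        raise ValueError("message must not be empty or whitespace")
    if len(message) > MAX_MESSAGE_CHARS:
        raise ValueError("message exceeds 16,384 characters")
    args = {"interview_id": interview_id, "message": message}
    if client_request_id is not None:
        if not CLIENT_REQUEST_ID_RE.fullmatch(client_request_id):
            raise ValueError("client_request_id must be 1..128 chars of [A-Za-z0-9._:-]")
        args["client_request_id"] = client_request_id
    return args
```

Omitting mode leaves the documented default ("async") in effect.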

Returns (async mode, default) — either:

{status: "queued", job_id, interview_id, submission_id?, user_turn_committed: true, snapshot: null} — your user turn committed; poll get_interview_turn_status(job_id) for the agent reply.
{status: "cached_replay", job_id: null, interview_id, submission_id, user_turn_committed: true, snapshot: <interview snapshot>} — you supplied a client_request_id whose original call already completed; here's the cached result.
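
A caller can branch on the two async shapes above as follows. This is a sketch under stated assumptions: tool_call_envelope builds a generic JSON-RPC 2.0 tools/call request, and handle_submit_result expects the already-decoded tool result as a plain dict. Both function names are hypothetical, and how your client unwraps the tool result from the JSON-RPC response is left out.

```python
import json


def tool_call_envelope(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 request that invokes one MCP tool by name."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def handle_submit_result(result: dict) -> tuple:
    """Map an async submit_interview_turn result to the caller's next step."""
    if result["status"] == "queued":
        # The user turn is committed; the agent reply arrives via the status poll.
        return ("poll", result["job_id"])
    if result["status"] == "cached_replay":
        # An earlier call with the same client_request_id already finished.
        return ("done", result["snapshot"])
    raise ValueError(f"unexpected status: {result['status']!r}")


# Example: serialize the request body for the HTTP POST.
body = json.dumps(tool_call_envelope("submit_interview_turn",
                                     {"interview_id": "i-1", "message": "hello"}))
```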

Returns (sync mode, opt-in) — full interview snapshot: {id, status, detected_type, started_at, last_activity_at, completed_at, intake_artifact_id, transcript: [{role, content, occurred_at}, …], started_generation_id?, auto_start_failure?}. Read the last agent-role entry of transcript for the agent's reply. When the interview just transitioned to status: "complete", the response also carries the auto-handoff fields below.

Completion auto-handoff (added 2026-05-17). When the agent signals completion on a turn (the interview transitions to complete and an intake_artifact_id is produced), SpecStep auto-starts a generation with sensible defaults (review_profile: "Normal", has_ui derived from the detected project type) and surfaces the result on the same response. Every package ships the full set of AI-coder instruction files (CLAUDE.md, AGENTS.md, .cursorrules, .github/copilot-instructions.md):

started_generation_id — non-null on success; the generation id you can poll via wait_for_generation / get_generation.
auto_start_failure: {code, message} — non-null when auto-start failed (quota exceeded, validation error, transient provider failure, etc.). The interview turn still succeeded; call start_generation manually with the intake_artifact_id if you want to retry the kickoff with custom settings.

Both fields stay null when the turn didn't trigger completion. Auto-handoff is restricted to user-actor interviews; API-key actors receive auto_start_failure.code: "AUTO_START_NOT_SUPPORTED_FOR_ACTOR_TYPE" and call start_generation themselves.

The auto-handoff fields land on the snapshot returned via get_interview_turn_status when the async job's completion produced an intake artifact.

Errors — when an idempotency replay finds the original is still processing, you get INTERVIEW_TURN_IN_FLIGHT with data: {retryable: true, retry_after_seconds: 5, turn_committed: false, ...}. When the original failed, you get the cached error code with data: {retryable, turn_committed: false, original_error_code, replayed_from_cache: true, ...}. See errors.
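
One way to act on the retryable and retry_after_seconds fields is sketched below. It assumes the error reaches your code as a dict with a code string and a data object; that wrapper shape, the function name, and the 5-second fallback are assumptions for illustration.

```python
from typing import Optional


def replay_delay_seconds(error: dict, fallback_seconds: float = 5.0) -> Optional[float]:
    """Return how long to wait before replaying the call, or None if it should not be retried."""
    data = error.get("data") or {}
    if not data.get("retryable", False):
        return None
    # Prefer the server's hint; fall back to a caller-chosen delay when it is absent.
    return float(data.get("retry_after_seconds", fallback_seconds))
```

Replaying with the same client_request_id after the delay keeps the retry idempotent, as described for that argument.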

`get_interview_turn_status`

Status poll for an async submit_interview_turn job. Returns the job's current state plus (when completed) the canonical interview snapshot, or (when failed) structured error fields.

Arguments

Name	Type	Required	Description
`job_id`	UUID	yes	The `job_id` returned by an async `submit_interview_turn` call.

Returns — {status, job_id, interview_id, snapshot?, error_code?, error_message?, is_retryable?, created_at, completed_at?} where status is one of queued, running, completed, failed. When completed, snapshot carries the full interview state in the same shape sync submit_interview_turn returns. When failed, the error_code is one of the standard interview-turn codes (INTERVIEW_TURN_TIMEOUT, INTERVIEW_TURN_TRANSPORT_ERROR, INTERVIEW_TURN_STUCK_RUNNING, INTERVIEW_TURN_INTERNAL_ERROR, …) and is_retryable tells you whether re-submitting with the same client_request_id is safe.
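
A polling loop over this tool might look like the sketch below. call_tool is a hypothetical callable that sends one tools/call request and returns the decoded tool result as a dict; the poll interval and attempt cap are arbitrary choices, not documented values.

```python
import time
from typing import Callable


def wait_for_turn(call_tool: Callable[[str, dict], dict], job_id: str,
                  poll_seconds: float = 2.0, max_polls: int = 60,
                  sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll get_interview_turn_status until the job completes; return the snapshot."""
    for _ in range(max_polls):
        status = call_tool("get_interview_turn_status", {"job_id": job_id})
        state = status["status"]
        if state == "completed":
            return status["snapshot"]
        if state not in ("queued", "running"):
            # Covers "failed" and any other non-pending status the server reports.
            raise RuntimeError(f"{state}: {status.get('error_code')} {status.get('error_message')}")
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} still pending after {max_polls} polls")
```

When the raised error corresponds to a failed job, is_retryable on the status response indicates whether resubmitting with the same client_request_id is safe.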

Foreign job ids return a "not found" error (same info-hiding convention as get_interview).

`cancel_interview_turn`

Added 2026-05-18.

Cancels a background submit_interview_turn(mode: 'async') job by id. Useful when the user's submitted turn was wrong, when an LLM call is dragging on, or when the caller wants to abandon a half-finished turn rather than wait for it (or its stuck-job timeout). Queued jobs cancel cleanly; running jobs cancel best-effort — the job's terminal status will be cancelled, but the agent reply MAY still appear in the interview transcript if a mid-pipeline SaveChanges committed before the cancel landed. Idempotent on already-Cancelled jobs.

Arguments

Name	Type	Required	Description
`job_id`	UUID	yes	The `job_id` returned by an async `submit_interview_turn` call.

Returns — {status, job_id, interview_id, created_at, completed_at?} where status is cancelled on the happy path. Mirrors the shape get_interview_turn_status returns (no snapshot field — the work was abandoned).

Returns an INTERVIEW_TURN_NOT_CANCELLABLE conflict when the job is already completed or failed (the work landed; the result is at get_interview_turn_status). Foreign job ids return a "not found" error (same info-hiding convention as get_interview_turn_status).

`list_interviews`

Lists the caller's interviews, newest first. Empty conversations (< 2 turns) are filtered out so abandoned-at-first-contact rows don't clutter the list.

Arguments

Name	Type	Required	Description
`status`	string	no	Comma-separated. One or more of `active`, `paused`, `abandoned`, `complete`, `awaiting_clarification`.
`limit`	int	no	Default `20`, max `100`.

Returns — {interviews: [...]} where each item has:

Field	Type	Description
`id`	UUID	Interview id.
`status`	string	One of the lowercase status values above.
`detected_type`	string \| null	Project type inferred from the first user turn.
`display_title`	string	Short human-readable label for the interview.
`turn_count`	int	Total turns recorded so far.
`started_at`	ISO-8601	When the interview was created.
`last_activity_at`	ISO-8601	Timestamp of the most recent turn or state change.

`get_interview`

Returns the full state and transcript of an interview by id. Takes interview_id. Same auth boundary as REST: foreign callers get "not found" rather than a 403, so a caller cannot use the response to tell whether a foreign id exists.

The response carries a transcript_size introspection block (added in v0.18, 2026-05-22) — byte-identical to the REST shape — so MCP clients can observe how full a transcript is before queuing the next turn: { chars, tokens_estimate, max_chars, max_tokens, percent_used }. chars sums the UTF-16 length of every user + agent turn (system prompts and reference documents are excluded); tokens_estimate is chars / 4 (conservative upper bound). max_chars and max_tokens report the current platform ceiling but are not enforced in v0.18 — a later release will reject submit-turn calls that would exceed them with a structured error envelope.
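
The arithmetic behind the block can be reproduced locally, for example to predict the effect of a turn before submitting it. This is a sketch, not the server's implementation: the role names "user" and "agent", integer division for tokens_estimate, and the percent_used formula (chars over max_chars, one decimal place) are all assumptions.

```python
def utf16_length(text: str) -> int:
    """Count UTF-16 code units, so characters outside the BMP count as two."""
    return len(text.encode("utf-16-le")) // 2


def transcript_size(transcript: list, max_chars: int, max_tokens: int) -> dict:
    """Recompute a transcript_size-style block from transcript entries."""
    chars = sum(utf16_length(turn["content"])
                for turn in transcript
                if turn["role"] in ("user", "agent"))  # other roles are excluded
    return {
        "chars": chars,
        "tokens_estimate": chars // 4,  # assumption: rounded down
        "max_chars": max_chars,
        "max_tokens": max_tokens,
        # Assumption: percentage of the character ceiling, one decimal place.
        "percent_used": round(100 * chars / max_chars, 1) if max_chars else 0.0,
    }
```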

`delete_interview`

Soft-deletes an interview by id. Takes interview_id. Allowed in any status (Active, Paused, Complete, Abandoned, AwaitingClarification, ClarificationResolved) — soft-delete is a "remove from my workspace" affordance, not a state-machine transition. Idempotent on already-deleted rows. The interview row stays in the database for audit + recovery; the conversation drops out of list_interviews and the workspace UI. Foreign callers get "not found" so foreign ids can't be probed. Returns {interview_id, action: "deleted"}. Sister tool: restore_interview.

`restore_interview`

Restores a soft-deleted interview by id. Takes interview_id. Idempotent on already-live rows. No state guard. Sister to delete_interview. Returns {interview_id, action: "restored"}.

`list_intake_artifacts`

Added 2026-05-08.

Lists the caller's intake artifacts (the structured output of a completed Interview, the sole input to start_generation). Sibling-shape to list_interviews; agents pick a ready-to-generate artifact without filtering interview status inline. Optional status filter ("ready" is the only meaningful value today; null/blank = same as "ready"; unknown labels return an empty list). Optional limit (default 50, max 200) and offset for pagination. Returns {artifacts: [{id, interview_id, project_name, schema_version, completed_at}, ...]}, newest first. Mirrors REST GET /v1/intake-artifacts. Use the returned id as the intake_id argument to start_generation.
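
Walking the full list with limit and offset can be sketched as a generator. call_tool is a hypothetical callable that sends one tools/call request and returns the decoded tool result; stopping on a short page is an assumption about how the end of the list is signalled.

```python
from typing import Callable, Iterator

MAX_LIMIT = 200  # documented maximum for the limit argument


def iter_intake_artifacts(call_tool: Callable[[str, dict], dict],
                          page_size: int = 50) -> Iterator[dict]:
    """Yield every intake artifact, newest first, one page at a time."""
    page_size = max(1, min(page_size, MAX_LIMIT))
    offset = 0
    while True:
        result = call_tool("list_intake_artifacts", {"limit": page_size, "offset": offset})
        page = result["artifacts"]
        yield from page
        if len(page) < page_size:
            return  # a short page means there is nothing further to fetch
        offset += page_size
```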

`get_intake_artifact`

Added 2026-05-12.

Fetches a single intake artifact by id. Takes intake_artifact_id. Returns {id, interview_id, content_warning, payload_content_type, payload, project_attributes} — the full structured intake JSON the orchestrator feeds into start_generation, plus the project-attribute flags set by the post-interview attribute-detection pass (has_ui, has_persisted_data, has_ai_features, has_backend, requires_i18n, requires_compliance, compliance_frameworks).

The payload is user-authored JSON; it ships inside an untrusted_text envelope with a content_warning so MCP clients don't treat the strings as agent instructions. Owner-scoped — foreign and unknown ids surface as "not found" rather than 403, so probing is impossible. Use this when an agent wants to inspect or debug what an interview produced before calling start_generation, or to investigate a "why did this generation produce that" question after the fact.

External-connector tools

Added 2026-05-15.

MCP-driven flow for attaching a OneDrive / SharePoint / Google Drive folder to one of the caller's interviews. The MCP client itself is a CLI — it can't render a folder picker — so the kickoff tool returns a one-time launch URL the user opens in their default browser. The browser handles the existing provider-pick + OAuth + folder-pick + first-sync flow; the MCP client polls a sibling tool for terminal state. Same pattern as start_generation + get_generation. The synced files materialize as reference documents on the interview, identical to what the Web UI's "Connect a folder" affordance produces.

`attach_external_folder`

Creates an attach session and returns a launch URL. The MCP client opens the URL (or prints it for the user) and then polls get_attach_external_folder_session until the session reaches a terminal state.

Arguments

Name	Type	Required	Description
`interview_id`	UUID	yes	The interview the resulting connector's files will sync into. Caller must own the interview; foreign ids surface as "not found" rather than `403`.

Returns

Field	Type	Description
`attach_session_id`	UUID	Session id. Pass to `get_attach_external_folder_session` to poll for status.
`launch_url`	string	Absolute URL the user opens in their browser (e.g., `https://specstep.com/external-connectors/attach/<id>`).
`status`	string	Initial status — always `awaiting_provider_pick` on a fresh kickoff.
`expires_at`	ISO-8601	UTC timestamp the session expires (30 minutes after creation).
`message`	string	Human-readable prompt the MCP client surfaces to the user.

`get_attach_external_folder_session`

Polls the state of an attach session. Same auth boundary as the kickoff tool — cross-user reads surface as "not found".

Arguments

Name	Type	Required	Description
`attach_session_id`	UUID	yes	The session id returned by `attach_external_folder`.

Returns

Field	Type	Description
`status`	string	One of `awaiting_provider_pick`, `awaiting_oauth`, `awaiting_folder_pick`, `syncing`, `completed`, `expired`, `cancelled`, `failed`.
`connector_id`	UUID \| null	Populated when `status = completed`. The new (or reused) `ExternalConnector` id.
`provider`	string \| null	Populated once the user picks a provider in the browser. One of `onedrive`, `sharepoint`, `googledrive`, `dropbox`.
`folder_name`	string \| null	Populated on or after commit. The folder the user selected.
`files_synced`	int \| null	Populated when `status = completed`. Count of files materialized as reference documents on the interview.
`error_code`	string \| null	Populated when `status = failed` (e.g., `commit_failed`, `authorize_failed`).
`error_description`	string \| null	Human description of the failure when `status = failed`.
`expires_at`	ISO-8601 \| null	UTC timestamp the session was set to expire.

Terminal states are completed, failed, expired, and cancelled. Unknown / expired session ids return a synthetic {"status": "expired"} response — the client can re-run attach_external_folder to start over.
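
The kickoff-then-poll pattern for an attach session might be sketched as follows. call_tool is a hypothetical callable that sends one tools/call request and returns the decoded tool result; the 3-second interval and 600-poll cap are illustrative values picked to roughly span the 30-minute session lifetime.

```python
import time
from typing import Callable

# Terminal states as listed above.
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def wait_for_attach(call_tool: Callable[[str, dict], dict], attach_session_id: str,
                    poll_seconds: float = 3.0, max_polls: int = 600,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll the attach session until it reaches a terminal state; return the final session."""
    for _ in range(max_polls):
        session = call_tool("get_attach_external_folder_session",
                            {"attach_session_id": attach_session_id})
        if session["status"] in TERMINAL_STATUSES:
            return session
        sleep(poll_seconds)
    raise TimeoutError("attach session did not reach a terminal state")
```

The caller still has to inspect the returned status: only completed carries connector_id and files_synced.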

Generation tools

`start_generation`

Starts a generation from a completed interview's intake. Takes the intake_id and (optionally) the review profile, project type, and version pins. Returns the generation id and initial state Queued. Subject to the same 5-kickoffs-per-minute rate limit as POST /v1/generations.

Many callers won't need to call this directly — when the agent signals completion on a submit_interview_turn call, SpecStep auto-starts a generation with sensible defaults and surfaces the new generation id as started_generation_id on the response snapshot. Call start_generation explicitly when you want non-default settings (custom review_profile, project_type, etc.) or when the auto-start surfaced an auto_start_failure you need to retry past.

Arguments

Name	Type	Required	Description
`intake_id`	UUID	yes	The intake artifact produced by completing an interview.
`review_profile`	string	no	One of `Fast`, `Normal`, `Extensive`, `Researcher`. Defaults to `Normal`.
`project_type`	string	no	One of `WebApp`, `MobileApp`, `MobileGame`, `DesktopApp`, `BrowserExtension`, `AiAgent`, `AiTool`. Defaults to `WebApp`.
`has_ui`	bool	no	Whether the project has a user interface. Defaults to `false`.
`schema_version`	string	no	Pins the manifest schema version. Defaults to `1.0.0`.
`rubric_version`	string	no	Pins the review rubric version. Defaults to `1.0.0`.
`quality_rubric_version`	string	no	Pins the quality rubric version. Defaults to `quality-1.0`.
`mirror_selection`	string	no	Accepted for back-compat (`None`, `ClaudeMd`, `CursorRules`, `Copilot`, `All`) but no longer narrows the output — every package ships the full AI-coder mirror set (`CLAUDE.md`, `AGENTS.md`, `.cursorrules`, `.github/copilot-instructions.md`).

Returns

Field	Type	Description
`id`	UUID	The new generation's id.
`state`	string	Initial state, normally `Queued`.
`download_url`	string \| null	Populated only if the package is synchronously ready (rare).
`package_id`	UUID \| null	Populated only if the package is synchronously ready.
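
Rather than hardcoding enum strings, a caller can check overrides against the values reported by get_capabilities before kicking off. The sketch below assumes the accepted values have already been reshaped into a dict that maps an argument name to its list of allowed values; that shape and the function name are hypothetical, since the get_capabilities response format is not shown here.

```python
def build_start_generation_args(intake_id: str, accepted: dict, **overrides) -> dict:
    """Return start_generation arguments, rejecting enum values the server does not accept.

    accepted is a hypothetical mapping such as {"review_profile": ["Fast", "Normal"]}.
    """
    args = {"intake_id": intake_id}
    for name, value in overrides.items():
        allowed = accepted.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {allowed}")
        args[name] = value
    return args
```

Arguments without an entry in accepted (for example has_ui) pass through unchecked, and anything omitted falls back to the defaults in the table.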

`get_generation`

Breaking change in v0.9.5 (2026-05-06). This tool was previously called get_status. Callers using the old name must switch — the dispatcher rejects get_status with a MethodNotFound-style error.

Returns the current state of a generation. Takes a generation ID. Returns the state (one of Queued, Drafting, SpecialistReview, Reviewing, FreshEyes, RiskReview, SecurityReview, Assembling, Refining, Delivering, Paused, PausedAwaitingClarification, Complete, Failed, Cancelled, AddendumRunning), the current round, the running cost, the computed progress_percent, and the typed failure_category when the generation failed.

When the historical sample is large enough, the response also carries estimated_total_usd plus estimated_total_p25_usd / estimated_total_p75_usd / estimated_total_sample_size — the same forecast envelope the Generation Details page renders.

The response also carries project_name, description, kind ("specification"), and kind_label so the agent knows what the generation is about and can disambiguate the deliverable from runnable code.

Poll this until state is terminal — or use wait_for_generation instead, which returns a polling-cadence hint.

Arguments

Name	Type	Required	Description
`generation_id`	UUID	yes	The generation to inspect.

Returns (shared with wait_for_generation)

Field	Type	Description
`id` / `generation_id`	UUID	The generation's id.
`state`	string	One of `Queued`, `Drafting`, `SpecialistReview`, `Reviewing`, `FreshEyes`, `RiskReview`, `SecurityReview`, `Assembling`, `Refining`, `Delivering`, `Paused`, `PausedAwaitingClarification`, `Complete`, `Failed`, `Cancelled`, `AddendumRunning`.
`current_round`	int	Current review-loop round number.
`progress_percent`	int	Computed 0–100 progress signal.
`running_cost_usd`	decimal	Live cost so far. Settles to the package's `total_cost_usd` on `Complete`.
`estimated_total_usd`	decimal \| null	Historical-median forecast; null when the sample is too small. From 2026-05-27, on a run that has auto-resumed after host restarts the forecast is widened by `host_restart_resume_count` (each resume re-runs work), so a resume-prone run's estimate reflects the extra cost instead of reading wildly low against the actual.
`estimated_total_p25_usd` / `estimated_total_p75_usd`	decimal \| null	Percentile bounds; null when the forecast is null. Widened on resumed runs alongside `estimated_total_usd`.
`estimated_total_sample_size`	int \| null	Number of historical generations behind the forecast.
`project_name`	string \| null	Display name (override or auto-extracted).
`description`	string \| null	Short intake-derived description, truncated to 280 chars.
`kind` / `kind_label`	string	`"specification"` + the canonical disambiguation copy.
`failure_category`	string \| null	Typed failure category on `Failed` rows; `null` otherwise. See REST errors page.
`failure_reason`	string \| null	Sanitized human-readable hint on `Failed` rows.
`billing_state`	string \| null	Added 2026-05-18. One of `NotStarted` / `Active` / `PausedRetrying` / `Complete` / `PausedAwaitingInput` (the last added 2026-06-01 — a human-input pause, e.g. answering a clarification: your turn, no cost climbing, nothing stuck; distinct from `PausedRetrying`, a transient-error backoff). Customer-facing billing posture written atomically with every state transition. When `billing_state` is `Active` while `running_cost_usd` climbs, the caller knows their cost isn't being wasted — the platform is actively working. Null on pre-2026-05-18 generations (no projection row yet).
`started_work_at`	ISO-8601 \| null	Added 2026-05-18. When the dispatcher first claimed the generation (distinct from `started_at` which is the queued-at time). Null on pre-2026-05-18 generations and while the row is still in pre-work states.
`phase_detail`	string \| null	Added 2026-05-18. Human-readable phase label derived as a pure function of `state` and `current_round` (examples: `"Drafting"`, `"Specialist review (round 2)"`, `"Awaiting your clarification"`). Present on every projection row. Null on pre-2026-05-18 generations.
`progress_explanation`	string \| null	Added 2026-05-18. One-sentence explanation of what's happening at the current `progress_percent` (e.g., `"Specialists are reviewing the draft in parallel"`). Closes the same understanding gap as `billing_state` — the customer sees WHY the progress bar is where it is, not just the number. Null on pre-2026-05-18 generations.
`estimated_duration_seconds`	number \| null	Added 2026-05-18. Historical-median forecast of the run's eventual total wall-clock duration (seconds), keyed by `review_profile`. Null when the historical sample is too small for a confident forecast (the floor is 5 completed generations in the rolling 30-day window) or on pre-2026-05-18 generations.
`estimated_time_remaining_seconds`	number \| null	Added 2026-05-18. Best-effort "expected remaining" computed as `estimated_duration_seconds - elapsed_since_started_work_at`, floored at 0. Null while the generation is queued, terminal, when the forecast is unavailable, or when a still-running generation has already outrun its forecast (the ETA resets to `estimating…`).
`estimated_completion_at`	ISO-8601 \| null	Added 2026-05-18. Best-effort wall-clock expected completion: `started_work_at + estimated_duration_seconds`. Null while queued, terminal, when the forecast is unavailable, or when a still-running generation has already outrun its forecast (the ETA resets to `estimating…`).
`active_specialist`	string \| null	Added 2026-05-18. During `SpecialistReview` only — slug of the most-recently-completed specialist in the current round (`codd` / `halo` / `tally` / `vera` / `trip` / `merlin` / `polo`). A pragmatic single-value summary of a parallel fan-out. Null outside `SpecialistReview`, when no specialists have completed yet, or on pre-2026-05-18 generations.
`retry_count`	int	Added 2026-05-19. Number of recoverable LLM-provider retries fired during this run (rate-limit / transient 5xx / timeout backoffs). Starts at `0` and only increases mid-run; it never decreases. Resets to `0` on a host-restart rewind because the counter belongs to a single dispatch attempt. Lets callers tell a "healthy first attempt" (`0`) apart from "currently riding out a transient hiccup" (`>0`). Always present (defaults to `0` on pre-rollout generations).
`last_retry_at`	ISO-8601 \| null	Added 2026-05-19. UTC timestamp of the most recent retry attempt. Null until the first retry fires.
`next_retry_at`	ISO-8601 \| null	Added 2026-05-19. UTC timestamp the retry policy is currently waiting for before the next attempt (`last_retry_at + backoff_delay`). Null between retries. Lets callers display "next retry in X seconds" without guessing the backoff curve.
`recoverable_error_category`	string \| null	Added 2026-05-19. Typed classifier for the recoverable failure that triggered the most recent retry. One of `rate_limit` / `provider_timeout` / `provider_server_error` / `schema_violation` / `other`. Distinct from terminal `failure_category` — that's set when the run fails for good; this is set when an LLM call temporarily failed but the retry policy is still covering it. Null when no retry has fired yet.
`host_restart_resume_count`	int	Added 2026-05-27. How many times this run was automatically resumed after a host restart (capped at 5). Distinct from `retry_count`: that one is provider-level and resets to `0` on a host-restart rewind, so a run that recovered from several restarts still reads `retry_count: 0`; this counter spans the run's whole life and only climbs. A non-zero value is the honest reason a run's `running_cost_usd` or `estimated_total_usd` runs higher than the clean-run forecast — each resume re-runs work: the full-rewind path re-runs Drafting from scratch, while cheaper in-place resumes pick up from a saved checkpoint. Always present (defaults to `0`).
`refinement_summary`	object \| null	Added 2026-05-29. Outcome of the pre-delivery refinement pass that fills referenced-but-missing docs before a package ships. `null` when the pass didn't run, made no change, and left no gap. When present, an object with: `rounds_used` (int — how many detect → refine → re-validate rounds ran); `generated_count` / `dropped_count` / `residual_count` (int); `generated` and `dropped` (arrays of `{path, referenced_by[]}` — docs filled with real content vs. dangling references removed); `residual` (array of `{path, referenced_by[], reason}` — references that ship as deferred stubs, i.e. the package's known gaps); and `summary` (a ready-to-render string). Mirrors the "Pre-delivery refinements" section in `handoff.md`.
`reconciliation_summary`	object \| null	Added 2026-05-29. Outcome of the pre-delivery contradiction-reconciliation pass that resolves cross-document architecture contradictions (e.g. one doc says PostgreSQL, another DynamoDB) before a package ships. `null` when the pass found nothing to reconcile and left no residual. When present, an object with: `rounds_used` (int — how many detect → reconcile → re-validate rounds ran); `reconciled_count` / `unresolved_count` (int); `reconciled` (array of `{category, summary, affected_locations[]}` — contradictions resolved by redrafting the affected docs to agree); `unresolved` (array of `{category, summary, affected_locations[], reason}` — contradictions that ship as known gaps, with the reason); and `summary` (a ready-to-render string). A reconciled contradiction also disappears from `consistency_findings`. Mirrors the "Pre-delivery reconciliation" section in `handoff.md`.
`blocker_resolution_summary`	object \| null	Added 2026-05-29. Outcome of the pre-delivery blocker resolve-or-clarify pass that acts on residual Critic-flagged blockers before a package ships. `null` when there were no residual blockers to act on. When present, an object with: `resolved_count` / `clarified_count` / `residual_count` (int); `resolved` (array of `{target_section, summary}` — blockers cleared by redrafting); `clarified` (array of `{target_section, summary, question}` — blockers escalated into a clarification question); `residual` (array of `{target_section, summary, reason}` — blockers that ship as known gaps); and `summary` (a ready-to-render string). Mirrors the "Pre-delivery blocker resolution" section in `handoff.md`.
`refinement_audit`	object \| null	Added 2026-05-31. Consolidated audit of the whole pre-delivery refinement pipeline — one flat view of what it auto-fixed versus escalated, aggregated from the three fields above (`refinement_summary` / `reconciliation_summary` / `blocker_resolution_summary`) so you don't have to union three differently-shaped objects. `null` on a clean run where every refinement pass was a no-op. When present, an object with: `auto_fixed_count` / `escalated_count` (int); `auto_fixed` (the pipeline changed the package) and `escalated` (the pipeline surfaced an unresolved gap), each an array of `{pass, action, target, detail}` — `pass` ∈ `stub-fill` / `reconciliation` / `blocker-resolution`; `action` ∈ `generated` / `dropped` / `reconciled` / `resolved` (auto-fixed) or `residual-gap` / `unresolved-contradiction` / `clarified` / `residual-blocker` (escalated); `target` is the doc path / section / contradiction category; `detail` is a human-readable summary / reason / clarification question (may be empty); and `summary` (a ready-to-render string). Mirrors the "Refinement audit" section in `handoff.md`.
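
As a rough illustration of consuming the flat audit view, the sketch below tallies escalated items per pass. It assumes the refinement_audit value has already been parsed into a Python dict (or None on a clean run); the function name escalations_by_pass is hypothetical and not part of SpecStep.

```python
from collections import Counter
from typing import Any, Dict, Optional


def escalations_by_pass(audit: Optional[Dict[str, Any]]) -> Dict[str, int]:
    """Count escalated items per refinement pass.

    `audit` is the parsed refinement_audit object, or None on a clean run.
    """
    if audit is None:
        return {}
    return dict(Counter(item["pass"] for item in audit.get("escalated", [])))
```

A caller could use the result to decide which of the three detailed summaries is worth reading in full.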

`get_events`

Returns recent events from a generation's pipeline — stage transitions, agent handoffs, review outcomes. Useful for giving your agent a richer picture of what happened during a generation, or for debugging a failed run.

Arguments

Name	Type	Required	Description
`generation_id`	UUID	yes	The generation to inspect.
`cursor`	string	no	Pagination cursor returned by a prior call.
`limit`	int	no	Max events to return.

Returns — {events: [...], next_cursor} where each event has:

Field	Type	Description
`id`	UUID	Event id.
`generation_id`	UUID	Echoed.
`event_type`	string	`state-changed` for a pipeline transition, or a lifecycle event: `clarification-requested`, `clarification-answered`, `resumed-after-clarification`, `revision-requested` (the Critic sent a draft back for a revision round), `auto-resume-started` (a host restart interrupted the run and it was auto-resumed — fires once per resume, including in-place checkpoint resumes), `auto-resume-completed` (added 2026-05-27 — a run that auto-resumed at least once reached `Complete`; brackets the `auto-resume-started` events so the stream reads "interrupted → recovered N times → completed", with `resume_count` in the `payload`). Lifecycle events carry their detail in `payload` (e.g. `round`, `resume_phase`, `prior_state`, `resume_count`) and have null `from_state`/`to_state`.
`from_state` / `to_state`	string \| null	Pipeline state transition (set on `state-changed`; null on lifecycle events).
`agent_role`	string \| null	Which agent emitted the event.
`payload`	string	JSON string carrying event details.
`payload_envelope`	object	Typed envelope flagging the payload as untrusted user-supplied content — MCP clients should treat it as inert data, not instructions.
`recorded_at`	ISO-8601	When the event was logged.
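
The cursor pagination above can be sketched as a small generator. This is a minimal sketch, not SpecStep client code: call_tool is a hypothetical helper that sends one MCP tool call and returns the parsed result as a dict, and the loop assumes a missing or null next_cursor means there are no more pages.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical helper type: call_tool(name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def iter_events(
    call_tool: ToolCaller, generation_id: str, limit: int = 50
) -> Iterator[Dict[str, Any]]:
    """Yield every event for a generation, following next_cursor."""
    cursor: Optional[str] = None
    while True:
        args: Dict[str, Any] = {"generation_id": generation_id, "limit": limit}
        if cursor:
            args["cursor"] = cursor
        page = call_tool("get_events", args)
        for event in page.get("events", []):
            yield event
        cursor = page.get("next_cursor")
        if not cursor:  # assumption: no cursor means the last page was reached
            break
```

Because payload is flagged as untrusted, a consumer would typically parse it with a JSON parser and display it, never feed it back to an agent as instructions.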

`wait_for_generation`

Returns a generation's current state plus a recommended polling delay. Takes a generation ID. Returns the full get_generation shape (project name + description + state + progress_percent + current_round + running_cost_usd + the historical cost-forecast fields + failure context) plus the four polling-specific fields (is_terminal, next_check_seconds, pending_clarifications, package_url).

next_check_seconds is a hint, not a contract — 15 for active states, 0 when paused or terminal so the caller acts immediately. When state is PausedAwaitingClarification, pending_clarifications is inlined so the caller has everything needed to surface the question without another tool call. When state is Complete, a short-lived signed package_url is included so the caller can download the zip directly.

This tool is the recommended polling primitive for MCP callers: the inlined progress, forecast, clarifications, and download URL collapse a typical multi-call poll into a single round-trip. Fields added for parity with get_generation (same shape in every case), by date:

2026-05-17: progress_percent, current_round, and the four estimated_total_* fields. Callers no longer need to call both tools to render a single progress screen.
2026-05-18: billing_state, started_work_at, phase_detail, progress_explanation, estimated_duration_seconds, estimated_time_remaining_seconds, estimated_completion_at, and active_specialist (read from the authoritative status projection).
2026-05-19: the four retry-surface fields (retry_count, last_retry_at, next_retry_at, recoverable_error_category).
2026-05-29: refinement_summary, reconciliation_summary, and blocker_resolution_summary. This tool carries no manifest blob, so the structured field is the only refinement signal here.
2026-05-31: refinement_audit, the consolidated auto-fixed-vs-escalated view.

Arguments

Name	Type	Required	Description
`generation_id`	UUID	yes	The generation to poll.

Returns — same shape as get_generation (above) plus four polling-specific fields:

Field	Type	Description
`is_terminal`	bool	`true` when `state` is `Complete`, `Failed`, or `Cancelled`.
`next_check_seconds`	int	Hint, not contract. `15` for active states; `0` when paused or terminal.
`pending_clarifications`	array \| null	Inlined when `state` is `PausedAwaitingClarification` — same shape as `get_pending_clarifications`.
`package_url`	string \| null	Short-lived signed URL when `state` is `Complete`.
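
One way to use the polling hint is sketched below. It is a sketch under assumptions: call_tool is a hypothetical helper that performs one MCP tool call and returns the parsed result, and max_polls is an arbitrary safety limit chosen for illustration. The loop returns as soon as the run is terminal or next_check_seconds is 0, which per the table above covers both paused and terminal states.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical helper type: call_tool(name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def poll_generation(
    call_tool: ToolCaller,
    generation_id: str,
    sleep: Callable[[float], None] = time.sleep,
    max_polls: int = 500,
) -> Dict[str, Any]:
    """Poll until the server signals the caller should act now."""
    for _ in range(max_polls):
        status = call_tool("wait_for_generation", {"generation_id": generation_id})
        delay = status.get("next_check_seconds", 15)
        if status.get("is_terminal") or delay == 0:
            return status
        sleep(delay)  # the hint is advisory; callers may back off further
    raise TimeoutError(f"generation {generation_id} still active after {max_polls} polls")
```

On return, the caller would inspect state: surface pending_clarifications when paused for clarification, or download from package_url when Complete.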

`estimate_generation_cost`

Added 2026-05-12.

Forecasts what a generation will cost (USD) before calling start_generation. Takes profile — one of Fast, Normal, Extensive, or Researcher. Returns {profile, has_forecast, estimated_total_usd, p25_usd, p75_usd, sample_size, note} — the rolling 30-day median across completed generations for the requested profile, with p25 / p75 confidence bounds and the sample size behind the estimate.

The forecaster is profile-keyed only — it doesn't yet take an intake_id, so the estimate reflects "what this profile usually costs" rather than a per-intake projection. Per-intake variance can be substantial; the p25 / p75 bounds capture that envelope. When the historical sample is below the forecaster's floor, has_forecast is false and the response carries a "not enough data" note rather than a low-confidence number. Useful for sanity-checking cost before kicking off Normal or Extensive runs.
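
A caller might turn the forecast into a simple budget check along these lines. The sketch assumes the tool result is already a parsed dict; comparing against p75_usd rather than the median is a conservative choice made here for illustration, and fits_budget is a hypothetical name.

```python
from typing import Any, Dict, Optional


def fits_budget(forecast: Dict[str, Any], budget_usd: float) -> Optional[bool]:
    """Return True/False when a forecast exists, or None when there is no data."""
    if not forecast.get("has_forecast"):
        return None  # not enough history; the response carries a note instead
    return float(forecast["p75_usd"]) <= budget_usd
```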

`validate_generation_request`

Added 2026-05-16.

Dry-run companion to start_generation. Takes the same arguments (intake_id required; project_type, has_ui, review_profile, schema_version, rubric_version, quality_rubric_version, mirror_selection optional with the same defaults). Runs the side-effect-free pre-flight checks the live tool does (intake-existence + ownership, account-approval gate, monthly quota + Extra Usage fallback, review-profile-vs-tier, External Connectors tier gate) WITHOUT enqueueing a generation. Returns {is_valid, blocking_errors: [{code, message}], warnings: [{code, message}]}.

Each error's code matches the exception code start_generation would throw on the live path, so callers can branch on stable identifiers:

INTAKE_NOT_FOUND — intake doesn't exist or caller lacks access
USER_PENDING_APPROVAL — account hasn't been approved yet
QUOTA_EXCEEDED — not enough credits for the requested review depth + no Extra Usage rescue
EXTRA_USAGE_INSUFFICIENT — not enough credits + Extra Usage balance below the p75 forecast charge
RESEARCHER_NOT_ALLOWED — the Researcher profile isn't granted on the caller's tier (Fast/Normal/Extensive access is economic — no per-tier lock)
FEATURE_NOT_ALLOWED — intake uses External Connector data but the caller's tier doesn't allow it

Warnings are informational and don't fail validation:

EXTRA_USAGE_WILL_BE_RESERVED — out of credits but Extra Usage covers the next call
CONCURRENCY_AT_CAP / CONCURRENCY_HIGH — concurrency slots heavily in use; a live call right now could race to CONCURRENCY_CAP_REACHED

The concurrency-race caveat: a dry run that returns is_valid: true can still fail with a 409 on the real call if another kickoff lands first. Treat the concurrency warnings as informational only.
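
Branching on the stable codes could look like the sketch below. It assumes the dry-run result has been parsed into a dict; the grouping of codes into "needs credits" and "concurrency risk" is an illustrative choice, and summarize_validation is a hypothetical name.

```python
from typing import Any, Dict

CREDIT_CODES = {"QUOTA_EXCEEDED", "EXTRA_USAGE_INSUFFICIENT"}


def summarize_validation(result: Dict[str, Any]) -> Dict[str, bool]:
    """Reduce a validate_generation_request result to a few decision flags."""
    blocking = {e["code"] for e in result.get("blocking_errors", [])}
    warnings = {w["code"] for w in result.get("warnings", [])}
    return {
        "can_start": bool(result.get("is_valid")) and not blocking,
        "needs_credits": bool(blocking & CREDIT_CODES),
        # Warnings never fail validation, but a live call may still race.
        "concurrency_risk": any(c.startswith("CONCURRENCY_") for c in warnings),
    }
```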

`get_security_findings`

Added 2026-05-16.

Returns the structured security-review findings for a generation. Takes generation_id. Returns {generation_id, has_review, finding_count, max_severity, findings: [{severity, surface, topic, title}, ...]}.

Severity values: Critical, Major, Minor, Info, None. Surface values: Spec, ReferenceCode, GeneratedPackage, PromptInjection. has_review is false when the generation has no manifest yet (still in flight) or the review profile didn't include the Security Expert. Use this to gate automation on a generation's security posture without parsing the markdown report — e.g., max_severity == "Critical" → block. The full report markdown stays in the package zip; this tool exposes only the compact structured projection that already lives in the manifest.
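
The gating idea can be sketched as follows. The numeric ranking assumes the severity values are listed from most to least severe, which the source does not state outright; security_gate and the "no-review" outcome are hypothetical names.

```python
from typing import Any, Dict

# Assumption: severities are ordered Critical > Major > Minor > Info > None.
SEVERITY_RANK = {"None": 0, "Info": 1, "Minor": 2, "Major": 3, "Critical": 4}


def security_gate(report: Dict[str, Any], block_at: str = "Critical") -> str:
    """Return 'block', 'pass', or 'no-review' for a get_security_findings result."""
    if not report.get("has_review"):
        return "no-review"  # still in flight, or the profile had no Security Expert
    if SEVERITY_RANK[report["max_severity"]] >= SEVERITY_RANK[block_at]:
        return "block"
    return "pass"
```

Keeping "no-review" separate from "pass" lets automation decide whether a missing review should fail the gate.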

`get_generation_quality_report`

Added 2026-05-16.

Aggregates the four non-security review sections from the generation's manifest into a single structured payload: reliability (Atlas), accessibility (Halo), cost (Tally), risk (Hazard). Takes generation_id. Returns {generation_id, reliability, accessibility, cost, risk} where each sub-section is {has_review, finding_count, max_severity, findings: [{severity, topic, title}, ...]}.

Severity values match get_security_findings. has_review: false on a sub-section means the reviewer wasn't part of the generation's review profile (e.g., the Fast profile skips Cost + Risk). Callers can distinguish "no findings + reviewer ran" from "reviewer didn't run" — useful for PR-gate automation that wants to know whether a quality signal is missing vs known-clean. Pair with get_security_findings for the security gate.
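
The "missing versus known-clean" distinction might be computed like this. The sketch assumes a parsed result dict; the three labels are illustrative and classify_sections is a hypothetical name.

```python
from typing import Any, Dict

SECTIONS = ("reliability", "accessibility", "cost", "risk")


def classify_sections(report: Dict[str, Any]) -> Dict[str, str]:
    """Label each sub-section as 'missing', 'clean', or 'findings'."""
    labels = {}
    for name in SECTIONS:
        section = report.get(name) or {}
        if not section.get("has_review"):
            labels[name] = "missing"  # reviewer was not part of the profile
        elif section.get("finding_count", 0) == 0:
            labels[name] = "clean"  # reviewer ran and found nothing
        else:
            labels[name] = "findings"
    return labels
```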

Clarification tools

`get_pending_clarifications`

Returns the structured clarifications a paused generation is waiting on. Takes a generation ID. Returns {state, clarifications} where each clarification has agent, section (may be null), question, why, proposed_default, and — when one question covers a multi-document issue — covered_sections, the list of every affected document path (one answer resolves them all; null on single-target questions). Empty array when the generation isn't paused.

The chat-driven web flow asks these questions through the user's interview chat; this tool exposes the same structured surface to MCP callers so an agent can prompt its user without parsing free-text agent turns.

`answer_clarifications`

Submits answers to a paused generation's clarifications and resumes the run. Takes the generation ID and an array of {question, answer} pairs. Match each question exactly to the verbatim text from get_pending_clarifications — pairing is by question text. Answers must cover every pending clarification (all-or-nothing for v1).

Returns {generation_id, accepted, message}. The orchestrator picks the run back up on the next dispatcher tick and threads the answers into the next agent call.
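
Because answers are paired by verbatim question text and must cover every pending clarification, it can help to build the array directly from the get_pending_clarifications output. The sketch below does only that; it assumes proposed_default is a string and falls back to it when the user supplies no override. build_answers is a hypothetical helper, and the argument name under which the array is submitted is left to the tool schema.

```python
from typing import Any, Dict, List, Optional


def build_answers(
    clarifications: List[Dict[str, Any]],
    overrides: Optional[Dict[str, str]] = None,
) -> List[Dict[str, str]]:
    """Build {question, answer} pairs covering every pending clarification."""
    overrides = overrides or {}
    answers = []
    for item in clarifications:
        question = item["question"]  # reuse the verbatim text for pairing
        answer = overrides.get(question) or item.get("proposed_default")
        if not answer:
            raise ValueError(f"no answer available for: {question}")
        answers.append({"question": question, "answer": answer})
    return answers
```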

Generation control

`list_generations`

Lists the caller's generations regardless of whether they have a Package row yet, so callers can see in-progress / failed / paused / cancelled runs alongside completed ones.

Arguments

Name	Type	Required	Description
`status`	string	no	Comma-separated. Roll-up tokens (`in_progress`, `complete`, `failed`, `cancelled`, `paused`) or exact state names (`Drafting`, `Reviewing`, etc.). Case-insensitive.
`limit`	int	no	Default `50`, max `200`.
`offset`	int	no	Default `0`.
`order`	string	no	`desc` (newest-first, default) or `asc`.

Returns — {rows: [...]} where each row has: id, short_id, project_name, state, review_profile, cost_usd, started_at, completed_at, failed_at, current_round, failure_reason, failure_category, source_channel, interview_id, progress_percent (the row's live 0–100 progress; null on pre-rollout generations that have no projection yet).
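
Walking the full list with limit and offset could be sketched as below. call_tool is a hypothetical helper that performs one MCP tool call and returns the parsed result; the loop assumes a page shorter than the requested limit is the last one.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical helper type: call_tool(name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def iter_generations(
    call_tool: ToolCaller, status: Optional[str] = None, page_size: int = 200
) -> Iterator[Dict[str, Any]]:
    """Yield every generation row, paging by offset (page_size max is 200)."""
    offset = 0
    while True:
        args: Dict[str, Any] = {"limit": page_size, "offset": offset}
        if status:
            args["status"] = status  # e.g. "failed,paused"
        rows = call_tool("list_generations", args).get("rows", [])
        yield from rows
        if len(rows) < page_size:  # assumption: a short page is the last page
            break
        offset += page_size
```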

`delete_generation`

Soft-deletes a generation by id. Takes generation_id. Only allowed on terminal-state rows (Complete, Failed, Cancelled); attempting to delete an in-flight generation throws an error — cancel the run first. Idempotent on already-deleted rows. The generation drops out of list_generations and the workspace; the row stays in the database for audit. Returns {generation_id, action: "deleted"}. Sister tool: restore_generation.

`restore_generation`

Restores a soft-deleted generation by id. Takes generation_id. No state guard — even if the generation was Failed or Cancelled at delete time, restore returns it to your workspace in the same state. Idempotent on already-live rows. Sister to delete_generation. Returns {generation_id, action: "restored"}.

`cancel_generation`

Added 2026-05-08.

Cancels an in-flight generation. Takes generation_id and an optional reason string. Marks the row Cancelled (a distinct terminal state from Failed) and signals the orchestrator's CancellationToken so any in-flight LLM call halts instead of running to completion (avoiding cost on a run you no longer want). Already-terminal rows return an error: cancel_generation changes nothing on Complete, Failed, or already-Cancelled runs. Use this when an agent observes a stuck or runaway generation and wants to bail out cleanly. Returns {generation_id, state: "Cancelled"}.

`retry_generation`

Added 2026-05-08.

Retries a Failed generation. Takes generation_id. Replays the original kickoff command verbatim — same intake artifact, same review profile, same multimodal context (images + reference docs are re-hydrated from blob storage) — as a brand-new generation row. The original Failed row stays in the database for audit. Only the original owner can retry; cross-user retry is rejected. Returns {original_generation_id, new_generation_id, state, package_id}. Common error codes: RETRY_STATE_INVALID (only Failed rows are retryable), RETRY_RESEARCHER_CHILD (re-fire the parent Researcher run from the original interview), RETRY_ENVELOPE_UNAVAILABLE (legacy row predating the persisted-command feature), RETRY_OWNER_MISMATCH. Quota and approval errors (QUOTA_EXCEEDED, USER_PENDING_APPROVAL) propagate from the underlying handler.

`pause_generation`

Added 2026-05-08.

Pauses a running generation. Takes generation_id. User-initiated pause, distinct from the orchestrator's automatic PausedAwaitingClarification state (which fires when an agent needs more input — use get_pending_clarifications + answer_clarifications for that flow). The aggregate records the pre-pause state in the event log so a subsequent resume_generation can restore it. Already-terminal rows return a 409-equivalent error. Returns {generation_id, state: "Paused"}. Sister to resume_generation.

`resume_generation`

Added 2026-05-08.

Resumes a Paused generation back to its pre-pause state. Takes generation_id. Reads the most recent non-Paused to_state from the event log and restores it; the orchestrator picks up where it left off. State must currently be Paused (use get_generation to check); any other state returns a 409-equivalent error. If the event log has no pre-pause state recorded (corrupt history), the tool surfaces the same error. Returns {generation_id, state} where state is the restored pre-pause state. Sister to pause_generation.

`update_generation_name`

Added 2026-05-08.

Sets or clears the user-facing display name on a generation. Takes generation_id and an optional name. Useful for correcting placeholder / null project names on completed generations (e.g., when the auto-extractor returned (unnamed) because the intake JSON was missing a project_name). Pass an empty/whitespace name (or omit it) to clear the override and let the auto-extractor's best guess take over. Returns {generation_id, display_name}. Mirrors REST PATCH /v1/generations/{id}/name.

Capabilities & metadata

`get_capabilities`

Added 2026-05-08.

Discover schema versions and the enumerable inputs the API accepts so callers can avoid hardcoding magic strings. Takes no arguments. Returns {schema_version, rubric_version, quality_rubric_version, review_profiles, project_types, mirror_selections}. Anonymous-shaped (the values describe the public contract and don't depend on the caller). Use this BEFORE start_generation / start_interview to discover valid review_profile and project_type values; values change only on deploy. Mirrors REST GET /v1/capabilities.

Account & usage

`get_subscription`

Returns the calling user's subscription tier and current credit quota snapshot. Takes no arguments. Returns {tier, status, current_period_end, quota: {credits_limit, credits_used, concurrency_limit, period_reset_at}} — tier is one of Free, Pro, Team; status reflects Stripe's subscription state. credits_limit / credits_used are denominated in credits (a generation draws credits by review depth — Fast 1 / Normal 2 / Extensive 4); on Free, the allowance is a one-time lifetime credit, so period_reset_at is informational. Useful before kicking off a generation so the agent can warn the user if they're low on credits. Mirrors the subscription field on REST GET /v1/me plus the standalone REST GET /v1/billing/subscription endpoint.
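
A low-credit warning could be derived from the quota snapshot roughly as follows. The per-profile credit costs come from the paragraph above; the sketch ignores Extra Usage and omits Researcher because no credit cost is stated for it. runs_remaining is a hypothetical name.

```python
from typing import Any, Dict

# Credits drawn per generation, by review depth (from the description above).
CREDITS_PER_RUN = {"Fast": 1, "Normal": 2, "Extensive": 4}


def runs_remaining(subscription: Dict[str, Any], profile: str) -> int:
    """How many runs of `profile` the remaining credits cover (Extra Usage ignored)."""
    quota = subscription["quota"]
    remaining = max(quota["credits_limit"] - quota["credits_used"], 0)
    return remaining // CREDITS_PER_RUN[profile]
```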

`get_usage`

Aggregates the caller's LLM cost and token usage over a time window. Mirrors REST GET /v1/usage.

Arguments

Name	Type	Required	Description
`from`	ISO-8601 timestamp	no	Window start. Default: 30 days ago.
`to`	ISO-8601 timestamp	no	Window end. Default: now. Max window 366 days.
`group_by`	string	no	One of `provider`, `model`, `role`, `day`, `week`, `month`, `key`, `user`. Defaults to `model`.

Returns

Field	Type	Description
`from` / `to`	ISO-8601	Echoed window bounds.
`group_by`	string	Echoed grouping key.
`rows`	array	Each entry has `{group, input_tokens, output_tokens, cached_tokens, cost_usd, invocation_count}`.
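
Summing cost across the returned rows might look like this. The sketch assumes cost_usd arrives as either a JSON number or a numeric string and uses Decimal to avoid float drift; total_cost_usd here is a hypothetical helper name, not a response field of this tool.

```python
from decimal import Decimal
from typing import Any, Dict


def total_cost_usd(usage: Dict[str, Any]) -> Decimal:
    """Sum cost_usd over every row of a get_usage result."""
    return sum(
        (Decimal(str(row["cost_usd"])) for row in usage.get("rows", [])),
        Decimal("0"),
    )
```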

Package tools

`get_latest_package_for_generation`

Added 2026-05-08.

Returns the current package metadata plus a time-limited download URL for a generation by generation_id (rather than by package_id). Use this when an agent has just completed a generation and wants the package without re-querying list_packages. Returns a 404-equivalent error when the generation has no package yet (still in flight) or when the package was soft-deleted. Mirrors REST GET /v1/generations/{id}/package. Future-proofs for the package-update flow (multiple package versions per generation): when that lands, this tool returns the CURRENT (latest) package without callers having to filter list_packages.

Arguments

Name	Type	Required	Description
`generation_id`	UUID	yes	The generation whose latest package to return.

Returns

Field	Type	Description
`id`	UUID	Package id.
`generation_id`	UUID	Echoed.
`version`	string	Package version (currently always `1.0.0`).
`download_url`	string	SAS-tokened blob URL for the package zip.
`download_url_expires_at`	ISO-8601	When the SAS URL expires. Refetch this tool to get a fresh URL.
`total_cost_usd`	decimal	What the generation's LLM calls cost.
`retention_until`	ISO-8601 \| null	When the package will be auto-deleted; `null` means indefinite.
`deleted_at`	ISO-8601 \| null	Set if the package was soft-deleted.
`project_name`	string \| null	Display name (override or auto-extracted).
`description`	string \| null	Short description, truncated to 280 chars.
`kind` / `kind_label`	string	`"specification"` + canonical disambiguation copy.
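
Since the download URL expires, a caller may want to check freshness before reusing a cached result. The sketch below assumes download_url_expires_at is an ISO-8601 string that Python's datetime.fromisoformat can parse once a trailing Z is normalized, and treats a timestamp without an offset as UTC. The 60-second margin and the name url_is_fresh are illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def url_is_fresh(
    package: Dict[str, Any],
    now: Optional[datetime] = None,
    margin_seconds: int = 60,
) -> bool:
    """True when the SAS URL has more than `margin_seconds` left before expiry."""
    raw = package["download_url_expires_at"].replace("Z", "+00:00")
    expires = datetime.fromisoformat(raw)
    if expires.tzinfo is None:
        expires = expires.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    now = now or datetime.now(timezone.utc)
    return expires - now > timedelta(seconds=margin_seconds)
```

When this returns False, calling the tool again yields a fresh URL.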

`list_package_files`

Added 2026-05-08.

Lists every file inside a completed package zip, with the uncompressed size of each entry. Takes package_id. Returns {package_id, files: [{path, size_bytes}, ...]} sorted lexicographically by path. Streams the zip's central directory from blob storage via Azure SDK range requests — the full archive is never materialized on the server. Pair with get_package_file to read individual files without the zip download dance. Useful when a coding agent wants to inspect package structure (architecture docs, requirements, ADRs, etc.) and pick which files to read.

`get_package_file`

Added 2026-05-08.

Returns the bytes of a single file from a package zip. Takes package_id and path (use list_package_files to discover available paths). The response shape depends on the file type:

Text entries (markdown, YAML, JSON, plain text, CSV, SVG): {package_id, path, content_type, content} where content is the raw UTF-8 string.
Binary entries (PNG, unknown extensions): {package_id, path, content_type, content_base64} where content_base64 is the base64-encoded payload (a JSON string can't carry arbitrary binary bytes).

Files larger than 256 KB return an error directing the caller at the bulk zip download URL (use get_package). Path-traversal segments (..) are rejected at the application layer. Streams the requested zip entry from blob storage; the full archive is never materialized.
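
Handling the two response shapes can be sketched in a few lines. The function assumes a parsed result dict containing exactly one of content or content_base64, as described above; file_bytes is a hypothetical name.

```python
import base64
from typing import Any, Dict


def file_bytes(response: Dict[str, Any]) -> bytes:
    """Return the raw bytes of a get_package_file result, text or binary."""
    if "content" in response:
        return response["content"].encode("utf-8")  # text entry
    return base64.b64decode(response["content_base64"])  # binary entry
```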

`search_package`

Added 2026-05-08.

Full-text search across a package's indexed file contents (markdown, YAML, JSON, plain text, CSV, SVG entries — binary files are skipped during indexing). Takes package_id, query, and an optional limit (default 20, max 50). Returns {package_id, query, results: [{file_path, snippet, rank}, ...]} ranked by relevance, newest match first within rank ties. Snippets are HTML-highlighted with <mark>...</mark> markers around match terms; agents can render them directly or strip the tags as preferred.

Query syntax follows Postgres websearch_to_tsquery: quoted phrases ("agent topology"), OR for alternation (auth OR session), -term for exclusion (auth -test). Case-insensitive; English stemming is applied (so searching matches search). An empty query returns an empty result set rather than every row.

Results are scoped to a single package. For cross-package search across every package the caller owns in one round trip, use search_my_packages (below). The index is built at package completion; SpecStep staff can re-trigger indexing on request if it falls out of sync. Mirrors REST GET /v1/packages/{id}/search?q=...&limit=....

Arguments

Name	Type	Required	Description
`package_id`	UUID	yes	The package to search.
`query`	string	yes	`websearch_to_tsquery` syntax (quoted phrases, `OR`, `-term`).
`limit`	int	no	Default `20`, max `50`.

Returns — {package_id, query, results: [{file_path, snippet, rank}, …]}. snippet contains <mark>...</mark> highlights around match terms.
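
For agents that prefer plain text, the highlight markers can be stripped with a small helper. This sketch removes only the mark tags named above and makes no assumption about any other HTML escaping in the snippet; plain_snippet is a hypothetical name.

```python
import re

_MARK_TAG = re.compile(r"</?mark>")


def plain_snippet(snippet: str) -> str:
    """Remove <mark>...</mark> highlight tags, keeping the matched text."""
    return _MARK_TAG.sub("", snippet)
```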

`search_my_packages`

Added 2026-05-08.

Cross-package full-text search across every non-deleted package the caller owns. Takes query and an optional limit (default 10, max 25). Returns {query, results: [{package_id, project_name, version, total_hit_count, files: [{file_path, snippet, rank}, ...]}, ...]} — matched packages ordered by their best per-file rank, with up to 5 file hits embedded in each entry. total_hit_count carries the per-package true count so callers can render "showing N of M" or follow up with search_package for a deep look at any single package.

Same query syntax as search_package (Postgres websearch_to_tsquery — quoted phrases, OR, -term). Empty query returns an empty result set.

Replaces the prior N+1 fan-out pattern (call list_packages, then search_package per package). Use this tool whenever you don't already know which package to search. Mirrors REST GET /v1/packages/search?q=...&limit=....

Arguments

Name	Type	Required	Description
`query`	string	yes	Same `websearch_to_tsquery` syntax as `search_package`.
`limit`	int	no	Default `10`, max `25`.

Returns — {query, results: [{package_id, project_name, version, total_hit_count, files: [{file_path, snippet, rank}, …]}, …]}. Packages are ordered by their best per-file rank; up to 5 file hits embedded per package; total_hit_count is the per-package true count.
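
The "showing N of M" rendering mentioned above might be produced like this. The sketch assumes a parsed result dict and falls back to the package id when project_name is null; hit_summaries is a hypothetical name.

```python
from typing import Any, Dict, List


def hit_summaries(response: Dict[str, Any]) -> List[str]:
    """One 'showing N of M' line per matched package, in the returned order."""
    lines = []
    for pkg in response.get("results", []):
        label = pkg.get("project_name") or pkg["package_id"]
        shown = len(pkg.get("files", []))
        lines.append(f"{label}: showing {shown} of {pkg['total_hit_count']}")
    return lines
```

Packages where the shown count is below total_hit_count are candidates for a follow-up search_package call.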

`get_package`

Returns the documentation package metadata. Takes a package_id. Read the package_id from start_generation (returned alongside the new generation) or from get_generation once the generation reaches Complete. Includes project_name, description, kind, and kind_label so the deliverable is identifiable + clearly labeled as a specification package, not application code. generation_id is null for packages created by migrating existing documentation rather than by a generation run (Migrate Existing Docs, 2026-05-27); present for generated packages.

`preview_doc_migration`

Classifies an uploaded documentation archive onto the canonical SpecStep package layout and returns the proposed mapping without persisting anything. Takes archive_base64 (a base64-encoded .zip; inline cap ~4 MB; use the REST endpoint POST /v1/doc-migrations/preview for larger archives) and optional source_archive_name. Returns {source_archive_name, source_byte_count, total_file_count, classified_count, unclassified_count, classifier_version, mapping: [{source_path, doc_type, target_path, layer, confidence}, ...], conflicting_target_paths: [...]}. Run this first; a non-empty conflicting_target_paths means two files claim the same canonical slot, which you resolve with target_path_overrides on commit.

`commit_doc_migration`

Normalizes an uploaded documentation archive into a migrated package and persists it (canonical layout + _source/ for unplaceable files + a source: migrated manifest), linking it to a project. Takes archive_base64 (base64 .zip, ~4 MB inline cap), optional source_archive_name, optional project_id (defaults to your default project), optional version (default 1.0.0), and optional target_path_overrides (a map of source-path → target-path corrections from the reviewed preview). Returns {migration_id, package_id, project_id, version, classified_count, unclassified_count}. The resulting package appears in list_packages / get_package with a null generation_id. Errors when two sources still claim one canonical slot — supply target_path_overrides to resolve.
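
Preparing the arguments shared by the preview and commit calls could be sketched as below. The 4 MB check is applied to the raw zip bytes, which is an assumption, since the source gives the cap only approximately and does not say whether it applies before or after encoding. migration_args is a hypothetical helper.

```python
import base64
from typing import Any, Dict, Optional

INLINE_CAP_BYTES = 4 * 1024 * 1024  # approximate inline cap; assumed to apply to raw bytes


def migration_args(
    zip_bytes: bytes,
    source_archive_name: Optional[str] = None,
    target_path_overrides: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Build arguments for preview_doc_migration or commit_doc_migration."""
    if len(zip_bytes) > INLINE_CAP_BYTES:
        raise ValueError("archive exceeds the inline cap; use the REST endpoint instead")
    args: Dict[str, Any] = {"archive_base64": base64.b64encode(zip_bytes).decode("ascii")}
    if source_archive_name:
        args["source_archive_name"] = source_archive_name
    if target_path_overrides:  # only meaningful on commit, after reviewing the preview
        args["target_path_overrides"] = target_path_overrides
    return args
```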

`list_packages`

Lists documentation packages on your account, with project_name + description + kind annotations on every row so the caller can identify each package without a per-row follow-up. Each row also carries generation_state so callers can tell which packages came from runs that finished cleanly versus runs that failed mid-flight.

Arguments

Name	Type	Required	Description
`limit`	int	no	Default `50`, max `200`.
`offset`	int	no	Default `0`.
`order`	string	no	`desc` (newest-first, default) or `asc`.

Returns — {packages: [...], next_cursor} where each entry has:

Field	Type	Description
`id`	UUID	Package id.
`generation_id`	UUID \| null	Source generation. `null` for packages created by migrating existing documentation — those have no originating run.
`version`	string	Package version.
`total_cost_usd`	decimal	What the generation cost.
`retention_until`	ISO-8601 \| null	When the package will be auto-deleted.
`deleted_at`	ISO-8601 \| null	Set if soft-deleted (filtered out by default).
`project_name`	string \| null	Display name.
`description`	string \| null	Short description, truncated to 280 chars.
`kind` / `kind_label`	string	`"specification"` + canonical disambiguation copy.
`generation_state`	string	Final state of the source generation (`Complete`, `Failed`, etc.).

`request_change`

Added 2026-05-09.

Files a change-management addendum against a completed package. Single-LLM-call flow (~30 seconds, ~$0.40–0.50) that produces a 5-file markdown bundle (background.md, change-requirement.md, implementation-guide.md, test-plan.md, decision-log-entry.md) attached as a sibling artifact to the existing package, with no version bump.

Use this tool when an agent has a focused single-change request against a completed package — "Add Apple ID OAuth", "Localize French", "Switch session storage from cookies to JWT". For structural rewrites that warrant a fresh package version (~$2.50, multi-agent pipeline), call start_generation off the original interview's intake instead.

The addendum row also writes a bell-dropdown notification under the new AddendumComplete kind so the user sees the change land on their next page load. Mirrors REST POST /v1/packages/{id}/addenda.

Arguments

Name	Type	Required	Description
`package_id`	UUID	yes	The completed package to file the addendum against.
`title`	string	yes	≤ 200 chars. Short label for the change.
`description`	string	yes	≤ 4000 chars. Free-text description of the change requested.

Returns

Field	Type	Description
`addendum_id`	UUID	The new addendum's id.
`package_id`	UUID	The parent package id (echoed).
`download_url`	string	SAS-tokened blob URL for the 5-file markdown zip; valid for one hour.
`cost_usd`	decimal	What the LLM call cost (typically ~$0.40–0.50).

`list_audiences`

Added 2026-05-18.

Public catalog of audiences understood by explain_package. No arguments. Returns {audiences: [{slug, display_name, description}, ...]} — the V1 set is executive, product-manager, engineering-manager, new-engineer, investor, security. Mirrors REST GET /v1/explain/audiences. Use this to populate a picker before calling explain_package, or to validate a slug before submitting.

`explain_package`

Added 2026-05-18.

Rewrites a completed package as a short audience-tailored markdown explanation. One LLM round-trip (~10 seconds, ~$0.05) for a cold call; subsequent calls for the same (package, audience) pair return the cached row instantly and at zero cost.

Use this when an agent needs to summarize a package for a specific reader — e.g., "give me the executive cut" or "explain this to a new engineer" — instead of streaming the full bundle.

Arguments

Name	Type	Required	Description
`package_id`	UUID	yes	The package to explain.
`audience`	string	yes	One of the slugs returned by `list_audiences`.

Returns

Field	Type	Description
`markdown`	string	Audience-tailored explanation, ≤ 8192 chars.
`audience`	string	Echoed slug.
`model`	string	LLM model id used for generation.
`cost_usd`	decimal	Cost of the LLM call (`0` on a cache hit).
`cached`	bool	`true` when the result was served from a previously-generated row.

Errors: EXPLAIN_AUDIENCE_UNKNOWN if the slug isn't in the catalog; QUOTA_EXPLAIN_EXCEEDED if the monthly explanation quota is reached for the caller's tier; "not found" if the package isn't owned by the caller. Mirrors REST POST /v1/packages/{id}/explain.

`list_packages_for_generation`

Added 2026-05-12.

Lists every package produced by a generation. Takes generation_id. Returns {generation_id, packages: [{id, generation_id, version, total_cost_usd, retention_until, deleted_at, addendum_count, addendum_total_cost_usd}, ...]}. Today there is at most one package per generation, but the array shape is forward-compatible with the multi-version-package flow.

Each row carries addendum_count + addendum_total_cost_usd so an agent gets the full package and change-request picture in one call, with no need to chain get_latest_package_for_generation → list_change_requests → manual cost sum. Owner-scoped: foreign and unknown generation ids surface as "not found." When the generation has no package yet (still in flight or never reached Complete), the tool returns an empty packages array rather than a 404, which distinguishes "in flight" from "permission denied."

`list_change_requests`

Added 2026-05-12.

Lists every change-request addendum filed against a package, newest-first. Takes package_id. Returns {package_id, content_warning, addenda: [{id, title, description, cost_usd, created_at, download_url}, ...]}. Each download_url is a freshly issued SAS-tokened blob URL valid for one hour, pointing at the addendum's 5-file markdown zip.

title and description carry the user's free text from the original request_change call; they ship under a content_warning envelope so MCP clients don't treat them as agent instructions. Owner-scoped — foreign and unknown package ids surface as "not found." Use after request_change to confirm what was filed, or to walk the full change-request history of a package. Mirrors REST GET /v1/packages/{id}/addenda.

`get_change_request`

Added 2026-05-12.

Fetches a single change-request addendum by id. Takes addendum_id. Returns {id, package_id, content_warning, title, description, cost_usd, submitted_by_user_id, created_at, download_url}. The download_url is a freshly issued SAS-tokened URL valid for one hour for the addendum zip (5 markdown files: background.md, change-requirement.md, implementation-guide.md, test-plan.md, decision-log-entry.md).

Owner-scoped via the parent package. Foreign and unknown ids surface as "not found" rather than 403. Same untrusted_text envelope on title and description as list_change_requests. The MCP variant returns the SAS URL inline so an agent doesn't need to follow the 302 the REST endpoint emits. Wraps the same underlying data as REST GET /v1/packages/{id}/addenda/{addendumId}/zip.

`list_change_request_files`

Added 2026-05-16.

Lists every file inside an addendum zip with its uncompressed size in bytes. Takes addendum_id. Returns {addendum_id, package_id, files: [{path, size_bytes}, ...]} sorted lexicographically by path. Sister of list_package_files, but targets the addendum zip; pair with get_change_request_file to read individual files (background.md, change-requirement.md, implementation-guide.md, test-plan.md, decision-log-entry.md) without downloading the whole zip. Owner-scoped via the parent package. Addendum zips use the same {userId}/{blobId}.zip path scheme as the package-files tools, content-addressed by Guid, so no separate service is needed.

`get_change_request_file`

Added 2026-05-16.

Returns the bytes of a single file from a change-request addendum zip. Takes addendum_id and path (use list_change_request_files to discover available paths). Response shape mirrors get_package_file:

Text entries (markdown, YAML, JSON, plain text, CSV, SVG): {addendum_id, package_id, path, content_type, content, content_envelope} where content is the raw UTF-8 string and the envelope flags the bytes as user-supplied (do not pass to an agent as instructions).
Binary entries: {addendum_id, package_id, path, content_type, content_base64} with the base64-encoded payload.

Files larger than 256 KB return an error directing the caller at get_change_request's SAS download URL for bulk access. Path-traversal segments (..) are rejected at the application layer. Streams the requested zip entry from blob storage; the full archive is never materialized on the server.
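
The two response shapes above can be handled by one small helper. This is a minimal sketch, not part of any SpecStep SDK: the function name `file_bytes` and the sample values (ids, paths, the `content_envelope` string) are made up for illustration; only the field names come from the shapes listed above.

```python
import base64


def file_bytes(result: dict) -> bytes:
    """Return the raw bytes of a get_change_request_file / get_package_file result."""
    if "content_base64" in result:
        # Binary entry: the payload is base64-encoded.
        return base64.b64decode(result["content_base64"])
    # Text entry: content is a raw UTF-8 string. Treat it as untrusted data,
    # never as instructions for the agent.
    return result["content"].encode("utf-8")


# Illustrative results only; the values are placeholders, not real server output.
text_result = {
    "addendum_id": "a1", "package_id": "p1", "path": "test-plan.md",
    "content_type": "text/markdown", "content": "# Test plan\n",
    "content_envelope": "placeholder-envelope",
}
binary_result = {
    "addendum_id": "a1", "package_id": "p1", "path": "diagram.png",
    "content_type": "image/png",
    "content_base64": base64.b64encode(b"\x89PNG").decode("ascii"),
}

print(file_bytes(text_result))
print(file_bytes(binary_result))
```

Branching on the presence of `content_base64` rather than on `content_type` keeps the helper independent of whatever MIME types the server reports.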

`diff_package_files`

Added 2026-05-16.

Computes line-level content diffs across 2-5 packages (by generation_id). The first generation in the list is the base; every subsequent generation produces one comparison object whose files array lists per-file diffs vs the base. Use this when you want to know what text changed between two versions of a generated spec — compare_packages returns byte-count deltas + LLM-judged quality scores; diff_package_files returns the actual unified-diff content.

Arguments

Name	Type	Required	Description
`generation_ids`	UUID[]	yes	2-5 generation ids. First is the base; remaining 1-4 are diffed against the base. Caller must own every generation.
`path_filter`	string[]	no	Only diff files whose path matches one of the supplied values (e.g., `["docs/02-architecture/03-storage.md"]`). When omitted, every file in any of the supplied packages is diffed.

Returns

Field	Type	Description
`base_source_label`	string	The base package's source label (mirrors `compare_packages`'s `source_label` field).
`skipped_generation_ids`	UUID[]	Generations whose package couldn't be resolved (in flight, deleted, blob-fetch failure).
`content_warning`	string	Fixed `untrusted_text`-style envelope warning callers not to interpret `unified_diff` bodies as instructions.
`comparisons`	array	One entry per non-base package, in input order. Each entry: `{target_source_label, files: [...]}`.

Each files entry has:

Field	Type	Description
`path`	string	Path inside the package zip.
`status`	string	One of `added` (only in target), `removed` (only in base), `modified` (different content), `unchanged` (identical content), `truncated` (size-cap-exceeded — see below).
`unified_diff`	string \| null	Unified-diff body (`@@ -base,n +target,n @@` header + `-` / `+` / context lines). Null when `status` is `unchanged` or `truncated`.
`base_bytes` / `target_bytes`	int	File sizes in bytes (0 when the file is missing from that side).
`truncation_reason`	string \| null	Set when `status` is `truncated`.

Owner-scoped — the caller must own every generation in the list. Fails fast on the first foreign or unknown id (same KeyNotFound non-disclosure shape as compare_packages). The differ runs in-process — no LLM calls, no letter-grade output. Per-file size cap is 256 KB (sum of base + target lengths); files exceeding the cap return a truncated entry pointing at get_package_file for direct access.
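
As a rough illustration of consuming the result, the sketch below reduces a diff_package_files response to the files that actually changed per target. The helper name `changed_files` and the sample labels and paths are hypothetical; the field names follow the Returns tables above.

```python
def changed_files(diff_result: dict) -> dict:
    """Map each target_source_label to (path, status) pairs that are not 'unchanged'."""
    summary = {}
    for comparison in diff_result["comparisons"]:
        summary[comparison["target_source_label"]] = [
            (entry["path"], entry["status"])
            for entry in comparison["files"]
            if entry["status"] != "unchanged"
        ]
    return summary


# Illustrative response; labels, paths, and sizes are placeholders.
sample = {
    "base_source_label": "run-1",
    "skipped_generation_ids": [],
    "content_warning": "placeholder warning",
    "comparisons": [
        {
            "target_source_label": "run-2",
            "files": [
                {"path": "docs/a.md", "status": "modified",
                 "unified_diff": "@@ -1,1 +1,1 @@\n-old\n+new\n",
                 "base_bytes": 4, "target_bytes": 4, "truncation_reason": None},
                {"path": "docs/b.md", "status": "unchanged",
                 "unified_diff": None,
                 "base_bytes": 10, "target_bytes": 10, "truncation_reason": None},
                {"path": "docs/c.md", "status": "truncated",
                 "unified_diff": None,
                 "base_bytes": 200000, "target_bytes": 150000,
                 "truncation_reason": "placeholder reason"},
            ],
        }
    ],
}

print(changed_files(sample))
```

Entries with status `truncated` are kept in the summary on purpose, since those are the files a caller would then fetch directly with get_package_file.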

`compare_packages`

Added 2026-05-12.

Rates or compares up to five packages you own. Takes generation_ids, an array of 1–5 generation ids: a single id returns a rating summary only, while 2–5 ids return the full cross-package comparison. Returns {skipped_generation_ids, identity_verdict, per_package, comparison}:

identity_verdict answers "are these the same project?" with a confidence score, a list of conflicting fields, and an explanation.
per_package carries each package's build-confidence score (with per-signal contributions) and an LLM-judged quality-confidence score with justification.
comparison carries the cross-package markdown body plus a structural diff of file lengths per package, gated under a content_warning envelope (the markdown is LLM-authored prose).

Owner-scoped — the caller must own every generation in the list. Fails fast on the first foreign or unknown id so a caller can't burn an LLM-judge call on packages they don't own. The 5-generation cap matches the REST limit and bounds LLM-judge cost. Generations whose package can't be resolved (still in flight, deleted, or a blob-fetch failure) are returned in skipped_generation_ids rather than failing the whole call. Useful when an agent wants to evaluate "which of my packages is best" or "how does my latest run compare to the previous one."

Async is the default (changed 2026-05-19). A real 2–5 package compare runs an LLM-judge pass that typically takes 30–80s — longer than most MCP clients' request timeout. So compare_packages defaults to mode: "async": it enqueues a background job and returns {status: "queued", job_id} within milliseconds. Poll get_compare_packages_status with that job_id for the canonical result. Pass mode: "sync" only when you know the compare fits inside your client's timeout (a single-package rating summary, or two small packages).

Arguments

Name	Type	Required	Description
`generation_ids`	UUID[]	yes	1–5 generation ids. One id returns a rating summary only; 2–5 returns the full cross-package comparison.
`mode`	string	no	`async` (default) — enqueue a job + return `job_id` to poll; `sync` — run inline and return the full result (small compares only, else the MCP client times out).
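
For clients building the call by hand, the arguments above fit into a standard JSON-RPC `tools/call` envelope. The sketch below only constructs the request body; the function name, the `request_id` default, and the placeholder generation ids are assumptions for illustration, and sending the request (endpoint, auth headers) is left out.

```python
import json
from typing import List


def compare_packages_request(generation_ids: List[str], mode: str = "async",
                             request_id: int = 1) -> dict:
    """Build the JSON-RPC tools/call envelope for compare_packages."""
    if not 1 <= len(generation_ids) <= 5:
        raise ValueError("generation_ids must contain 1-5 ids")
    if mode not in ("async", "sync"):
        raise ValueError("mode must be 'async' or 'sync'")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "compare_packages",
            "arguments": {"generation_ids": list(generation_ids), "mode": mode},
        },
    }


# Placeholder ids for illustration only; real ids are UUIDs.
envelope = compare_packages_request(["gen-a", "gen-b"])
print(json.dumps(envelope, indent=2))
```

Validating the id count client-side mirrors the documented 1–5 limit and avoids a round trip for an obviously invalid call.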

Returns

In async mode: {status, job_id} — poll get_compare_packages_status(job_id). In sync mode (and as the result payload of a completed async job):

Field	Type	Description
`skipped_generation_ids`	UUID[]	Generations whose package couldn't be resolved (in flight, deleted, blob-fetch failure).
`identity_verdict`	object	`{same_project, confidence, conflicting_fields, explanation}` — answers "are these the same project?".
`per_package`	array	One entry per resolved package with `{generation_id, build_confidence: {score, signals: [...]}, quality_confidence: {score, justification}}`.
`comparison`	object \| null	When ≥ 2 packages resolve: `{content_warning, markdown_body, file_length_diff}`. The markdown is LLM-authored prose under an `untrusted_text` envelope.

`get_compare_packages_status`

Added 2026-05-19 — the poller for compare_packages(mode: "async").

Fetches the status of a background compare job. Takes the job_id returned by an async compare_packages call. Owner-scoped — only the user who enqueued the job can poll it.

Arguments

Name	Type	Required	Description
`job_id`	UUID	yes	The `job_id` from `compare_packages(mode: "async")`.

Returns

Field	Type	Description
`status`	string	`queued`, `running`, `completed`, or `failed`.
`result`	object \| null	Present when `status` is `completed` — the same shape as a `sync` `compare_packages` result (above).
`error_code` / `error_message` / `is_retryable`	string / string / bool	Present when `status` is `failed`. `is_retryable` tells you whether to re-enqueue.

Poll on a gentle cadence (2–5s) until status is completed or failed. A 2–5 package compare usually resolves in 30–80s.
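
The polling loop can be sketched as follows. `call_tool` is a hypothetical callable (not a SpecStep API) that sends one `tools/call` request and returns the parsed tool result as a dict; the 3-second interval and 180-second timeout are illustrative defaults chosen to sit inside the cadence and duration quoted above.

```python
import time
from typing import Callable


def wait_for_compare(call_tool: Callable[[str, dict], dict], job_id: str,
                     interval_s: float = 3.0, timeout_s: float = 180.0,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll get_compare_packages_status until the job completes or fails."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = call_tool("get_compare_packages_status", {"job_id": job_id})
        if job["status"] == "completed":
            return job["result"]
        if job["status"] == "failed":
            raise RuntimeError(
                f"{job.get('error_code')}: {job.get('error_message')} "
                f"(retryable={job.get('is_retryable')})"
            )
        if time.monotonic() >= deadline:
            raise TimeoutError(f"compare job {job_id} still {job['status']} after {timeout_s}s")
        sleep(interval_s)


# Scripted stand-in for a real MCP client, for demonstration only.
scripted = iter([
    {"status": "queued"},
    {"status": "running"},
    {"status": "completed", "result": {"skipped_generation_ids": []}},
])
result = wait_for_compare(lambda name, args: next(scripted), "job-1",
                          sleep=lambda seconds: None)
print(result)
```

On a `failed` status the caller can inspect `is_retryable` in the raised message, or catch the error and re-enqueue through compare_packages when appropriate.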

`estimate_change_request_cost`

Added 2026-05-12.

Forecasts what a single request_change addendum will cost (USD). Takes no arguments. Returns {has_forecast, estimated_total_usd, p25_usd, p75_usd, sample_size, note}: the rolling 30-day median across completed addenda with p25 / p75 percentile bounds, or a "not enough data" envelope when the sample size is below the forecaster's floor.

No profile dimension — every addendum uses the same prompt and model today, so the forecast is a single global median. The p25 / p75 bounds capture per-addendum variance (driven mostly by description length and change complexity). Symmetric with estimate_generation_cost; useful before calling request_change when cost matters.
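
One way an agent might use the forecast as a budget gate is sketched below. Comparing against `p75_usd` rather than the median is a conservative choice made for this example, not something the API prescribes; the function name and sample numbers are made up.

```python
from typing import Optional


def fits_budget(forecast: dict, budget_usd: float) -> Optional[bool]:
    """Return True/False against the p75 bound, or None when there is no forecast."""
    if not forecast.get("has_forecast"):
        # "Not enough data" envelope: the caller has to decide without a forecast.
        return None
    return forecast["p75_usd"] <= budget_usd


# Illustrative numbers only.
forecast = {"has_forecast": True, "estimated_total_usd": 0.5,
            "p25_usd": 0.25, "p75_usd": 0.75, "sample_size": 40, "note": None}
print(fits_budget(forecast, budget_usd=1.0))
```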

`update_package`

The all-in-one mutation tool for packages. Folds three operations into one call (the MCP transport doesn't have a natural HTTP-verb equivalent of DELETE or PATCH, so the operation is encoded as a flag).

Takes package_id plus exactly one of:

retention_until: <date-time | null> — set or clear the package's retention deadline. Pass an ISO-8601 timestamp to extend retention; pass null to make retention indefinite.
delete: true — soft-delete the package. Idempotent. The package row drops out of list_packages but stays in the database for audit + recovery.
restore: true — restore a soft-deleted package. Idempotent on already-live rows. Sister operation to delete: true.

Returns {package_id, action: "deleted" | "restored" | "retention_updated"}. Passing both delete: true and restore: true returns an error.
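
The "exactly one of" rule can be enforced client-side before the call is sent. The builder below is a hypothetical helper; it uses a sentinel so that an explicit `retention_until=None` (indefinite retention) is distinguishable from the argument being omitted.

```python
_UNSET = object()  # distinguishes "not passed" from an explicit None


def update_package_args(package_id: str, *, retention_until=_UNSET,
                        delete: bool = False, restore: bool = False) -> dict:
    """Build update_package arguments, enforcing exactly one operation."""
    operation = {}
    if retention_until is not _UNSET:
        # ISO-8601 timestamp string, or None to make retention indefinite.
        operation["retention_until"] = retention_until
    if delete:
        operation["delete"] = True
    if restore:
        operation["restore"] = True
    if len(operation) != 1:
        raise ValueError("pass exactly one of retention_until, delete, restore")
    return {"package_id": package_id, **operation}


# Placeholder package id for illustration.
print(update_package_args("pkg-1", delete=True))
print(update_package_args("pkg-1", retention_until=None))
```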

Why bundled instead of separate delete_package / restore_package tools? Package was the first entity to expose multi-operation mutations through MCP, and bundling them into one tool kept the manifest small. Newer entities (Interview, Generation) use dedicated delete_* / restore_* tools; both styles work.

Support tools

`submit_bug_report`

Submits a bug report tied to the calling user. Takes title, description, optional severity, optional related_generation_id, optional current_route, optional caller-supplied user_agent. Persisted to the caller's report history. Server-side enrichment stamps account name/email/plan, build version, and the AI-tool heuristic.

`list_my_bug_reports`

Lists the caller's bug reports, newest first.

Arguments

Name	Type	Required	Description
`limit`	int	no	Default `20`, max `200`.

Returns — {bug_reports: [...]} where each entry has:

Field	Type	Description
`id`	UUID	Bug report id.
`title`	string	Short title supplied at submit time.
`severity`	string	Caller-supplied severity (or default if omitted).
`status`	string	Triage state (e.g. `Open`, `Triaged`, `Closed`).
`created_at`	ISO-8601	When the report was submitted.
`updated_at`	ISO-8601	Most recent state change.
`related_generation_id`	UUID \| null	Generation the report was filed against (if any).
`triage_notes`	string \| null	Server-side notes added during review.

`get_bug_report`

Returns a bug report by id. Takes bug_report_id. Visible to the submitter; foreign callers get "not found".

Agent-driven feedback acquisition

Added 2026-05-17.

SpecStep tools may include a feedback_recommendation field in their response when the server detects a known quality issue (Otto produced malformed JSON, review loop stalled, etc.). MCP clients (Claude Code, Codex, Cursor, Windsurf) are expected to act on the field by calling submit_feedback with the supplied prefill and the recommendation_token. SpecStep deduplicates on the token: calling submit_feedback twice with the same token bumps an occurrence counter on the existing row instead of creating a new one.

Response shape when the field is present (today only on get_generation for Failed generations with a quality-signal category — LlmContract, ReviewBudgetExhausted, RedraftNoProgress, ReviewLoopStalled):

{
  // ... tool's normal output
  "feedback_recommendation": {
    "action": "submit_feedback",
    "reason": "<one-line rationale>",
    "dedup_token": "<sha256 hex>",
    "prefill": {
      "type": "InterviewQuality | PackageQuality | EndToEndRun | ...",
      "title": "<concise summary>",
      "full_report": "<server-composed markdown body>",
      "severity": "Info | Low | Medium | High | Critical",
      "generation_id": "<uuid or null>",
      "interview_id": "<uuid or null>"
    }
  }
}

To file, normalize the enum casing (PascalCase → snake_case for type; lowercase for severity), then call submit_feedback with the otherwise unchanged prefill fields plus recommendation_token.

The field is omitted when the user has disabled this behavior in Settings → Notifications → Agent integrations (default on for new users). Absence-of-field means "do nothing" — never prompt the user to file feedback manually based on this signal.
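
The normalization step can be sketched as below. One assumption to flag: the response shape names the token `dedup_token` while submit_feedback takes `recommendation_token`, and this sketch assumes the former is passed as the latter. The helper names and sample values are illustrative.

```python
import re
from typing import Optional


def to_snake_case(name: str) -> str:
    """Convert PascalCase (e.g. EndToEndRun) to snake_case (end_to_end_run)."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def submit_feedback_args(tool_output: dict) -> Optional[dict]:
    """Build submit_feedback arguments, or None when no recommendation is present."""
    recommendation = tool_output.get("feedback_recommendation")
    if recommendation is None or recommendation.get("action") != "submit_feedback":
        return None  # absence of the field means "do nothing"
    args = dict(recommendation["prefill"])  # copy; other fields pass through unchanged
    args["type"] = to_snake_case(args["type"])
    args["severity"] = args["severity"].lower()
    # Assumption: the response's dedup_token is what submit_feedback
    # expects as recommendation_token.
    args["recommendation_token"] = recommendation["dedup_token"]
    return args


# Illustrative get_generation output; all values are placeholders.
sample_output = {
    "status": "Failed",
    "feedback_recommendation": {
        "action": "submit_feedback",
        "reason": "review loop stalled",
        "dedup_token": "abc123",
        "prefill": {
            "type": "EndToEndRun", "title": "Review loop stalled",
            "full_report": "## Summary\n...", "severity": "High",
            "generation_id": "gen-1", "interview_id": None,
        },
    },
}
print(submit_feedback_args(sample_output))
```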

`submit_feedback`

Added 2026-05-16. Distinct from submit_bug_report — feedback evaluates quality (was the interview good, is the package coherent, what's the build confidence). Bug reports are for broken behavior.

Submits structured quality feedback. Required: type (interview_quality, package_quality, end_to_end_run, tooling_experience, api_doc_quality, website_quality, launch_readiness, other), title, full_report (markdown). Optional: target GUIDs (interview_id, intake_artifact_id, generation_id, package_id) — required for run-bound types (interview_quality, package_quality, end_to_end_run). Scalar scores: interview_quality_score, package_quality_score, build_confidence_percent (0-100), letter_grade (A-F). Optional template_id + rubric_version link to a template from list_feedback_templates; pass rubric_section_responses (section-id → free-text) + rubric_scores (section-id → 0-100) to fill the rubric.

Additional optional submitter context (added 2026-05-16): estimated_output_quality (≤50 char qualitative label, distinct from the numeric build_confidence_percent), project_type and review_profile (≤50 chars each — denormalize the run's project type and review profile at submission time), transcript_evidence and package_evidence (arrays of quoted snippets, each ≤2000 chars, supporting the findings).

Each entry in structured_findings accepts three richer fields (each ≤2000 chars): evidence (quoted text from the transcript or package supporting the finding), expected_behavior (what the caller expected to happen), suggested_fix (caller's proposed remediation). Mirrors the specialist-reviewer finding shape so feedback findings + reviewer findings can be aggregated.

Typed evidence (added 2026-05-21): each finding also accepts an optional typed_evidence array (up to 20 items) for machine-readable signal you'd otherwise flatten into prose. Each item is { "kind": <string>, "payload_json": <string ≤4000 chars> }. The kind is one of free, http_response, route, console_error, mcp_tool_call, transcript_turn, screenshot, json_payload, and payload_json must be a well-formed JSON document. Required keys depend on the kind: http_response needs a numeric status; route needs a string url; console_error needs a string message; mcp_tool_call needs a string tool; transcript_turn needs a numeric turnIndex; screenshot needs a string path; free and json_payload accept any well-formed JSON. The prose evidence string and typed_evidence can coexist on the same finding. Read responses echo typed_evidence back in the same shape.
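
A client can pre-check typed_evidence items against the rules above before submitting. The checker below is a sketch of those rules as described in this section, not the server's validator; the function name and error strings are invented, and validate_feedback remains the authoritative check.

```python
import json

# kind -> (required key, accepted Python types), or None when any JSON is accepted.
RULES = {
    "http_response": ("status", (int, float)),
    "route": ("url", (str,)),
    "console_error": ("message", (str,)),
    "mcp_tool_call": ("tool", (str,)),
    "transcript_turn": ("turnIndex", (int, float)),
    "screenshot": ("path", (str,)),
    "free": None,
    "json_payload": None,
}


def typed_evidence_errors(items: list) -> list:
    """Return human-readable problems found in a typed_evidence array."""
    errors = []
    if len(items) > 20:
        errors.append("more than 20 typed_evidence items")
    for index, item in enumerate(items):
        kind = item.get("kind")
        payload_json = item.get("payload_json")
        if not isinstance(kind, str) or kind not in RULES:
            errors.append(f"[{index}] unknown kind {kind!r}")
            continue
        if not isinstance(payload_json, str) or len(payload_json) > 4000:
            errors.append(f"[{index}] payload_json must be a string of at most 4000 chars")
            continue
        try:
            payload = json.loads(payload_json)
        except ValueError:
            errors.append(f"[{index}] payload_json is not well-formed JSON")
            continue
        rule = RULES[kind]
        if rule is None:
            continue
        key, accepted = rule
        value = payload.get(key) if isinstance(payload, dict) else None
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, accepted):
            errors.append(f"[{index}] {kind} requires a valid {key!r}")
    return errors


# Illustrative items only.
evidence = [
    {"kind": "http_response", "payload_json": '{"status": 502}'},
    {"kind": "route", "payload_json": '{"url": 5}'},
    {"kind": "bogus", "payload_json": "{}"},
    {"kind": "free", "payload_json": "[1, 2]"},
    {"kind": "http_response", "payload_json": "not json"},
]
print(typed_evidence_errors(evidence))
```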

Recurrence threading (added 2026-05-17): pass at most one of recurrence_of_feedback_id or recurrence_of_bug_report_id when filing a row because an earlier feedback or bug report was resolved but the issue came back. Both ids cannot be set on the same submission — the system rejects the call.

Agent-driven dedup (added 2026-05-17): pass recommendation_token when filing in response to a server-emitted feedback_recommendation field (see "Agent-driven feedback acquisition" above). The token (an sha256 hex string) is used to dedup against a 30-day window of open auto-filed rows — a dedup hit bumps an occurrence counter on the existing row instead of creating a new one.

Returns id, type, status, created_at. To avoid spending a submit_feedback call on a validation error, dry-run the shape first with validate_feedback.

`validate_feedback`

Added 2026-05-19. Pre-flight for submit_feedback.

Validates a feedback submission shape without persisting anything. Takes the same input as submit_feedback (the recommendation_token is the only field it drops — dedup is a write-time concern), and the same validation rules apply: template, cap, and section-id violations all fail here exactly as they would at submit time. Returns { valid, errors[] }, where each error is { code, message, param_name } carrying the canonical FEEDBACK_* code (FEEDBACK_TITLE_REQUIRED, FEEDBACK_FULL_REPORT_REQUIRED, FEEDBACK_INVALID, FEEDBACK_TEMPLATE_VERSION_REQUIRED, FEEDBACK_TEMPLATE_UNKNOWN, FEEDBACK_TEMPLATE_TYPE_MISMATCH, FEEDBACK_TEMPLATE_SECTION_UNKNOWN, FEEDBACK_TEMPLATE_SCORE_UNKNOWN — see errors).

Run this first when you're uncertain about template section ids or free-text caps — it catches the error without consuming a submit_feedback call.

`amend_feedback`

Added 2026-05-21. Submitter self-correction.

While your feedback row is still Open and within the amend window (10 minutes from submission), you can fix free-form content in place. Takes feedback_id (required) plus any of title, summary, full_report, transcript_evidence, package_evidence, tags. Omitted fields are left unchanged. Not amendable: type, severity, target ids, template_id/rubric_version, and structured_findings. Returns the updated id / title / status / updated_at. Errors (surfaced as the tool error message): the row isn't yours, it has already left Open (FEEDBACK_AMEND_NOT_OPEN), or the window has expired (FEEDBACK_AMEND_WINDOW_EXPIRED). Useful for catching a typo right after submit_feedback while the window is still open.

`list_my_feedback`

Lists the caller's feedback rows newest-first. Takes optional limit (1-200, default 20). Returns id / type / title / severity / status / linked GUIDs / template id + version / triage notes plus checked_at and reviewed_at so a submitter can tell whether the row has been looked at or reviewed yet.

`get_feedback`

Returns a feedback row by id. Takes feedback_id. Visible to the submitter; foreign callers get "not found".

The output includes the full record: every field set at submit time (including the 2026-05-16 additions — estimated_output_quality, project_type, review_profile, transcript_evidence, package_evidence, plus the richer per-finding evidence / expected_behavior / suggested_fix) and the server-managed lifecycle stamps (checked_at, reviewed_at).

`list_feedback_templates`

Lists the available code-defined feedback templates (rubrics) so a client can pick one before submitting. Returns id / version / title / description / section_count.

Seven templates ship in v1, each pairing with a FeedbackType:

Template id	Pairs with type	Scope
`end-to-end-specstep-quality` v1.0.0	`end_to_end_run`	One full SpecStep run (interview through generated package) — 13 sections covering interview quality, package coherence, build confidence, letter grade, top blockers, recommended fixes.
`interview-quality` v1.0.0	`interview_quality`	Otto's performance during a single Interview — 7 sections covering pacing, follow-up quality, coverage breadth, rapport, gaps, recommended follow-ups.
`package-buildability` v1.0.0	`package_quality`	Whether a generated package is buildable as-is by an AI coder — 8 sections covering coherence, completeness, AI-coder clarity, edge-case coverage, data-shape ambiguities, effort-estimate accuracy, top risks.
`api-doc-quality` v1.0.0	`api_doc_quality`	The public `/api-docs/*` surface — 8 sections covering endpoint coverage, completeness, example clarity, error-handling docs, schema clarity, missing sections, recommended improvements.
`tooling-experience` v1.0.0	`tooling_experience`	The SpecStep tooling surfaces — 9 sections covering MCP ergonomics, CLI / IDE integration, error-message clarity, performance, friction points, recommended improvements.
`website-quality` v1.0.0	`website_quality`	The public marketing/docs site at specstep.com — 11 sections covering visual polish, copy quality, SEO + sitemap correctness, route correctness, mobile experience, console cleanliness, content sanitization.
`launch-readiness` v1.0.0	`launch_readiness`	Cross-cutting pre-launch review — 12 sections covering Priority-0 blockers, public content sanitization, trust posture, API + MCP stability, mobile readiness, accessibility, performance, observability, and a final go / no-go recommendation.

`get_feedback_template`

Returns one template's full content (all sections + prompts + optional score scales). Takes template_id + version.

Webhook subscription tools

Added 2026-05-12.

The five tools below mirror the REST webhook-management surface (/v1/api-keys/{apiKeyId}/webhooks). They let a cookie-authenticated agent register, rotate, smoke-test, and revoke webhook subscriptions on its own API keys. The mutating tools (create_webhook, rotate_webhook_secret, test_webhook) refuse API-key principals by design — a compromised key must not be able to redirect, silently re-sign, or spam-fire event payloads. list_my_webhooks and delete_webhook are safe from any context (read-only and revocation, respectively). Programmatic callers that have explicitly accepted the redirect risk can use the REST endpoints directly — see REST Step 7.5 for the bearer-callable surface.

`list_my_webhooks`

Added 2026-05-12.

Lists every webhook subscription attached to a caller-owned API key. Takes api_key_id. Returns {api_key_id, webhooks: [{id, url, events, created_at, updated_at, last_delivery_at, last_delivery_status, last_delivery_http_status, needs_rotation}, ...]}. The signing secret is never returned by list — the plaintext is shown only once, at create or rotate time. needs_rotation flags subscriptions whose secret was issued under a deprecated scheme and should be rotated. Foreign and unknown API-key ids surface as "not found." Mirrors REST GET /v1/api-keys/{apiKeyId}/webhooks.

`create_webhook`

Added 2026-05-12.

Registers a new webhook subscription against a caller-owned API key.

The signing_secret is returned once in this response — store it before the response is discarded; list_my_webhooks will not return it. If lost, rotate via rotate_webhook_secret. The URL must point to an externally routable host: loopback, link-local, and internal addresses are rejected to prevent SpecStep from being used as a proxy to probe networks on the receiver's side. Unknown event types are rejected with the offending names listed.

Refuses API-key principals — a compromised key must not be able to redirect future event payloads to an attacker-controlled URL. Cookie-authenticated humans register webhooks for their own keys via this tool; programmatic callers can use the REST endpoint with explicit risk acceptance. Mirrors REST POST /v1/api-keys/{apiKeyId}/webhooks.

Arguments

Name	Type	Required	Description
`api_key_id`	UUID	yes	The caller-owned API key to attach the subscription to.
`url`	string	yes	Absolute `https://` URL. Loopback / link-local / internal addresses are rejected.
`events`	string[]	yes	At least one event type — e.g. `generation.completed`, `generation.failed`.

Returns

Field	Type	Description
`id`	UUID	The new subscription's id.
`api_key_id`	UUID	Echoed.
`url` / `events`	—	Echoed.
`created_at`	ISO-8601	Creation timestamp.
`signing_secret`	string	Returned once. Use to validate HMAC-SHA256 signatures on delivered payloads.
`signing_secret_note`	string	Reminder: this is the only time the plaintext is returned.

`rotate_webhook_secret`

Added 2026-05-12.

Issues a fresh signing secret for an existing webhook subscription. Takes api_key_id and webhook_id. Returns {id, api_key_id, updated_at, signing_secret, signing_secret_note}. The new plaintext is returned once — update every consumer that validates payloads against this subscription's signature before discarding the response.

The old secret is invalidated immediately on the dispatcher side. In-flight deliveries already signed with the old secret may still arrive at the receiver for a brief window — if you can, bracket rotations with a tolerance window on the receiver (accept either signature for a short period after rotation).
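
The receiver-side tolerance window can be sketched as a verifier that accepts a signature made with any currently trusted secret. This section does not specify the signature encoding, so the sketch assumes the signature is the lowercase hex HMAC-SHA256 digest of the raw request body; check the webhook delivery documentation before relying on that. The function name and secrets are placeholders.

```python
import hashlib
import hmac
from typing import Iterable


def signature_ok(body: bytes, signature_hex: str, secrets: Iterable[str]) -> bool:
    """Accept the payload if any trusted secret reproduces the signature.

    Assumption: the signature is the hex HMAC-SHA256 digest of the raw body.
    """
    for secret in secrets:
        expected = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
        # Constant-time comparison to avoid leaking timing information.
        if hmac.compare_digest(expected.encode("ascii"), signature_hex.encode("utf-8")):
            return True
    return False


# Placeholder secrets and body for illustration.
old_secret, new_secret = "old-secret", "new-secret"
body = b'{"event": "webhook.test"}'
in_flight_signature = hmac.new(old_secret.encode("utf-8"), body, hashlib.sha256).hexdigest()

# During the tolerance window, trust both secrets; afterwards, only the new one.
print(signature_ok(body, in_flight_signature, [new_secret, old_secret]))
print(signature_ok(body, in_flight_signature, [new_secret]))
```

Dropping the old secret from the list once the window elapses restores the strict single-secret check.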

Refuses API-key principals — a compromised key rotating the signing secret could silently lock the legitimate owner out of validating subsequent payloads. Cookie-authenticated humans rotate via this tool; programmatic callers go through REST with explicit risk acceptance. Foreign and unknown ids surface as "not found." Mirrors REST POST /v1/api-keys/{apiKeyId}/webhooks/{webhookId}/rotate-secret.

`test_webhook`

Added 2026-05-12.

Fires a synthetic webhook.test event against a registered subscription and returns the live delivery outcome. Takes api_key_id and webhook_id. Returns {success, http_status, failure_reason, latency_ms, delivery_id} — lets the owner verify reachability and signature validation without waiting for a real generation event. Useful right after create_webhook or rotate_webhook_secret to confirm the receiver is healthy.

Refuses API-key principals — the dispatcher already enforces externally-routable and DNS-rebinding guards, but a compromised key shouldn't be able to spam owner-initiated POSTs at attacker-controlled URLs. Cookie-authenticated humans test from the management UI or via this tool; programmatic callers go through REST with explicit risk acceptance. Mirrors REST POST /v1/api-keys/{apiKeyId}/webhooks/{webhookId}/test.

`delete_webhook`

Added 2026-05-12.

Removes a webhook subscription from a caller-owned API key. Takes api_key_id and webhook_id. Returns {api_key_id, webhook_id, deleted: true}. Idempotent — unknown, foreign, and already-removed webhooks surface as "not found" (the subscription is gone either way).

Allowed for both cookie and API-key callers — revocation is always safe. The worst case is an API key disabling its own webhook, which is the legitimate use case for self-managed scriptable infrastructure. Contrast create_webhook, rotate_webhook_secret, and test_webhook, which refuse API-key callers because those operations could redirect or silence event delivery. Mirrors REST DELETE /v1/api-keys/{apiKeyId}/webhooks/{webhookId}.

Webhooks instead of polling

For long-running automations or external systems where polling is awkward, register a webhook subscription on your API key and let SpecStep POST state changes to you. Subscriptions can be managed with the webhook subscription tools above or through the REST API (see the step 7.5 walkthrough). The same JSON projection that comes back from get_generation / wait_for_generation is delivered in the webhook body, with HMAC-SHA256 signatures (X-SpecStep-Webhook-Signature) and a delivery id (X-SpecStep-Webhook-Delivery) for dedup. v1 is best-effort with bounded retry; the canonical state remains whatever get_generation / wait_for_generation returns.
