If your client speaks MCP natively (Claude Code, Claude Desktop, IDE extensions), skip to Connecting an MCP client — the client handles the protocol for you. The Manual JSON-RPC walkthrough below is for anyone implementing an MCP client by hand or adapting a custom agent runtime.
SpecStep's MCP (Model Context Protocol) server exposes the same generation engine as the REST API, but shaped as discrete tools your AI coding agent can call directly. If your agent is already MCP-capable — Claude Code, Claude Desktop, or a compatible IDE — you can point it at the SpecStep MCP server and let it request documentation without hand-crafting HTTP.
What MCP is
MCP is a protocol for connecting AI agents to external tools and data sources. It uses JSON-RPC messages over HTTP. The agent calls initialize to negotiate the protocol version and capabilities, calls tools/list to discover what tools are available, then invokes tools by name with structured arguments. The server returns structured results the agent can read and reason over.
SpecStep implements MCP over a single HTTP endpoint. There is no WebSocket or streaming transport — each tool call is a POST with a JSON-RPC envelope, and the response is returned in the same HTTP response.
Authentication
SpecStep supports two ways to authenticate MCP calls. Browser-based sign-in is the recommended default — your MCP client opens a browser, you sign in once, and the client receives a token without any key management on your part.
Browser-based sign-in (OAuth 2.1, recommended)
The MCP server advertises OAuth 2.1 with PKCE per the MCP spec. Compatible clients — Claude Desktop, Claude.ai, Cursor, Codex, GitHub Copilot, Continue, Cline, and any client that implements the MCP authorization extension — trigger the flow automatically:
- The first unauthenticated call to /mcp returns 401 Unauthorized with a WWW-Authenticate header pointing at the protected-resource metadata document.
- The client fetches the discovery document at /.well-known/oauth-protected-resource (and /.well-known/oauth-authorization-server) to learn the authorize and token endpoints.
- The client opens https://specstep.com/oauth/authorize?… in your browser.
- You sign in to SpecStep (via the existing Entra account) and click Allow on the consent screen.
- The browser 302s to a loopback URL the MCP client is listening on, carrying a one-time authorization code.
- The client exchanges the code at /oauth/token (PKCE-verified) and receives a Bearer oat_… access token valid for 90 days.
You can review and revoke browser-based sign-ins from Settings → API keys → Connected MCP clients.
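For a client implementing this flow by hand, the PKCE pair used in the code exchange can be produced with the standard library alone. The following is a minimal sketch based on RFC 7636; the function name make_pkce_pair is illustrative, and the use of the S256 challenge method is an assumption, since the text above does not say which method SpecStep expects.

```python
import base64
import hashlib
import secrets


def make_pkce_pair() -> tuple[str, str]:
    """Return (code_verifier, code_challenge) using the S256 method of RFC 7636.

    Assumption: S256 is the accepted challenge method; confirm against the
    server's authorization-server metadata before relying on it.
    """
    # 32 random bytes encode to a 43-character URL-safe string once padding is removed.
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode("ascii")
    # The challenge is the unpadded base64url encoding of SHA-256(verifier).
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge


code_verifier, code_challenge = make_pkce_pair()
```

The client keeps code_verifier private, sends code_challenge with the authorize request, and presents code_verifier when exchanging the code at /oauth/token.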
Dynamic Client Registration (RFC 7591)
Added 2026-05-15.
The discovery document at /.well-known/oauth-authorization-server advertises a registration_endpoint of https://specstep.com/oauth/register. Any MCP client that speaks RFC 7591 — Codex, Claude Desktop, Cursor, Continue, Cline, and any other client following the MCP authorization extension — registers itself on first connect without any pre-shared client_id:
- The client POSTs its metadata to /oauth/register:
  { "client_name": "Codex", "redirect_uris": ["http://127.0.0.1:54321/callback"] }
- The server validates each redirect_uri against the RFC 8252 loopback allowlist (http://127.0.0.1:<port>/… or http://localhost:<port>/…), mints a fresh client_id of the shape mcp_<32-hex>, and returns the RFC 7591 §3.2.1 envelope:
  {
    "client_id": "mcp_e0f4261b3ad3b5e8dd3ae4c5327a6fec",
    "client_name": "Codex",
    "redirect_uris": ["http://127.0.0.1:54321/callback"],
    "grant_types": ["authorization_code"],
    "response_types": ["code"],
    "token_endpoint_auth_method": "none",
    "client_id_issued_at": 1715800000
  }
- The client uses that client_id for the subsequent /oauth/authorize + /oauth/token handshake described above.
Registration is anonymous (no API key, no cookie) and rate-limited to 30 registrations per IP per hour. The legacy hardcoded client_id specstep-mcp-generic is still accepted for pre-RFC-7591 clients; new integrations should register their own.
Only the loopback redirect-URI shape is allowed. Public HTTPS redirects, non-HTTP schemes, host-substring tricks, and userinfo-form URIs are rejected with error: "invalid_redirect_uri". Only grant_type=authorization_code, response_type=code, and token_endpoint_auth_method=none (public clients with PKCE) are accepted in the registration request; anything else returns error: "invalid_client_metadata".
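A client can catch most invalid_redirect_uri rejections before sending the registration request. The sketch below is a client-side pre-check only, not the server's validation logic; the function names are hypothetical, and treating an explicit port as mandatory is an assumption drawn from the <port> placeholder in the allowlist shapes above.

```python
from urllib.parse import urlsplit

# Hosts named in the loopback allowlist described above.
LOOPBACK_HOSTS = {"127.0.0.1", "localhost"}


def looks_like_loopback_redirect(uri: str) -> bool:
    """Approximate the documented loopback shape: http, loopback host, explicit port."""
    try:
        parts = urlsplit(uri)
        port = parts.port  # raises ValueError when the port is not numeric
    except ValueError:
        return False
    if parts.scheme != "http":
        return False  # public HTTPS redirects and non-HTTP schemes are rejected
    if parts.username is not None or parts.password is not None:
        return False  # userinfo-form URIs are rejected
    if parts.hostname not in LOOPBACK_HOSTS:
        return False  # exact host match defeats host-substring tricks
    return port is not None  # assumption: the port must be explicit


def registration_payload(client_name: str, redirect_uri: str) -> dict:
    """Build the minimal metadata body shown in the registration example."""
    if not looks_like_loopback_redirect(redirect_uri):
        raise ValueError(f"redirect URI is not a loopback URI: {redirect_uri}")
    return {"client_name": client_name, "redirect_uris": [redirect_uri]}
```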
API key (for CI / automation)
For headless or server-to-server flows where no browser is available, the existing API-key scheme works:
POST https://specstep.com/mcp
Content-Type: application/json
Authorization: Bearer sf_xxxxxxxxxxxx
Create one at Settings → API keys. The same rate limits apply to both auth schemes — API-key callers have an independent per-key counter; OAuth callers share a single per-user counter across all connected clients. See rate limits for the full scoping rules.
A key's scopes govern which tools it can reach. Most tools below work with any authenticated key, but the session-state and project tools — build sessions, the decision log, the backlog, and project management — are opt-in: a key sees them in tools/list only when it carries the matching scopes (session_state.read, session_state.write, projects.read, projects.write), and a project-scoped key is confined to its one project. See Session state and project tools for the scope reference and how to mint a project-scoped key.
Transport
All MCP traffic goes to:
POST https://specstep.com/mcp
Content-Type: application/json
Authorization: Bearer <oat_… or sf_…>
The body is a JSON-RPC 2.0 object. The server returns JSON-RPC results or errors.
A minimal tool call looks like:
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "get_generation",
    "arguments": { "generation_id": "gen_01hx..." }
  }
}
Most MCP clients handle the JSON-RPC envelope for you. You configure the server URL; the client either negotiates OAuth automatically or, if you've supplied an API key, attaches the bearer.
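For runtimes without an MCP library, the transport described above reduces to one authenticated POST per call. This is a minimal sketch using only the Python standard library; the helper names and the 30-second timeout are illustrative choices, not part of SpecStep's API.

```python
import json
import urllib.request

MCP_URL = "https://specstep.com/mcp"


def build_mcp_request(method: str, params: dict, request_id: int, token: str) -> urllib.request.Request:
    """Wrap a JSON-RPC 2.0 call in the POST described in the Transport section."""
    body = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # token is either an oat_... OAuth token or an sf_... API key
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Perform the call; the JSON-RPC response arrives in the same HTTP response."""
    # The 30-second timeout is an arbitrary choice for this sketch.
    # urlopen raises urllib.error.HTTPError for non-2xx statuses such as 401.
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call then looks like send(build_mcp_request("tools/call", {"name": "get_generation", "arguments": {"generation_id": "gen_01hx..."}}, 1, token)).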
Manual JSON-RPC walkthrough
This section shows the exact wire shape for clients written by hand — no MCP library. Every example below is a single POST https://specstep.com/mcp with Authorization: Bearer sf_… (or oat_… from the OAuth flow) and Content-Type: application/json. The server returns the JSON-RPC response in the same HTTP response.
1. initialize
The handshake. The client announces its protocol version + capabilities; the server replies with its identity and what it supports.
Request:
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "protocolVersion": "2025-03-26",
    "capabilities": {},
    "clientInfo": { "name": "my-agent", "version": "0.1.0" }
  }
}
Response:
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "protocolVersion": "2025-03-26",
    "capabilities": {
      "tools": { "listChanged": false }
    },
    "serverInfo": {
      "name": "specstep",
      "version": "0.1.0"
    }
  }
}
protocolVersion is the MCP spec version SpecStep speaks; pin your client to it or treat anything matching 2025-* as compatible. capabilities.tools.listChanged: false means the server does not push tool-list updates — refetch tools/list explicitly if you suspect the manifest changed.
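The two rules in the paragraph above (accept 2025-* versions, refetch tools/list when listChanged is false) can be applied to the initialize result in a few lines. This is a sketch under those stated rules; the function names and the returned dictionary keys are hypothetical.

```python
PINNED_VERSION = "2025-03-26"


def is_compatible_protocol(server_version: str) -> bool:
    """Accept the pinned version or anything matching 2025-*."""
    return server_version == PINNED_VERSION or server_version.startswith("2025-")


def check_initialize_result(result: dict) -> dict:
    """Inspect the `result` object of an initialize response."""
    version = result.get("protocolVersion", "")
    if not is_compatible_protocol(version):
        raise ValueError(f"unsupported MCP protocol version: {version!r}")
    tools_caps = result.get("capabilities", {}).get("tools", {})
    return {
        "protocol_version": version,
        # listChanged false means no push updates, so the client must
        # refetch tools/list itself when it suspects the manifest changed.
        "poll_tools_list": not tools_caps.get("listChanged", False),
    }
```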
2. notifications/initialized
Per the MCP spec, the client follows up with a one-way notification (no id field, no expected response). SpecStep treats initialize as the only required handshake and tolerates clients that skip the notification, but well-behaved clients send it:
{ "jsonrpc": "2.0", "method": "notifications/initialized" }
3. tools/list
Discover the tool catalog.
Request:
{ "jsonrpc": "2.0", "id": 2, "method": "tools/list", "params": {} }
Response (truncated — see Available tools below for the complete list):
{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "tools": [
      {
        "name": "start_interview",
        "description": "Starts a new interview. Returns the interview id and initial agent turn.",
        "inputSchema": {
          "type": "object",
          "properties": {},
          "additionalProperties": false
        }
      },
      {
        "name": "submit_interview_turn",
        "description": "Submits a user turn to an interview. Returns the agent's reply and updated state.",
        "inputSchema": {
          "type": "object",
          "properties": {
            "interview_id": { "type": "string", "format": "uuid" },
            "message": { "type": "string", "minLength": 1 }
          },
          "required": ["interview_id", "message"],
          "additionalProperties": false
        }
      }
    ]
  }
}
Each entry has name, description, and a JSON Schema inputSchema. The schema is what your agent should hand to its LLM as the tool signature — names and types are authoritative.
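Because the inputSchema is authoritative, a hand-written client can use it to reject malformed arguments before spending a round-trip. The sketch below checks only the keywords that appear in the manifest excerpt above (required, additionalProperties, string type, minLength); it is not a full JSON Schema validator, and check_arguments is a hypothetical helper name.

```python
def check_arguments(tool: dict, arguments: dict) -> list[str]:
    """Return a list of problems found; an empty list means the call looks well-formed."""
    schema = tool.get("inputSchema", {})
    properties = schema.get("properties", {})
    problems = []
    for name in schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    if schema.get("additionalProperties") is False:
        for name in arguments:
            if name not in properties:
                problems.append(f"unexpected argument: {name}")
    for name, value in arguments.items():
        spec = properties.get(name, {})
        if spec.get("type") == "string":
            if not isinstance(value, str):
                problems.append(f"{name} must be a string")
            elif len(value) < spec.get("minLength", 0):
                problems.append(f"{name} is shorter than minLength")
    return problems
```

Keywords such as format: uuid are left to the server in this sketch.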
4. tools/call
Invoke a tool. Tool results are wrapped in MCP content blocks; v1 always emits a single text block carrying the tool's JSON payload as a string. Parse it on the client.
Request:
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "tools/call",
  "params": {
    "name": "start_interview",
    "arguments": {}
  }
}
Response:
{
  "jsonrpc": "2.0",
  "id": 3,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "{\"id\":\"01952fcb-cd11-7c3e-9a2e-3b1d8f5e6a04\",\"status\":\"active\",\"transcript\":[{\"role\":\"agent\",\"content\":\"Tell me what you're building...\"}]}"
      }
    ]
  }
}
The result.content[0].text field is a JSON string — parse it again on your side to get the structured payload (interview id, status, transcript, etc.). MCP errors come back as standard JSON-RPC error envelopes (result absent, error: {code, message}); typed application errors (quota exceeded, ownership conflicts, paused-state guards) prefix the message with a stable error code like QUOTA_EXCEEDED: ... or RETRY_STATE_INVALID: ... so clients can branch on it.
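The double decoding and the error-code prefix described above can be handled in one place. This is a minimal sketch; McpToolError and parse_tool_response are hypothetical names, and the prefix pattern (upper-case code, colon, space) is inferred from the QUOTA_EXCEEDED: ... examples rather than from a formal grammar.

```python
import json
import re

# Inferred shape of a typed application error: "SOME_CODE: human-readable text".
ERROR_CODE_PREFIX = re.compile(r"^([A-Z][A-Z0-9_]*): ", re.DOTALL)


class McpToolError(Exception):
    """Raised for JSON-RPC error envelopes; app_code is None for untyped errors."""

    def __init__(self, rpc_code, app_code, message):
        super().__init__(message)
        self.rpc_code = rpc_code
        self.app_code = app_code


def parse_tool_response(envelope: dict) -> dict:
    """Return the structured tool payload, or raise McpToolError."""
    if "error" in envelope:
        error = envelope["error"]
        message = error.get("message", "")
        match = ERROR_CODE_PREFIX.match(message)
        raise McpToolError(error.get("code"), match.group(1) if match else None, message)
    block = envelope["result"]["content"][0]
    if block.get("type") != "text":
        raise ValueError(f"unexpected content block type: {block.get('type')!r}")
    # The text field is itself a JSON document, so it is decoded a second time.
    return json.loads(block["text"])
```

Callers can then branch on exc.app_code (for example QUOTA_EXCEEDED) instead of matching message text.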
Connecting an MCP client
The exact configuration depends on your client. There are two shapes — pick the one that matches whether your client supports OAuth.
OAuth-capable clients (recommended) — Claude Desktop, Claude.ai, Cursor, Codex, GitHub Copilot, Continue, Cline, and any client that implements the MCP authorization extension. Configure the server URL only; the client handles the browser sign-in flow on first connect:
{
  "mcpServers": {
    "specstep": {
      "url": "https://specstep.com/mcp"
    }
  }
}
On the first tool call, the client opens a browser to SpecStep, you sign in and click Allow, and the client receives a 90-day access token. You can revoke it from Settings → API keys → Connected MCP clients.
API-key clients — for clients without OAuth support, or headless / CI flows where no browser is available:
{
  "mcpServers": {
    "specstep": {
      "url": "https://specstep.com/mcp",
      "headers": {
        "Authorization": "Bearer ${SPECSTEP_API_KEY}"
      }
    }
  }
}
Replace ${SPECSTEP_API_KEY} with a key minted from Settings → API keys.
After connecting, call initialize and then tools/list (or the equivalent in your client) to retrieve the tool manifest. The server returns the list of available tools with their argument schemas.
End-to-end flow via MCP
The same steps as the REST walkthrough, expressed as tool calls. An MCP-capable agent can drive this entire sequence autonomously.
1. Start an interview. Call start_interview; it takes no arguments. Store the id from the returned payload as your interview_id.
2. Submit turns. Call submit_interview_turn with the interview_id and your first message, which should describe what you're building. In the default async mode the call returns a job_id; poll get_interview_turn_status to read the agent's reply. Continue submitting turns (answering the AI Team's questions about vision, users, requirements, constraints, and architecture) until the interview status is complete. A typical interview takes five to fifteen turns.
3. Start a generation. Call start_generation with the intake_id (the intake artifact identifier produced by completing the interview) and your chosen review profile. Store the returned generation_id. The generation is now Queued.
4. Poll for completion. Call wait_for_generation with the generation_id and respect the returned next_check_seconds hint between calls. The state will move from Queued to Drafting / Reviewing / FreshEyes as the generation runs, then to a terminal Complete / Failed / Cancelled. wait_for_generation is preferred over get_generation because it inlines the polling cadence + the pending-clarifications + the download URL, cutting most flows to a single tool call.
4a. Handle a paused clarification. If wait_for_generation returns state: "PausedAwaitingClarification", the response already includes pending_clarifications (get_pending_clarifications would return the same payload, so no extra round-trip needed). Surface the question text to your user, gather their answers, then call answer_clarifications with {question, answer} pairs that match the question text verbatim. The generation resumes on the next dispatcher tick. Skip this step entirely when no clarification fires.
5. Retrieve the package. When wait_for_generation reports state: "Complete", the response carries a short-lived package_url you can download from directly. If you want richer metadata, call get_package with the package_id; for history, list_packages.
6. Deliver (optional). Package delivery — committing to a GitHub repository and opening a pull request — is handled via the REST API (POST /v1/packages/{id}/deliver). MCP tools do not cover delivery in this version.
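Steps 4 and 4a above can be sketched as a small driver loop. The sketch assumes call_tool is a caller-supplied function that performs a tools/call and returns the already-parsed payload. The state, next_check_seconds, and pending_clarifications fields come from the text above; the "question" key on each pending item, the "answers" argument name, and the 5-second fallback delay are assumptions to verify against each tool's inputSchema.

```python
import time
from typing import Callable

TERMINAL_STATES = {"Complete", "Failed", "Cancelled"}


def run_until_terminal(
    call_tool: Callable[[str, dict], dict],
    generation_id: str,
    ask_user: Callable[[str], str],
    sleep: Callable[[float], None] = time.sleep,
    max_polls: int = 500,
) -> dict:
    """Poll wait_for_generation, answering clarifications, until a terminal state."""
    for _ in range(max_polls):
        status = call_tool("wait_for_generation", {"generation_id": generation_id})
        state = status.get("state")
        if state in TERMINAL_STATES:
            return status
        if state == "PausedAwaitingClarification":
            # Assumed item shape: {"question": "..."}; the question text is echoed verbatim.
            answers = [
                {"question": item["question"], "answer": ask_user(item["question"])}
                for item in status.get("pending_clarifications", [])
            ]
            # "answers" is a hypothetical argument name for the {question, answer} pairs.
            call_tool("answer_clarifications", {"generation_id": generation_id, "answers": answers})
        # Respect the server's cadence hint; 5 seconds is an arbitrary fallback.
        sleep(status.get("next_check_seconds", 5))
    raise TimeoutError(f"generation {generation_id} did not reach a terminal state")
```

Injecting sleep and ask_user keeps the loop testable and lets an agent runtime substitute its own scheduler or user prompt.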
Recommended MCP workflows
Twelve short recipes covering the common reasons an agent calls SpecStep. Each names the tools in order — argument schemas live in the reference catalog below.
1. Create a new package from scratch
- start_interview: opens the interview, returns interview_id.
- submit_interview_turn: submit user messages until the interview reports status complete.
- validate_generation_request: recommended pre-flight; returns {is_valid, blocking_errors, warnings} without enqueueing.
- start_generation: kick off the run. Returns generation_id.
- wait_for_generation: block on terminal state with built-in polling cadence.
- get_latest_package_for_generation: resolve the produced package.
- list_package_files / get_package_file: read individual files on demand.
2. Inspect a completed package
- list_packages (account-wide) or list_packages_for_generation (one generation).
- get_package: the package record + a fresh SAS download URL.
- list_package_files: the zip's central-directory listing.
- get_package_file: read individual files without downloading the zip.
- search_package: full-text search inside a single package.
3. Compare two or more generations
- compare_packages: high-level identity verdict + per-package build / quality confidence scores.
- diff_package_files: line-level unified diff across same-named files.
- get_generation_quality_report: structured reliability / accessibility / cost / risk findings per generation.
- get_security_findings: structured security-expert findings per generation.
4. Apply a small change to an existing package
- estimate_change_request_cost: check the rolling-30-day median cost before paying for the call.
- request_change: file the addendum (one LLM call; cheaper than a full re-gen).
- list_change_requests: the addendum history for a package.
- get_change_request: one addendum record + SAS download URL for the zip.
- list_change_request_files + get_change_request_file: read the addendum's five markdown files without unzipping.
5. Gate automation on quality and security
- wait_for_generation until state == "Complete".
- get_security_findings: branch on max_severity (Critical / Major / Minor / Info / None).
- get_generation_quality_report: reliability / accessibility / cost / risk severities for the same generation.
- Fail or warn based on the severity thresholds your gate enforces.
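The final step of this recipe amounts to comparing max_severity against thresholds. A minimal sketch follows; the ordering is assumed from the order the severities are listed in (Critical most severe, None least), and the default thresholds are illustrative rather than recommended values.

```python
# Least to most severe; assumed from the Critical / Major / Minor / Info / None listing.
SEVERITY_ORDER = ["None", "Info", "Minor", "Major", "Critical"]


def gate_decision(max_severity: str, fail_at: str = "Critical", warn_at: str = "Major") -> str:
    """Map a max_severity value to "fail", "warn", or "pass"."""
    # list.index raises ValueError on an unknown label, which is the safe failure mode for a gate.
    rank = SEVERITY_ORDER.index(max_severity)
    if rank >= SEVERITY_ORDER.index(fail_at):
        return "fail"
    if rank >= SEVERITY_ORDER.index(warn_at):
        return "warn"
    return "pass"
```

A CI job would call gate_decision once for the security findings and once for the quality report, then take the stricter of the two results.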
6. Attach external reference docs
- attach_external_folder: returns a one-time browser URL the user opens to complete OAuth + folder pick.
- User opens the URL in their browser; SpecStep handles provider OAuth and first sync server-side.
- get_attach_external_folder_session: poll until status == "completed" (or a terminal failure).
- Continue the interview or generation flow; the folder's files are now available as reference documents.
7. Use webhooks instead of polling
- create_webhook: subscribe a target URL to one or more event types. The signing secret is returned once.
- test_webhook: fire a synthetic webhook.test event to verify the target is reachable.
- rotate_webhook_secret: issue a fresh signing secret and invalidate the old one.
- delete_webhook: retire the subscription.
wait_for_generation remains the canonical polling fallback when the webhook target is unavailable.
8. Inspect or resume an in-flight generation
- list_generations filtered by state: find your in-flight runs.
- get_generation: the full aggregate including progress_percent and cost-forecast fields.
- get_events: chronological telemetry (state transitions, agent activity).
- If state == "PausedAwaitingClarification": get_pending_clarifications then answer_clarifications; the generation resumes on the next dispatcher tick.
- wait_for_generation: block on the terminal state.
9. Retry or cancel a failed generation
- get_generation: read failure_category to decide whether retry is appropriate.
- retry_generation to re-fire from the original kickoff envelope; see errors §409 for the four typed retry-rejection codes (RETRY_STATE_INVALID, RETRY_RESEARCHER_CHILD, RETRY_ENVELOPE_UNAVAILABLE, RETRY_OWNER_MISMATCH).
- OR cancel_generation if abandoning the run.
- wait_for_generation after a successful retry.
10. Soft-delete and restore
Asymmetric by historical convention — Package's delete/restore are folded into update_package's flags; Generation has dedicated tools.
- Packages. update_package with delete: true soft-deletes the row. Later: update_package with restore: true.
- Generations. delete_generation soft-deletes. Later: restore_generation.
Soft-deleted rows drop out of the default list queries; they're still recoverable until the 30-day retention window auto-purges them.
11. File a bug report or quality feedback
Pick the type that fits.
Broken behavior (404, wrong output, crash):
- submit_bug_report: include diagnostic context (URL, generation id, error excerpt). Returns the bug_report_id.
- list_my_bug_reports: your filed reports and their current state.
- get_bug_report: one record, including any review notes and state transitions.
Quality evaluation (was the interview good, is the package coherent, what's the build confidence):
- list_feedback_templates: discover the available rubrics. Seven templates ship in v1: end-to-end-specstep-quality (whole-run), interview-quality (Otto behavior only), package-buildability (deliverable only), api-doc-quality (the /api-docs/* surface), tooling-experience (MCP / CLI / IDE ergonomics), website-quality (the public marketing/docs site), and launch-readiness (cross-cutting pre-launch review). Pick the one whose scope matches the feedback; narrower rubrics keep the signal cleaner than the all-in-one.
- get_feedback_template: fetch the full sections for the chosen template to see which section ids to fill.
- validate_feedback: (Added 2026-05-19.) Dry-run the submission shape before committing. Returns { valid, errors[] } where each error carries code (the canonical FEEDBACK_* error code), message, and param_name. Same input as submit_feedback minus the recommendation_token. Use this when you're guessing at the rubric's section ids or the cap on a free-text field; it is better to catch the mistake without burning a submit_feedback call.
- submit_feedback: include type, title, full_report, the linked GUIDs (interview_id / generation_id / package_id), and rubric_section_responses + rubric_scores if you used a template.
- list_my_feedback: your filed feedback and its current state.
- get_feedback: one record, including any review notes and state transitions.
12. Capability and subscription discovery
- get_capabilities: schema versions, accepted enum values (review_profile, project_type, mirror_selection). Call BEFORE start_generation so you can avoid hardcoding magic strings that change on deploy.
- get_subscription: the caller's tier (Free / Pro / Team) + quota snapshot. Branch on tier-allowed profiles before kicking off generations.
Tool selection guide
A quick mapping from common agent intent to the best first tool. When in doubt, start here, then read that tool's reference entry below for argument detail.
| Intent | Start with |
|---|---|
| "I need to know what values are valid" | get_capabilities |
| "I want to know if a generation will succeed" | validate_generation_request |
| "I need to know if my tier allows this review profile" | get_subscription |
| "I need the latest package for a generation" | get_latest_package_for_generation |
| "I need one file from a package" | list_package_files → get_package_file |
| "I need to search across all my packages" | search_my_packages |
| "I need to inspect an addendum" | list_change_requests → get_change_request → get_change_request_file |
| "I need to compare packages" | compare_packages + diff_package_files |
| "I need review findings as data, not prose" | get_security_findings + get_generation_quality_report |
| "My generation is paused — what's the question?" | get_pending_clarifications → answer_clarifications |
| "My generation failed — why?" | get_generation (read failure_category) → get_events |
| "I want to retry a Failed generation" | retry_generation |
| "I need account-wide cost over a period" | get_usage |
| "I want to rate or evaluate a finished run" | list_feedback_templates → get_feedback_template → submit_feedback |
| "I want to know if a feedback submission will be accepted" | validate_feedback (dry-run) → submit_feedback |
| "I want to file a bug, not rate a run" | submit_bug_report (broken behavior; use submit_feedback for quality evaluation) |
Available tools
These are the SpecStep MCP tools available to standard authenticated callers. The nine categories below group tools by capability area.
Interview tools
start_interview
Creates a new interview. Takes no arguments — the opening agent turn arrives in the response's transcript. Describe what you're building in your first submit_interview_turn call (project type, vision, constraints); the interview's detected_type is inferred from that first turn.
submit_interview_turn
Submits a turn to an existing interview. Default mode is async (changed 2026-05-19): the call commits your user turn + enqueues a background job and returns a job_id you poll via get_interview_turn_status (or subscribe to the InterviewTurnJobStatusChanged SignalR push). Legacy inline-reply behavior is available via mode: "sync" but is subject to the ~60s Front Door ceiling and is scheduled for removal after one release cycle.
Arguments
| Name | Type | Required | Description |
|---|---|---|---|
| interview_id | UUID | yes | The interview to append the turn to. |
| message | string | yes | The user's turn. Empty / whitespace strings are rejected; cap is 16,384 characters. |
| client_request_id | string | no | Optional idempotency token (1..128 chars of [A-Za-z0-9._:-]). A retry with the same value returns the cached result of the first call instead of re-invoking the LLM. Recommended for any caller that might retry on network failure. |
| mode | string | no | Default "async" (changed 2026-05-19): returns a job_id you poll via get_interview_turn_status. Pass "sync" to opt into the legacy inline-reply path (subject to the ~60s Front Door ceiling, so it may 504 on long interviews; scheduled for removal). |
Returns (async mode, default) — either:
- {status: "queued", job_id, interview_id, submission_id?, user_turn_committed: true, snapshot: null}: your user turn committed; poll get_interview_turn_status(job_id) for the agent reply.
- {status: "cached_replay", job_id: null, interview_id, submission_id, user_turn_committed: true, snapshot: <interview snapshot>}: you supplied a client_request_id whose original call already completed; here's the cached result.
Returns (sync mode, opt-in) — full interview snapshot: {id, status, detected_type, started_at, last_activity_at, completed_at, intake_artifact_id, transcript: [{role, content, occurred_at}, …], started_generation_id?, auto_start_failure?}. Read the last agent-role entry of transcript for the agent's reply. When the interview just transitioned to status: "complete", the response also carries the auto-handoff fields below.
Completion auto-handoff (added 2026-05-17). When the agent signals completion on a turn (the interview transitions to complete and an intake_artifact_id is produced), SpecStep auto-starts a generation with sensible defaults (review_profile: "Normal", mirror_selection: "ClaudeMd", has_ui derived from the detected project type) and surfaces the result on the same response:
- started_generation_id: non-null on success; the generation id you can poll via wait_for_generation / get_generation.
- auto_start_failure ({code, message}): non-null when auto-start failed (quota exceeded, validation error, transient provider failure, etc.). The interview turn still succeeded; call start_generation manually with the intake_artifact_id if you want to retry the kickoff with custom settings.
Both fields stay null when the turn didn't trigger completion. Auto-handoff is restricted to user-actor interviews; API-key actors receive auto_start_failure.code: "AUTO_START_NOT_SUPPORTED_FOR_ACTOR_TYPE" and call start_generation themselves.
The auto-handoff fields land on the snapshot returned via get_interview_turn_status when the async job's completion produced an intake artifact.
Errors — when an idempotency replay finds the original is still processing, you get INTERVIEW_TURN_IN_FLIGHT with data: {retryable: true, retry_after_seconds: 5, turn_committed: false, ...}. When the original failed, you get the cached error code with data: {retryable, turn_committed: false, original_error_code, replayed_from_cache: true, ...}. See errors.
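The idempotency token and the in-flight replay rule can be handled with two small helpers. This is a sketch under the constraints stated above (1..128 characters of [A-Za-z0-9._:-], retry_after_seconds in the error data); the helper names and the "turn" prefix are hypothetical, and it assumes the error code appears as the message prefix like the other typed application errors.

```python
import re
import uuid
from typing import Optional

# Documented token shape: 1..128 characters drawn from [A-Za-z0-9._:-].
CLIENT_REQUEST_ID = re.compile(r"[A-Za-z0-9._:-]{1,128}")


def new_client_request_id(prefix: str = "turn") -> str:
    """Create a fresh idempotency token; reuse the same value on every retry of one turn."""
    candidate = f"{prefix}:{uuid.uuid4()}"
    if not CLIENT_REQUEST_ID.fullmatch(candidate):
        raise ValueError(f"invalid client_request_id: {candidate!r}")
    return candidate


def replay_delay(error_message: str, error_data: dict) -> Optional[float]:
    """Seconds to wait before replaying the same token, or None if a replay is not indicated."""
    # Assumption: the typed code prefixes the JSON-RPC error message.
    if error_message.startswith("INTERVIEW_TURN_IN_FLIGHT") and error_data.get("retryable"):
        return float(error_data.get("retry_after_seconds", 5))
    return None
```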
get_interview_turn_status
Status poll for an async submit_interview_turn job. Returns the job's current state plus (when completed) the canonical interview snapshot, or (when failed) structured error fields.
Arguments
| Name | Type | Required | Description |
|---|---|---|---|
| job_id | UUID | yes | The job_id returned by an async submit_interview_turn call. |
Returns — {status, job_id, interview_id, snapshot?, error_code?, error_message?, is_retryable?, created_at, completed_at?} where status is one of queued, running, completed, failed. When completed, snapshot carries the full interview state in the same shape sync submit_interview_turn returns. When failed, the error_code is one of the standard interview-turn codes (INTERVIEW_TURN_TIMEOUT, INTERVIEW_TURN_TRANSPORT_ERROR, INTERVIEW_TURN_STUCK_RUNNING, INTERVIEW_TURN_INTERNAL_ERROR, …) and is_retryable tells you whether re-submitting with the same client_request_id is safe.
Foreign job ids return a "not found" error (same info-hiding convention as get_interview).
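Reading the agent's reply in async mode means polling this tool until the job settles. The sketch below assumes call_tool is a caller-supplied function that performs a tools/call and returns the parsed payload; the 2-second interval and 60-poll cap are arbitrary choices, since no cadence hint is documented for this tool.

```python
import time
from typing import Callable


def wait_for_turn(
    call_tool: Callable[[str, dict], dict],
    job_id: str,
    sleep: Callable[[float], None] = time.sleep,
    interval_seconds: float = 2.0,
    max_polls: int = 60,
) -> dict:
    """Poll get_interview_turn_status until the job completes; return reply and snapshot."""
    for _ in range(max_polls):
        job = call_tool("get_interview_turn_status", {"job_id": job_id})
        status = job.get("status")
        if status == "completed":
            transcript = job["snapshot"]["transcript"]
            # The agent's reply is the last agent-role entry of the transcript.
            reply = next((t["content"] for t in reversed(transcript) if t["role"] == "agent"), None)
            return {"reply": reply, "snapshot": job["snapshot"]}
        if status == "failed":
            raise RuntimeError(
                f"{job.get('error_code')}: {job.get('error_message')} "
                f"(retryable={job.get('is_retryable')})"
            )
        if status not in ("queued", "running"):
            raise RuntimeError(f"turn job stopped with status {status!r}")
        sleep(interval_seconds)
    raise TimeoutError(f"turn job {job_id} did not finish within {max_polls} polls")
```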
cancel_interview_turn
Added 2026-05-18.
Cancels a background submit_interview_turn(mode: 'async') job by id. Useful when the user's submitted turn was wrong, when an LLM call is dragging on, or when the caller wants to abandon a half-finished turn rather than wait for it (or its stuck-job timeout). Queued jobs cancel cleanly; running jobs cancel best-effort — the job's terminal status will be cancelled, but the agent reply MAY still appear in the interview transcript if a mid-pipeline SaveChanges committed before the cancel landed. Idempotent on already-Cancelled jobs.
Arguments
| Name | Type | Required | Description |
|---|---|---|---|
| job_id | UUID | yes | The job_id returned by an async submit_interview_turn call. |
Returns — {status, job_id, interview_id, created_at, completed_at?} where status is cancelled on the happy path. Mirrors the shape get_interview_turn_status returns (no snapshot field — the work was abandoned).
Returns an INTERVIEW_TURN_NOT_CANCELLABLE conflict when the job is already completed or failed (the work landed; the result is at get_interview_turn_status). Foreign job ids return a "not found" error (same info-hiding convention as get_interview_turn_status).
list_interviews
Lists the caller's interviews, newest first. Empty conversations (< 2 turns) are filtered out so abandoned-at-first-contact rows don't clutter the list.
Arguments
| Name | Type | Required | Description |
|---|---|---|---|
| status | string | no | Comma-separated. One or more of active, paused, abandoned, complete, awaiting_clarification. |
| limit | int | no | Default 20, max 100. |
Returns — {interviews: [...]} where each item has:
| Field | Type | Description |
|---|---|---|
| id | UUID | Interview id. |
| status | string | One of the lowercase status values above. |
| detected_type | string or null | Project type inferred from the first user turn. |
| display_title | string | Short human-readable label for the interview. |
| turn_count | int | Total turns recorded so far. |
| started_at | ISO-8601 | When the interview was created. |
| last_activity_at | ISO-8601 | Timestamp of the most recent turn or state change. |
get_interview
Returns the full state and transcript of an interview by id. Takes interview_id. Same auth boundary as REST: foreign callers get "not found" rather than a 403, so probing foreign ids is impossible.
The response carries a transcript_size introspection block (added in v0.18, 2026-05-22) — byte-identical to the REST shape — so MCP clients can observe how full a transcript is before queuing the next turn: { chars, tokens_estimate, max_chars, max_tokens, percent_used }. chars sums the UTF-16 length of every user + agent turn (system prompts and reference documents are excluded); tokens_estimate is chars / 4 (conservative upper bound). max_chars and max_tokens report the current platform ceiling but are not enforced in v0.18 — a later release will reject submit-turn calls that would exceed them with a structured error envelope.
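A client that wants to estimate these figures locally, for example before submitting a long turn, can mirror the counting rule described above. This is only a sketch: counting UTF-16 code units and dividing by 4 follow the text, while the integer division, the percent_used formula (chars over max_chars), and the one-decimal rounding are assumptions that may differ from the server's values.

```python
def utf16_length(text: str) -> int:
    """Length in UTF-16 code units (characters outside the BMP count as two)."""
    return len(text.encode("utf-16-le")) // 2


def estimate_transcript_size(transcript: list[dict], max_chars: int) -> dict:
    """Approximate the chars, tokens_estimate, and percent_used fields client-side."""
    chars = sum(
        utf16_length(turn["content"])
        for turn in transcript
        if turn["role"] in ("user", "agent")  # other roles are excluded from the count
    )
    return {
        "chars": chars,
        "tokens_estimate": chars // 4,  # assumption: integer division
        "percent_used": round(100 * chars / max_chars, 1),  # assumption: based on max_chars
    }
```

Python's len() counts code points, so the explicit UTF-16 encoding is what keeps emoji and other supplementary characters in line with the documented counting rule.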
delete_interview
Soft-deletes an interview by id. Takes interview_id. Allowed in any status (Active, Paused, Complete, Abandoned, AwaitingClarification, ClarificationResolved) — soft-delete is a "remove from my workspace" affordance, not a state-machine transition. Idempotent on already-deleted rows. The interview row stays in the database for audit + recovery; the conversation drops out of list_interviews and the workspace UI. Foreign callers get "not found" so foreign ids can't be probed. Returns {interview_id, action: "deleted"}. Sister tool: restore_interview.
restore_interview
Restores a soft-deleted interview by id. Takes interview_id. Idempotent on already-live rows. No state guard. Sister to delete_interview. Returns {interview_id, action: "restored"}.
list_intake_artifacts
Added 2026-05-08.
Lists the caller's intake artifacts (the structured output of a completed Interview, the sole input to start_generation). Sibling-shape to list_interviews; agents pick a ready-to-generate artifact without filtering interview status inline. Optional status filter ("ready" is the only meaningful value today; null/blank = same as "ready"; unknown labels return an empty list). Optional limit (default 50, max 200) and offset for pagination. Returns {artifacts: [{id, interview_id, project_name, schema_version, completed_at}, ...]}, newest first. Mirrors REST GET /v1/intake-artifacts. Use the returned id as the intake_id argument to start_generation.
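Walking the full artifact list with limit and offset might look like the sketch below. It assumes call_tool is a caller-supplied function that performs a tools/call and returns the parsed payload, and that a page shorter than limit marks the end of the list; the source does not describe an explicit end-of-list signal.

```python
from typing import Callable, Iterator


def iter_intake_artifacts(
    call_tool: Callable[[str, dict], dict],
    page_size: int = 50,
) -> Iterator[dict]:
    """Yield every ready intake artifact, newest first, one page at a time."""
    if not 1 <= page_size <= 200:
        raise ValueError("page_size must be between 1 and 200 (the documented max)")
    offset = 0
    while True:
        page = call_tool(
            "list_intake_artifacts",
            {"status": "ready", "limit": page_size, "offset": offset},
        )
        artifacts = page.get("artifacts", [])
        yield from artifacts
        # Assumption: a short (or empty) page means there is nothing further to fetch.
        if len(artifacts) < page_size:
            return
        offset += page_size
```

Each yielded item's id is the value to pass as intake_id to start_generation.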
get_intake_artifact
Added 2026-05-12.
Fetches a single intake artifact by id. Takes intake_artifact_id. Returns {id, interview_id, content_warning, payload_content_type, payload, project_attributes} — the full structured intake JSON the orchestrator feeds into start_generation, plus the project-attribute flags set by the post-interview attribute-detection pass (has_ui, has_persisted_data, has_ai_features, has_backend, requires_i18n, requires_compliance, compliance_frameworks).
The payload is user-authored JSON; it ships inside an untrusted_text envelope with a content_warning so MCP clients don't treat the strings as agent instructions. Owner-scoped — foreign and unknown ids surface as "not found" rather than 403, so probing is impossible. Use this when an agent wants to inspect or debug what an interview produced before calling start_generation, or to investigate a "why did this generation produce that" question after the fact.
External-connector tools
Added 2026-05-15.
MCP-driven flow for attaching a OneDrive / SharePoint / Google Drive folder to one of the caller's interviews. The MCP client itself is a CLI — it can't render a folder picker — so the kickoff tool returns a one-time launch URL the user opens in their default browser. The browser handles the existing provider-pick + OAuth + folder-pick + first-sync flow; the MCP client polls a sibling tool for terminal state. Same pattern as start_generation + get_generation. The synced files materialize as reference documents on the interview, identical to what the Web UI's "Connect a folder" affordance produces.
attach_external_folder
Creates an attach session and returns a launch URL. The MCP client opens the URL (or prints it for the user) and then polls get_attach_external_folder_session until the session reaches a terminal state.
Arguments
| Name | Type | Required | Description |
|---|---|---|---|
| interview_id | UUID | yes | The interview the resulting connector's files will sync into. Caller must own the interview; foreign ids surface as "not found" rather than 403. |

Returns
| Field | Type | Description |
|---|---|---|
| attach_session_id | UUID | Session id. Pass to get_attach_external_folder_session to poll for status. |
| launch_url | string | Absolute URL the user opens in their browser (e.g., https://specstep.com/external-connectors/attach/<id>). |
| status | string | Initial status: always awaiting_provider_pick on a fresh kickoff. |
| expires_at | ISO-8601 | UTC timestamp at which the session expires (30 minutes after creation). |
| message | string | Human-readable prompt the MCP client surfaces to the user. |
get_attach_external_folder_session
Polls the state of an attach session. Same auth boundary as the kickoff tool — cross-user reads surface as "not found".
Arguments
| Name | Type | Required | Description |
|---|---|---|---|
| attach_session_id | UUID | yes | The session id returned by attach_external_folder. |

Returns
| Field | Type | Description |
|---|---|---|
| status | string | One of awaiting_provider_pick, awaiting_oauth, awaiting_folder_pick, syncing, completed, expired, cancelled, failed. |
| connector_id | UUID \| null | Populated when status = completed. The new (or reused) ExternalConnector id. |
| provider | string \| null | Populated once the user picks a provider in the browser. One of onedrive, sharepoint, googledrive, dropbox. |
| folder_name | string \| null | Populated on or after commit. The folder the user selected. |
| files_synced | int \| null | Populated when status = completed. Count of files materialized as reference documents on the interview. |
| error_code | string \| null | Populated when status = failed (e.g., commit_failed, authorize_failed). |
| error_description | string \| null | Human description of the failure when status = failed. |
| expires_at | ISO-8601 \| null | UTC timestamp the session was set to expire. |
Terminal states are completed, failed, expired, and cancelled. Unknown / expired session ids return a synthetic {"status": "expired"} response — the client can re-run attach_external_folder to start over.
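The kickoff-then-poll pattern above can be sketched as a small loop. This is a minimal sketch, not SpecStep's client code: `call_tool` is a hypothetical callable standing in for whatever your MCP client uses to invoke a tool and return its result as a dict, and the 3-second cadence is an assumption, since the reference does not prescribe one for attach sessions.

```python
import time
from typing import Any, Callable, Dict

# Terminal states as listed in the reference above.
TERMINAL_ATTACH_STATES = {"completed", "failed", "expired", "cancelled"}

# Hypothetical transport: (tool_name, arguments) -> tool result as a dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def wait_for_attach(
    call_tool: ToolCaller,
    attach_session_id: str,
    poll_seconds: float = 3.0,  # assumption: no cadence is documented
    max_polls: int = 600,       # 600 polls * 3 s covers the 30-minute session lifetime
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll get_attach_external_folder_session until the session is terminal."""
    for _ in range(max_polls):
        session = call_tool(
            "get_attach_external_folder_session",
            {"attach_session_id": attach_session_id},
        )
        if session["status"] in TERMINAL_ATTACH_STATES:
            return session
        sleep(poll_seconds)
    raise TimeoutError(f"attach session {attach_session_id} did not reach a terminal state")
```

On a `completed` result the caller can read `connector_id` and `files_synced`; on `expired` it can re-run attach_external_folder to start over.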
Generation tools
start_generation
Starts a generation from a completed interview's intake. Takes the intake_id and (optionally) the review profile, project type, and version pins. Returns the generation id and initial state Queued. Subject to the same 5-kickoffs-per-minute rate limit as POST /v1/generations.
Many callers won't need to call this directly: when the agent signals completion on a submit_interview_turn call, SpecStep auto-starts a generation with sensible defaults and surfaces the new generation id as started_generation_id on the response snapshot. Call start_generation explicitly when you want non-default settings (custom review_profile, mirror_selection, etc.) or when the auto-start surfaced an auto_start_failure you need to retry past.
Arguments
| Name | Type | Required | Description |
|---|---|---|---|
| intake_id | UUID | yes | The intake artifact produced by completing an interview. |
| review_profile | string | no | One of Fast, Normal, Extensive, Researcher. Defaults to Normal. |
| project_type | string | no | One of WebApp, MobileApp, MobileGame, DesktopApp, BrowserExtension, AiAgent, AiTool. Defaults to WebApp. |
| has_ui | bool | no | Whether the project has a user interface. Defaults to false. |
| schema_version | string | no | Pins the manifest schema version. Defaults to 1.0.0. |
| rubric_version | string | no | Pins the review rubric version. Defaults to 1.0.0. |
| quality_rubric_version | string | no | Pins the quality rubric version. Defaults to quality-1.0. |
| mirror_selection | string | no | One of None, ClaudeMd, CursorRules, Copilot, All. Defaults to None. |

Returns
| Field | Type | Description |
|---|---|---|
| id | UUID | The new generation's id. |
| state | string | Initial state, normally Queued. |
| download_url | string \| null | Populated only if the package is synchronously ready (rare). |
| package_id | UUID \| null | Populated only if the package is synchronously ready. |
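For callers building the request by hand, a start_generation call might be wrapped as follows. This is a sketch under assumptions: it uses the standard MCP `tools/call` method with `name` and `arguments` params, and the request id and the all-zero intake_id are placeholders rather than real values. Authentication headers are out of scope here.

```python
import json
from typing import Any, Dict


def tools_call_envelope(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 envelope for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


body = tools_call_envelope(
    1,  # placeholder request id
    "start_generation",
    {
        "intake_id": "00000000-0000-0000-0000-000000000000",  # placeholder UUID
        "review_profile": "Extensive",   # omitted arguments fall back to the documented defaults
        "mirror_selection": "ClaudeMd",
    },
)
print(json.dumps(body, indent=2))
```

Only intake_id is required; leaving out the version pins keeps the call on the documented defaults.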
get_generation
Breaking change in v0.9.5 (2026-05-06). This tool was previously called get_status. Callers using the old name must switch: the dispatcher rejects get_status with a MethodNotFound-style error.
Returns the current state of a generation. Takes a generation ID. Returns the state (one of Queued, Drafting, SpecialistReview, Reviewing, FreshEyes, RiskReview, SecurityReview, Assembling, Refining, Delivering, Paused, PausedAwaitingClarification, Complete, Failed, Cancelled, AddendumRunning), the current round, the running cost, the computed progress_percent, and the typed failure_category when the generation failed.
When the historical sample is large enough, the response also carries estimated_total_usd plus estimated_total_p25_usd / estimated_total_p75_usd / estimated_total_sample_size — the same forecast envelope the Generation Details page renders.
The response also carries project_name, description, kind ("specification"), and kind_label so the agent knows what the generation is about and can disambiguate the deliverable from runnable code.
Poll this until state is terminal — or use wait_for_generation instead, which returns a polling-cadence hint.
Arguments
| Name | Type | Required | Description |
|---|---|---|---|
| generation_id | UUID | yes | The generation to inspect. |
Returns (shared with wait_for_generation)
| Field | Type | Description |
|---|---|---|
| id / generation_id | UUID | The generation's id. |
| state | string | One of Queued, Drafting, SpecialistReview, Reviewing, FreshEyes, RiskReview, SecurityReview, Assembling, Refining, Delivering, Paused, PausedAwaitingClarification, Complete, Failed, Cancelled, AddendumRunning. |
| current_round | int | Current review-loop round number. |
| progress_percent | int | Computed 0–100 progress signal. |
| running_cost_usd | decimal | Live cost so far. Settles to the package's total_cost_usd on Complete. |
| estimated_total_usd | decimal \| null | Historical-median forecast; null when the sample is too small. From 2026-05-27, on a run that has auto-resumed after host restarts the forecast is widened by host_restart_resume_count (each resume re-runs work), so a resume-prone run's estimate reflects the extra cost instead of reading wildly low against the actual. |
| estimated_total_p25_usd / estimated_total_p75_usd | decimal \| null | Percentile bounds; null when the forecast is null. Widened on resumed runs alongside estimated_total_usd. |
| estimated_total_sample_size | int \| null | Number of historical generations behind the forecast. |
| project_name | string \| null | Display name (override or auto-extracted). |
| description | string \| null | Short intake-derived description, truncated to 280 chars. |
| kind / kind_label | string | "specification" + the canonical disambiguation copy. |
| failure_category | string \| null | Typed failure category on Failed rows; null otherwise. See REST errors page. |
| failure_reason | string \| null | Sanitized human-readable hint on Failed rows. |
| billing_state | string \| null | Added 2026-05-18. One of NotStarted / Active / PausedRetrying / Complete / PausedAwaitingInput (the last added 2026-06-01; it marks a human-input pause, e.g. answering a clarification: your turn, no cost climbing, nothing stuck; distinct from PausedRetrying, a transient-error backoff). Customer-facing billing posture written atomically with every state transition. When billing_state is Active while running_cost_usd climbs, the caller knows their cost isn't being wasted: the platform is actively working. Null on pre-2026-05-18 generations (no projection row yet). |
| started_work_at | ISO-8601 \| null | Added 2026-05-18. When the dispatcher first claimed the generation (distinct from started_at, which is the queued-at time). Null on pre-2026-05-18 generations and while the row is still in pre-work states. |
| phase_detail | string \| null | Added 2026-05-18. Human-readable phase label derived as a pure function of state + current_round (examples: "Drafting", "Specialist review (round 2)", "Awaiting your clarification"). Present on every projection row. Null on pre-2026-05-18 generations. |
| progress_explanation | string \| null | Added 2026-05-18. One-sentence explanation of what's happening at the current progress_percent (e.g., "Specialists are reviewing the draft in parallel"). Closes the same understanding gap as billing_state: the customer sees WHY the progress bar is where it is, not just the number. Null on pre-2026-05-18 generations. |
| estimated_duration_seconds | number \| null | Added 2026-05-18. Historical-median forecast of the run's eventual total wall-clock duration (seconds), keyed by review_profile. Null when the historical sample is too small for a confident forecast (the floor is 5 completed generations in the rolling 30-day window) or on pre-2026-05-18 generations. |
| estimated_time_remaining_seconds | number \| null | Added 2026-05-18. Best-effort "expected remaining" computed as estimated_duration_seconds - elapsed_since_started_work_at, floored at 0. Null while the generation is queued, terminal, when the forecast is unavailable, or when a still-running generation has already outrun its forecast (the ETA resets to estimating…). |
| estimated_completion_at | ISO-8601 \| null | Added 2026-05-18. Best-effort wall-clock expected completion: started_work_at + estimated_duration_seconds. Null while queued, terminal, when the forecast is unavailable, or when a still-running generation has already outrun its forecast (the ETA resets to estimating…). |
| active_specialist | string \| null | Added 2026-05-18. During SpecialistReview only: slug of the most-recently-completed specialist in the current round (codd / halo / tally / vera / trip / merlin / polo). A pragmatic single-value summary of a parallel fan-out. Null outside SpecialistReview, when no specialists have completed yet, or on pre-2026-05-18 generations. |
| retry_count | int | Added 2026-05-19. Number of recoverable LLM-provider retries fired during this run (rate-limit / transient 5xx / timeout backoffs). Starts at 0 and only increments mid-run; it never decreases. Resets to 0 on a host-restart rewind because the counter belongs to a single dispatch attempt. Lets callers tell a "healthy first attempt" (0) apart from "currently riding out a transient hiccup" (>0). Always present (defaults to 0 on pre-rollout generations). |
| last_retry_at | ISO-8601 \| null | Added 2026-05-19. UTC timestamp of the most recent retry attempt. Null until the first retry fires. |
| next_retry_at | ISO-8601 \| null | Added 2026-05-19. UTC timestamp the retry policy is currently waiting for before the next attempt (last_retry_at + backoff_delay). Null between retries. Lets callers display "next retry in X seconds" without guessing the backoff curve. |
| recoverable_error_category | string \| null | Added 2026-05-19. Typed classifier for the recoverable failure that triggered the most recent retry. One of rate_limit / provider_timeout / provider_server_error / schema_violation / other. Distinct from terminal failure_category: that is set when the run fails for good; this is set when an LLM call temporarily failed but the retry policy is still covering it. Null when no retry has fired yet. |
| host_restart_resume_count | int | Added 2026-05-27. How many times this run was automatically resumed after a host restart (capped at 5). Distinct from retry_count: that one is provider-level and resets to 0 on a host-restart rewind, so a run that recovered from several restarts still reads retry_count: 0; this counter spans the run's whole life and only climbs. A non-zero value is the honest reason a run's running_cost_usd or estimated_total_usd runs higher than the clean-run forecast, because each resume re-runs work: the full-rewind path re-runs Drafting from scratch, while cheaper in-place resumes pick up from a saved checkpoint. Always present (defaults to 0). |
| refinement_summary | object \| null | Added 2026-05-29. Outcome of the pre-delivery refinement pass that fills referenced-but-missing docs before a package ships. null when the pass didn't run, made no change, and left no gap. When present, an object with: rounds_used (int: how many detect → refine → re-validate rounds ran); generated_count / dropped_count / residual_count (int); generated and dropped (arrays of {path, referenced_by[]}: docs filled with real content vs. dangling references removed); residual (array of {path, referenced_by[], reason}: references that ship as deferred stubs, i.e. the package's known gaps); and summary (a ready-to-render string). Mirrors the "Pre-delivery refinements" section in handoff.md. |
| reconciliation_summary | object \| null | Added 2026-05-29. Outcome of the pre-delivery contradiction-reconciliation pass that resolves cross-document architecture contradictions (e.g. one doc says PostgreSQL, another DynamoDB) before a package ships. null when the pass found nothing to reconcile and left no residual. When present, an object with: rounds_used (int: how many detect → reconcile → re-validate rounds ran); reconciled_count / unresolved_count (int); reconciled (array of {category, summary, affected_locations[]}: contradictions resolved by redrafting the affected docs to agree); unresolved (array of {category, summary, affected_locations[], reason}: contradictions that ship as known gaps, with the reason); and summary (a ready-to-render string). A reconciled contradiction also disappears from consistency_findings. Mirrors the "Pre-delivery reconciliation" section in handoff.md. |
| blocker_resolution_summary | object \| null | Added 2026-05-29. Outcome of the pre-delivery blocker resolve-or-clarify pass that acts on residual Critic-flagged blockers before a package ships. null when there were no residual blockers to act on. When present, an object with: resolved_count / clarified_count / residual_count (int); resolved (array of {target_section, summary}: blockers cleared by redrafting); clarified (array of {target_section, summary, question}: blockers escalated into a clarification question); residual (array of {target_section, summary, reason}: blockers that ship as known gaps); and summary (a ready-to-render string). Mirrors the "Pre-delivery blocker resolution" section in handoff.md. |
| refinement_audit | object \| null | Added 2026-05-31. Consolidated audit of the whole pre-delivery refinement pipeline: one flat view of what it auto-fixed versus escalated, aggregated from the three fields above (refinement_summary / reconciliation_summary / blocker_resolution_summary) so you don't have to union three differently-shaped objects. null on a clean run where every refinement pass was a no-op. When present, an object with: auto_fixed_count / escalated_count (int); auto_fixed (the pipeline changed the package) and escalated (the pipeline surfaced an unresolved gap), each an array of {pass, action, target, detail}, where pass ∈ stub-fill / reconciliation / blocker-resolution; action ∈ generated / dropped / reconciled / resolved (auto-fixed) or residual-gap / unresolved-contradiction / clarified / residual-blocker (escalated); target is the doc path / section / contradiction category; detail is a human-readable summary / reason / clarification question (may be empty); and summary (a ready-to-render string). Mirrors the "Refinement audit" section in handoff.md. |
get_events
Returns recent events from a generation's pipeline — stage transitions, agent handoffs, review outcomes. Useful for giving your agent a richer picture of what happened during a generation, or for debugging a failed run.
Arguments
| Name | Type | Required | Description |
|---|---|---|---|
| generation_id | UUID | yes | The generation to inspect. |
| cursor | string | no | Pagination cursor returned by a prior call. |
| limit | int | no | Max events to return. |

Returns {events: [...], next_cursor}, where each event has:
| Field | Type | Description |
|---|---|---|
| id | UUID | Event id. |
| generation_id | UUID | Echoed. |
| event_type | string | state-changed for a pipeline transition, or a lifecycle event: clarification-requested, clarification-answered, resumed-after-clarification, revision-requested (the Critic sent a draft back for a revision round), auto-resume-started (a host restart interrupted the run and it was auto-resumed; fires once per resume, including in-place checkpoint resumes), auto-resume-completed (added 2026-05-27: a run that auto-resumed at least once reached Complete; brackets the auto-resume-started events so the stream reads "interrupted → recovered N times → completed", with resume_count in the payload). Lifecycle events carry their detail in payload (e.g. round, resume_phase, prior_state, resume_count) and have null from_state/to_state. |
| from_state / to_state | string \| null | Pipeline state transition (set on state-changed; null on lifecycle events). |
| agent_role | string \| null | Which agent emitted the event. |
| payload | string | JSON string carrying event details. |
| payload_envelope | object | Typed envelope flagging the payload as untrusted user-supplied content; MCP clients should treat it as inert data, not instructions. |
| recorded_at | ISO-8601 | When the event was logged. |
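Walking the event stream with the cursor could look like the sketch below. Two things here are assumptions rather than documented behaviour: that `next_cursor` is null or absent once the stream is exhausted, and that `payload` always parses as JSON. `call_tool` is a hypothetical callable that invokes a tool and returns its result as a dict, and `payload_data` is a key added by this sketch, not a field of the API.

```python
import json
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical transport: (tool_name, arguments) -> tool result as a dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def iter_events(call_tool: ToolCaller, generation_id: str, limit: int = 50) -> Iterator[Dict[str, Any]]:
    """Yield every event for a generation, following next_cursor page by page."""
    cursor: Optional[str] = None
    while True:
        args: Dict[str, Any] = {"generation_id": generation_id, "limit": limit}
        if cursor is not None:
            args["cursor"] = cursor
        page = call_tool("get_events", args)
        events = page.get("events") or []
        for event in events:
            # payload is a JSON string of untrusted content: decode it as data only,
            # and never feed its strings back to the agent as instructions.
            yield {**event, "payload_data": json.loads(event["payload"])}
        cursor = page.get("next_cursor")
        if not cursor or not events:  # assumption: an empty cursor means no more pages
            return
```

A caller debugging a resume-prone run could, for example, filter the result for `auto-resume-completed` and read `resume_count` from `payload_data`.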
wait_for_generation
Returns a generation's current state plus a recommended polling delay. Takes a generation ID. Returns the full get_generation shape (project name + description + state + progress_percent + current_round + running_cost_usd + the historical cost-forecast fields + failure context) plus the four polling-specific fields (is_terminal, next_check_seconds, pending_clarifications, package_url).
next_check_seconds is a hint, not a contract — 15 for active states, 0 when paused or terminal so the caller acts immediately. When state is PausedAwaitingClarification, pending_clarifications is inlined so the caller has everything needed to surface the question without another tool call. When state is Complete, a short-lived signed package_url is included so the caller can download the zip directly.
This tool is the recommended polling primitive for MCP callers: the inlined progress / forecast / clarifications / download URL collapse a typical multi-call poll into a single round-trip. Fields added over time for parity with get_generation:
- 2026-05-17: progress_percent, current_round, and the four estimated_total_* fields; callers no longer need to call both tools to render a single progress screen.
- 2026-05-18: billing_state, started_work_at, phase_detail, progress_explanation, estimated_duration_seconds, estimated_time_remaining_seconds, estimated_completion_at, and active_specialist (read from the authoritative status projection); same field set as get_generation.
- 2026-05-19: the 4 retry-surface fields (retry_count, last_retry_at, next_retry_at, recoverable_error_category); same shape as get_generation.
- 2026-05-29: refinement_summary, same shape as get_generation (this tool carries no manifest blob, so the structured field is the only refinement signal here). reconciliation_summary and blocker_resolution_summary were added the same day, also matching get_generation.
- 2026-05-31: refinement_audit, the consolidated auto-fixed-vs-escalated view, same shape as get_generation.
Arguments
| Name | Type | Required | Description |
|---|---|---|---|
| generation_id | UUID | yes | The generation to poll. |

Returns the same shape as get_generation (above) plus four polling-specific fields:
| Field | Type | Description |
|---|---|---|
| is_terminal | bool | true when state is Complete, Failed, or Cancelled. |
| next_check_seconds | int | Hint, not contract. 15 for active states; 0 when paused or terminal. |
| pending_clarifications | array \| null | Inlined when state is PausedAwaitingClarification; same shape as get_pending_clarifications. |
| package_url | string \| null | Short-lived signed URL when state is Complete. |
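A polling loop driven by next_check_seconds might be sketched as follows. It assumes a hypothetical `call_tool` callable that invokes a tool and returns its result as a dict, and it treats a delay of 0 as "the caller should act now", which is how the reference describes paused and terminal states.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical transport: (tool_name, arguments) -> tool result as a dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def poll_until_actionable(
    call_tool: ToolCaller,
    generation_id: str,
    max_polls: int = 500,  # assumption: a safety cap, not a documented limit
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll wait_for_generation until the run is terminal or needs the caller."""
    for _ in range(max_polls):
        snapshot = call_tool("wait_for_generation", {"generation_id": generation_id})
        delay = snapshot.get("next_check_seconds") or 0
        if snapshot["is_terminal"] or delay <= 0:
            # Terminal, or paused: pending_clarifications / package_url are inlined.
            return snapshot
        sleep(delay)
    raise TimeoutError(f"generation {generation_id} still running after {max_polls} polls")
```

The returned snapshot already carries `pending_clarifications` on a clarification pause and `package_url` on Complete, so no follow-up tool call is needed to decide the next step.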
estimate_generation_cost
Added 2026-05-12.
Forecasts what a generation will cost (USD) before calling start_generation. Takes profile — one of Fast, Normal, Extensive, or Researcher. Returns {profile, has_forecast, estimated_total_usd, p25_usd, p75_usd, sample_size, note} — the rolling 30-day median across completed generations for the requested profile, with p25 / p75 confidence bounds and the sample size behind the estimate.
The forecaster is profile-keyed only — it doesn't yet take an intake_id, so the estimate reflects "what this profile usually costs" rather than a per-intake projection. Per-intake variance can be substantial; the p25 / p75 bounds capture that envelope. When the historical sample is below the forecaster's floor, has_forecast is false and the response carries a "not enough data" note rather than a low-confidence number. Useful for sanity-checking cost before kicking off Normal or Extensive runs.
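One way to use the forecast as a pre-flight budget check is sketched below. Comparing the budget against p75_usd rather than the median is a design choice of this sketch (it errs on the conservative side), not something the API prescribes; the function name and the three-way return value are hypothetical.

```python
from typing import Any, Dict, Optional


def fits_budget(forecast: Dict[str, Any], budget_usd: float) -> Optional[bool]:
    """Return True/False when a forecast exists, or None when there is not enough data."""
    if not forecast.get("has_forecast"):
        # The response carries a "not enough data" note instead of numbers.
        return None
    # Conservative check: compare against the upper (p75) bound, not the median.
    return float(forecast["p75_usd"]) <= budget_usd
```

Returning None rather than False keeps "no forecast available" distinct from "forecast exceeds budget", so the agent can ask the user instead of silently refusing.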
validate_generation_request
Added 2026-05-16.
Dry-run companion to start_generation. Takes the same arguments (intake_id required; project_type, has_ui, review_profile, schema_version, rubric_version, quality_rubric_version, mirror_selection optional with the same defaults). Runs the side-effect-free pre-flight checks the live tool does (intake-existence + ownership, account-approval gate, monthly quota + Extra Usage fallback, review-profile-vs-tier, External Connectors tier gate) WITHOUT enqueueing a generation. Returns {is_valid, blocking_errors: [{code, message}], warnings: [{code, message}]}.
Each error's code matches the exception code start_generation would throw on the live path, so callers can branch on stable identifiers:
- INTAKE_NOT_FOUND: intake doesn't exist or caller lacks access
- USER_PENDING_APPROVAL: account hasn't been approved yet
- QUOTA_EXCEEDED: monthly quota reached + no Extra Usage rescue
- EXTRA_USAGE_INSUFFICIENT: monthly quota reached + Extra Usage balance below the p75 forecast charge
- PROFILE_NOT_ALLOWED: requested review_profile isn't available on the caller's tier
- FEATURE_NOT_ALLOWED: intake uses External Connector data but the caller's tier doesn't allow it
Warnings are informational and don't fail validation:
- EXTRA_USAGE_WILL_BE_RESERVED: monthly quota exhausted but Extra Usage covers the next call
- CONCURRENCY_AT_CAP / CONCURRENCY_HIGH: concurrency slots heavily in use; a live call right now could race to CONCURRENCY_CAP_REACHED
One caveat on concurrency: a dry run that returns is_valid: true can still fail with a 409 on the live call if another kickoff lands first. Treat the concurrency state as informational only.
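Branching on the dry-run result could be sketched like this. The three decision labels ("blocked", "proceed_with_retry", "proceed") are hypothetical names invented for the example; only the field names and error/warning codes come from the reference above.

```python
from typing import Any, Dict, List, Tuple


def triage_validation(result: Dict[str, Any]) -> Tuple[str, List[str]]:
    """Map a validate_generation_request result to a decision plus the relevant codes."""
    blocking = [e["code"] for e in result.get("blocking_errors", [])]
    warnings = [w["code"] for w in result.get("warnings", [])]
    if not result["is_valid"]:
        return "blocked", blocking
    if any(code.startswith("CONCURRENCY_") for code in warnings):
        # Valid now, but the live call may still lose the race and return a 409.
        return "proceed_with_retry", warnings
    return "proceed", warnings
```

Because the codes match what start_generation would throw, the same mapping can be reused in the error handler around the live call.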
get_security_findings
Added 2026-05-16.
Returns the structured security-review findings for a generation. Takes generation_id. Returns {generation_id, has_review, finding_count, max_severity, findings: [{severity, surface, topic, title}, ...]}.
Severity values: Critical, Major, Minor, Info, None. Surface values: Spec, ReferenceCode, GeneratedPackage, PromptInjection. has_review is false when the generation has no manifest yet (still in flight) or the review profile didn't include the Security Expert. Use this to gate automation on a generation's security posture without parsing the markdown report — e.g., max_severity == "Critical" → block. The full report markdown stays in the package zip; this tool exposes only the compact structured projection that already lives in the manifest.
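The gating idea above can be sketched as a small predicate. The numeric ranking assumes the severity values are ordered Critical > Major > Minor > Info > None, which matches the order they are listed in but is not stated explicitly; how to treat a generation with no review is a policy choice exposed here as a hypothetical `block_if_unreviewed` flag.

```python
from typing import Any, Dict

# Assumed ordering, inferred from the order the values are listed in.
SEVERITY_RANK = {"None": 0, "Info": 1, "Minor": 2, "Major": 3, "Critical": 4}


def should_block(report: Dict[str, Any], threshold: str = "Critical", block_if_unreviewed: bool = True) -> bool:
    """Decide whether automation should stop based on get_security_findings output."""
    if not report["has_review"]:
        # No manifest yet, or the profile skipped the Security Expert.
        return block_if_unreviewed
    return SEVERITY_RANK[report["max_severity"]] >= SEVERITY_RANK[threshold]
```

Lowering `threshold` to "Major" tightens the gate without any change to the parsing logic.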
get_generation_quality_report
Added 2026-05-16.
Aggregates the four non-security review sections from the generation's manifest into a single structured payload: reliability (Atlas), accessibility (Halo), cost (Tally), risk (Hazard). Takes generation_id. Returns {generation_id, reliability, accessibility, cost, risk} where each sub-section is {has_review, finding_count, max_severity, findings: [{severity, topic, title}, ...]}.
Severity values match get_security_findings. has_review: false on a sub-section means the reviewer wasn't part of the generation's review profile (e.g., the Fast profile skips Cost + Risk). Callers can distinguish "no findings + reviewer ran" from "reviewer didn't run" — useful for PR-gate automation that wants to know whether a quality signal is missing vs known-clean. Pair with get_security_findings for the security gate.
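The "missing vs known-clean" distinction can be made explicit with a helper like the one below. The three labels ("missing", "clean", "findings") are hypothetical names for this sketch; the section names and fields are the ones documented above.

```python
from typing import Any, Dict

SECTIONS = ("reliability", "accessibility", "cost", "risk")


def classify_sections(report: Dict[str, Any]) -> Dict[str, str]:
    """Label each quality section as missing (reviewer did not run), clean, or findings."""
    labels: Dict[str, str] = {}
    for name in SECTIONS:
        section = report[name]
        if not section["has_review"]:
            labels[name] = "missing"
        elif section["finding_count"] == 0:
            labels[name] = "clean"
        else:
            labels[name] = "findings"
    return labels
```

A PR gate could then fail on "findings" above a severity threshold and merely warn on "missing".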
Clarification tools
get_pending_clarifications
Returns the structured clarifications a paused generation is waiting on. Takes a generation ID. Returns {state, clarifications} where each clarification has agent, section (may be null), question, why, and proposed_default. Empty array when the generation isn't paused.
The chat-driven web flow asks these questions through the user's interview chat; this tool exposes the same structured surface to MCP callers so an agent can prompt its user without parsing free-text agent turns.
answer_clarifications
Submits answers to a paused generation's clarifications and resumes the run. Takes the generation ID and an array of {question, answer} pairs. Match each question exactly to the verbatim text from get_pending_clarifications — pairing is by question text. Answers must cover every pending clarification (all-or-nothing for v1).
Returns {generation_id, accepted, message}. The orchestrator picks the run back up on the next dispatcher tick and threads the answers into the next agent call.
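Because pairing is by verbatim question text and every pending clarification must be answered, it can help to build the answer list directly from the get_pending_clarifications output. The sketch below falls back to each clarification's proposed_default unless the user supplied an override; whether proposed_default can be null is not documented, so the helper refuses to guess in that case. The function and parameter names are hypothetical.

```python
from typing import Any, Dict, List, Optional


def build_answers(
    clarifications: List[Dict[str, Any]],
    overrides: Optional[Dict[str, str]] = None,
) -> List[Dict[str, str]]:
    """Produce one {question, answer} pair per pending clarification."""
    overrides = overrides or {}
    answers: List[Dict[str, str]] = []
    for item in clarifications:
        question = item["question"]  # reused verbatim so the pairing matches
        answer = overrides.get(question, item.get("proposed_default"))
        if answer is None:
            raise ValueError(f"no answer or proposed_default for: {question!r}")
        answers.append({"question": question, "answer": answer})
    return answers
```

Copying the question string from the tool output, rather than retyping it, avoids mismatches caused by whitespace or punctuation differences.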
Generation control
list_generations
Lists the caller's generations regardless of whether they have a Package row yet, so callers can see in-progress / failed / paused / cancelled runs alongside completed ones.
Arguments
| Name | Type | Required | Description |
|---|---|---|---|
| status | string | no | Comma-separated. Roll-up tokens (in_progress, complete, failed, cancelled, paused) or exact state names (Drafting, Reviewing, etc.). Case-insensitive. |
| limit | int | no | Default 50, max 200. |
| offset | int | no | Default 0. |
| order | string | no | desc (newest-first, default) or asc. |
Returns — {rows: [...]} where each row has: id, short_id, project_name, state, review_profile, cost_usd, started_at, completed_at, failed_at, current_round, failure_reason, failure_category, source_channel, interview_id, progress_percent (the row's live 0–100 progress; null on pre-rollout generations that have no projection yet).
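Paging through the full list with limit and offset might look like the sketch below. It assumes a short page (fewer rows than requested) marks the end of the list, which the reference does not state explicitly, and `call_tool` is a hypothetical callable that invokes a tool and returns its result as a dict.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical transport: (tool_name, arguments) -> tool result as a dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def iter_generations(
    call_tool: ToolCaller,
    status: Optional[str] = None,
    page_size: int = 200,  # the documented maximum for limit
) -> Iterator[Dict[str, Any]]:
    """Yield every generation row, advancing offset one page at a time."""
    offset = 0
    while True:
        args: Dict[str, Any] = {"limit": page_size, "offset": offset}
        if status:
            args["status"] = status  # e.g. "failed,paused" or "in_progress"
        rows = call_tool("list_generations", args)["rows"]
        yield from rows
        if len(rows) < page_size:  # assumption: a short page means no more rows
            return
        offset += len(rows)
```

Offset paging can skip or repeat rows if new generations start mid-iteration, so a caller that needs an exact snapshot may want to de-duplicate by id.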
delete_generation
Soft-deletes a generation by id. Takes generation_id. Only allowed on terminal-state rows (Complete, Failed, Cancelled); attempting to delete an in-flight generation throws an error — cancel the run first. Idempotent on already-deleted rows. The generation drops out of list_generations and the workspace; the row stays in the database for audit. Returns {generation_id, action: "deleted"}. Sister tool: restore_generation.
restore_generation
Restores a soft-deleted generation by id. Takes generation_id. No state guard — even if the generation was Failed or Cancelled at delete time, restore returns it to your workspace in the same state. Idempotent on already-live rows. Sister to delete_generation. Returns {generation_id, action: "restored"}.
cancel_generation
Added 2026-05-08.
Cancels an in-flight generation. Takes generation_id and an optional reason string. Marks the row Cancelled (a distinct terminal state from Failed) and signals the orchestrator's CancellationToken so any in-flight LLM call halts instead of running to completion (avoiding cost on a run you no longer want). Already-terminal rows return an error: cancel_generation changes nothing on Complete, Failed, or already-Cancelled runs. Use this when an agent observes a stuck or runaway generation and wants to bail out cleanly. Returns {generation_id, state: "Cancelled"}.
retry_generation
Added 2026-05-08.
Retries a Failed generation. Takes generation_id. Replays the original kickoff command verbatim — same intake artifact, same review profile, same multimodal context (images + reference docs are re-hydrated from blob storage) — as a brand-new generation row. The original Failed row stays in the database for audit. Only the original owner can retry; cross-user retry is rejected. Returns {original_generation_id, new_generation_id, state, package_id}. Common error codes: RETRY_STATE_INVALID (only Failed rows are retryable), RETRY_RESEARCHER_CHILD (re-fire the parent Researcher run from the original interview), RETRY_ENVELOPE_UNAVAILABLE (legacy row predating the persisted-command feature), RETRY_OWNER_MISMATCH. Quota and approval errors (QUOTA_EXCEEDED, USER_PENDING_APPROVAL) propagate from the underlying handler.
pause_generation
Added 2026-05-08.
Pauses a running generation. Takes generation_id. User-initiated pause, distinct from the orchestrator's automatic PausedAwaitingClarification state (which fires when an agent needs more input — use get_pending_clarifications + answer_clarifications for that flow). The aggregate records the pre-pause state in the event log so a subsequent resume_generation can restore it. Already-terminal rows return a 409-equivalent error. Returns {generation_id, state: "Paused"}. Sister to resume_generation.
resume_generation
Added 2026-05-08.
Resumes a Paused generation back to its pre-pause state. Takes generation_id. Reads the most recent non-Paused to_state from the event log and restores it; the orchestrator picks up where it left off. State must currently be Paused (use get_generation to check); any other state returns a 409-equivalent error. If the event log has no pre-pause state recorded (corrupt history), the tool surfaces the same error. Returns {generation_id, state} where state is the restored pre-pause state. Sister to pause_generation.
update_generation_name
Added 2026-05-08.
Set or clear the user-facing display name on a generation. Takes generation_id and an optional name. Useful for correcting placeholder / null project names on completed generations (e.g., when the auto-extractor returned (unnamed) because the intake JSON was missing a project_name). Pass an empty/whitespace name (or omit it) to clear the override and let the auto-extractor's best guess take over. Returns {generation_id, display_name}. Mirrors REST PATCH /v1/generations/{id}/name.
Capabilities & metadata
get_capabilities
Added 2026-05-08.
Discover schema versions and the enumerable inputs the API accepts so callers can avoid hardcoding magic strings. Takes no arguments. Returns {schema_version, rubric_version, quality_rubric_version, review_profiles, project_types, mirror_selections}. Anonymous-shaped (the values describe the public contract and don't depend on the caller). Use this BEFORE start_generation / start_interview to discover valid review_profile and project_type values; values change only on deploy. Mirrors REST GET /v1/capabilities.
Account & usage
get_subscription
Returns the calling user's subscription tier and current calendar-month quota snapshot. Takes no arguments. Returns {tier, status, current_period_end, quota: {monthly_limit, monthly_used, concurrency_limit, period_reset_at}} — tier is one of Free, Pro, Team; status reflects Stripe's subscription state. Useful before kicking off a generation so the agent can warn the user if they're at or near their monthly quota. Mirrors the subscription field on REST GET /v1/me plus the standalone REST GET /v1/billing/subscription endpoint.
get_usage
Aggregates the caller's LLM cost and token usage over a time window. Mirrors REST GET /v1/usage.
Arguments
| Name | Type | Required | Description |
|---|---|---|---|
| from | ISO-8601 timestamp | no | Window start. Default: 30 days ago. |
| to | ISO-8601 timestamp | no | Window end. Default: now. Max window 366 days. |
| group_by | string | no | One of provider, model, role, day, week, month, key, user. Defaults to model. |

Returns
| Field | Type | Description |
|---|---|---|
| from / to | ISO-8601 | Echoed window bounds. |
| group_by | string | Echoed grouping key. |
| rows | array | Each entry has {group, input_tokens, output_tokens, cached_tokens, cost_usd, invocation_count}. |
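Summarising the rows on the client side could be sketched as follows. The helper names are hypothetical, and because the wire encoding of the decimal cost_usd field (JSON number vs. string) is not specified here, the sketch routes every value through `Decimal(str(...))` to cope with either.

```python
from decimal import Decimal
from typing import Any, Dict, List, Tuple


def _cost(row: Dict[str, Any]) -> Decimal:
    # Accepts a JSON number or a numeric string without float rounding surprises.
    return Decimal(str(row["cost_usd"]))


def total_cost(usage: Dict[str, Any]) -> Decimal:
    """Sum cost_usd across every group in a get_usage response."""
    return sum((_cost(row) for row in usage["rows"]), Decimal("0"))


def top_groups(usage: Dict[str, Any], n: int = 3) -> List[Tuple[str, Decimal]]:
    """Return the n most expensive groups as (group, cost) pairs, highest first."""
    ranked = sorted(usage["rows"], key=_cost, reverse=True)
    return [(row["group"], _cost(row)) for row in ranked[:n]]
```

With group_by left at its default of model, `top_groups` shows which models dominate spend in the window.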
Package tools
get_latest_package_for_generation
Added 2026-05-08.
Returns the current package metadata plus a time-limited download URL for a generation, looked up by generation_id rather than by package_id. Use this when an agent has just completed a generation and wants the package without re-querying list_packages. Returns a 404-equivalent error when the generation has no package yet (still in flight) or when the package was soft-deleted. Mirrors REST GET /v1/generations/{id}/package. The tool is also forward-compatible with the package-update flow (multiple package versions per generation): when that lands, it returns the current (latest) package without callers having to filter list_packages.
Arguments
| Name | Type | Required | Description |
|---|---|---|---|
| generation_id | UUID | yes | The generation whose latest package to return. |
Returns
| Field | Type | Description |
|---|---|---|
| id | UUID | Package id. |
| generation_id | UUID | Echoed. |
| version | string | Package version (currently always 1.0.0). |
| download_url | string | SAS-tokened blob URL for the package zip. |
| download_url_expires_at | ISO-8601 | When the SAS URL expires. Refetch this tool to get a fresh URL. |
| total_cost_usd | decimal | What the generation's LLM calls cost. |
| retention_until | ISO-8601 \| null | When the package will be auto-deleted; null means indefinite. |
| deleted_at | ISO-8601 \| null | Set if the package was soft-deleted. |
| project_name | string \| null | Display name (override or auto-extracted). |
| description | string \| null | Short description, truncated to 280 chars. |
| kind / kind_label | string | "specification" + canonical disambiguation copy. |
list_package_files
Added 2026-05-08.
Lists every file inside a completed package zip, with the uncompressed size of each entry. Takes package_id. Returns {package_id, files: [{path, size_bytes}, ...]} sorted lexicographically by path. Streams the zip's central directory from blob storage via Azure SDK range requests — the full archive is never materialized on the server. Pair with get_package_file to read individual files without the zip download dance. Useful when a coding agent wants to inspect package structure (architecture docs, requirements, ADRs, etc.) and pick which files to read.
get_package_file
Added 2026-05-08.
Returns the bytes of a single file from a package zip. Takes package_id and path (use list_package_files to discover available paths). The response shape depends on the file type:
- Text entries (markdown, YAML, JSON, plain text, CSV, SVG): {package_id, path, content_type, content} where content is the raw UTF-8 string.
- Binary entries (PNG, unknown extensions): {package_id, path, content_type, content_base64} where content_base64 is the base64-encoded payload (the JSON envelope can't carry malformed UTF-8).
Files larger than 256 KB return an error directing the caller at the bulk zip download URL (use get_package). Path-traversal segments (..) are rejected at the application layer. Streams the requested zip entry from blob storage; the full archive is never materialized.
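Because the result carries either content or content_base64, a client usually wants one code path that yields bytes in both cases. A minimal sketch, assuming `result` is the already-parsed tool output and using a hypothetical helper name:

```python
import base64


def package_file_bytes(result: dict) -> bytes:
    """Return the file's raw bytes from either get_package_file result shape."""
    if "content" in result:
        # Text entry: the payload is a UTF-8 string.
        return result["content"].encode("utf-8")
    if "content_base64" in result:
        # Binary entry: the payload is base64-encoded.
        return base64.b64decode(result["content_base64"])
    raise ValueError("result has neither content nor content_base64")
```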
search_package
Added 2026-05-08.
Full-text search across a package's indexed file contents (markdown, YAML, JSON, plain text, CSV, SVG entries — binary files are skipped during indexing). Takes package_id, query, and an optional limit (default 20, max 50). Returns {package_id, query, results: [{file_path, snippet, rank}, ...]} ranked by relevance, newest match first within rank ties. Snippets are HTML-highlighted with <mark>...</mark> markers around match terms; agents can render them directly or strip the tags as preferred.
Query syntax follows Postgres websearch_to_tsquery: quoted phrases ("agent topology"), OR for alternation (auth OR session), -term for exclusion (auth -test). Case-insensitive; English stemming is applied (so searching matches search). An empty query returns an empty result set rather than every row.
Results are scoped to a single package. For cross-package search across every package the caller owns in one round trip, use search_my_packages (below). The index is built at package completion; SpecStep staff can re-trigger indexing on request if it falls out of sync. Mirrors REST GET /v1/packages/{id}/search?q=...&limit=....
Arguments
| Name | Type | Required | Description |
|---|---|---|---|
| package_id | UUID | yes | The package to search. |
| query | string | yes | websearch_to_tsquery syntax (quoted phrases, OR, -term). |
| limit | int | no | Default 20, max 50. |
Returns — {package_id, query, results: [{file_path, snippet, rank}, …]}. snippet contains <mark>...</mark> highlights around match terms.
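For agents that prefer plain text over highlighted HTML, the `<mark>` markers can be stripped or mined with a couple of regular expressions. This is a small sketch with hypothetical helper names; it only handles the `<mark>...</mark>` markers described above and makes no assumption about any other HTML escaping in the snippet.

```python
import re

_MARK_TAG = re.compile(r"</?mark>")
_MARKED_TERM = re.compile(r"<mark>(.*?)</mark>")


def plain_snippet(snippet: str) -> str:
    """Remove the <mark> / </mark> markers, keeping the text between them."""
    return _MARK_TAG.sub("", snippet)


def highlighted_terms(snippet: str) -> list[str]:
    """Return the matched terms in the order they appear in the snippet."""
    return _MARKED_TERM.findall(snippet)
```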
search_my_packages
Added 2026-05-08.
Cross-package full-text search across every non-deleted package the caller owns. Takes query and an optional limit (default 10, max 25). Returns {query, results: [{package_id, project_name, version, total_hit_count, files: [{file_path, snippet, rank}, ...]}, ...]} — matched packages ordered by their best per-file rank, with up to 5 file hits embedded in each entry. total_hit_count carries the per-package true count so callers can render "showing N of M" or follow up with search_package for a deep look at any single package.
Same query syntax as search_package (Postgres websearch_to_tsquery — quoted phrases, OR, -term). Empty query returns an empty result set.
Replaces the prior N+1 fan-out pattern (call list_packages, then search_package per package). Use this tool whenever you don't already know which package to search. Mirrors REST GET /v1/packages/search?q=...&limit=....
Arguments
| Name | Type | Required | Description |
|---|---|---|---|
| query | string | yes | Same websearch_to_tsquery syntax as search_package. |
| limit | int | no | Default 10, max 25. |
Returns — {query, results: [{package_id, project_name, version, total_hit_count, files: [{file_path, snippet, rank}, …]}, …]}. Packages are ordered by their best per-file rank; up to 5 file hits embedded per package; total_hit_count is the per-package true count.
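The "showing N of M" rendering mentioned above, and the decision of when to follow up with search_package, could look roughly like this. Both helper names are hypothetical, and `results` is assumed to be the parsed results array.

```python
def hit_summary(entry: dict) -> str:
    """Describe how many of a package's hits are embedded in the response."""
    shown = len(entry["files"])
    total = entry["total_hit_count"]
    name = entry.get("project_name") or "(unnamed)"
    return f"{name}: showing {shown} of {total}"


def needs_deep_search(results: list[dict]) -> list[str]:
    """Package ids whose embedded hits are only part of the true count."""
    return [r["package_id"] for r in results
            if r["total_hit_count"] > len(r["files"])]
```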
get_package
Returns the documentation package metadata. Takes a package_id. Read the package_id from start_generation (returned alongside the new generation) or from get_generation once the generation reaches Complete. Includes project_name, description, kind, and kind_label so the deliverable is identifiable + clearly labeled as a specification package, not application code. generation_id is null for packages created by migrating existing documentation rather than by a generation run (Migrate Existing Docs, 2026-05-27); present for generated packages.
preview_doc_migration
Classifies an uploaded documentation archive onto the canonical SpecStep package layout and returns the proposed mapping — no persistence. Takes archive_base64 (a base64-encoded .zip; inline cap ~4 MB — use the REST endpoint POST /v1/doc-migrations/preview for larger) and optional source_archive_name. Returns {source_archive_name, source_byte_count, total_file_count, classified_count, unclassified_count, classifier_version, mapping: [{source_path, doc_type, target_path, layer, confidence}, ...], conflicting_target_paths: [...]}. Run this first; a non-empty conflicting_target_paths means two files claim the same canonical slot — resolve with target_path_overrides on commit.
commit_doc_migration
Normalizes an uploaded documentation archive into a migrated package and persists it (canonical layout + _source/ for unplaceable files + a source: migrated manifest), linking it to a project. Takes archive_base64 (base64 .zip, ~4 MB inline cap), optional source_archive_name, optional project_id (defaults to your default project), optional version (default 1.0.0), and optional target_path_overrides (a map of source-path → target-path corrections from the reviewed preview). Returns {migration_id, package_id, project_id, version, classified_count, unclassified_count}. The resulting package appears in list_packages / get_package with a null generation_id. Errors when two sources still claim one canonical slot — supply target_path_overrides to resolve.
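Before committing, it can help to check locally whether a set of overrides actually resolves every contested slot. The sketch below recomputes conflicts from the preview's mapping array; the helper name and the file paths in the usage note are hypothetical, and it assumes unclassified files carry a null target_path.

```python
from collections import Counter


def remaining_conflicts(mapping: list[dict], overrides: dict[str, str]) -> list[str]:
    """Target paths still claimed by more than one source after overrides."""
    claims = Counter()
    for entry in mapping:
        # An override replaces the classifier's proposed target for that source.
        target = overrides.get(entry["source_path"], entry["target_path"])
        if target is not None:  # assumption: unclassified files have no target
            claims[target] += 1
    return sorted(path for path, count in claims.items() if count > 1)
```

If two sources both map to docs/01-overview.md, passing an override that moves one of them to a different path should leave the returned list empty, at which point the same map can be sent as target_path_overrides.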
list_packages
Lists documentation packages on your account, with project_name + description + kind annotations on every row so the caller can identify each package without a per-row follow-up. Each row also carries generation_state so callers can tell which packages came from runs that finished cleanly versus runs that failed mid-flight.
Arguments
| Name | Type | Required | Description |
|---|---|---|---|
| limit | int | no | Default 50, max 200. |
| offset | int | no | Default 0. |
| order | string | no | desc (newest-first, default) or asc. |
Returns — {packages: [...], next_cursor} where each entry has:
| Field | Type | Description |
|---|---|---|
| id | UUID | Package id. |
| generation_id | UUID \| null | Source generation. null for packages created by migrating existing documentation — those have no originating run. |
| version | string | Package version. |
| total_cost_usd | decimal | What the generation cost. |
| retention_until | ISO-8601 \| null | When the package will be auto-deleted. |
| deleted_at | ISO-8601 \| null | Set if soft-deleted (filtered out by default). |
| project_name | string \| null | Display name. |
| description | string \| null | Short description, truncated to 280 chars. |
| kind / kind_label | string | "specification" + canonical disambiguation copy. |
| generation_state | string | Final state of the source generation (Complete, Failed, etc.). |
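Walking the full list with limit and offset might be sketched as follows. `call_tool` is a hypothetical callable that sends a `tools/call` request and returns the parsed tool output; the loop stops on the first short page and does not rely on next_cursor, whose semantics are not described here.

```python
from typing import Callable, Iterator


def iter_packages(call_tool: Callable[[str, dict], dict],
                  page_size: int = 50) -> Iterator[dict]:
    """Yield every package row by paging through list_packages with offset."""
    offset = 0
    while True:
        page = call_tool("list_packages",
                         {"limit": page_size, "offset": offset})["packages"]
        yield from page
        if len(page) < page_size:
            return  # a short (or empty) page means there is nothing left
        offset += page_size
```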
request_change
Added 2026-05-09.
Files a change-management addendum against a completed package. Single-LLM-call flow (~30 seconds, ~$0.40-0.50) that produces a 5-file markdown bundle (background.md, change-requirement.md, implementation-guide.md, test-plan.md, decision-log-entry.md) attached as a sibling artifact to the existing package — no version bump.
Use this tool when an agent has a focused single-change request against a completed package — "Add Apple ID OAuth", "Localize French", "Switch session storage from cookies to JWT". For structural rewrites that warrant a fresh package version (~$2.50, multi-agent pipeline), call start_generation off the original interview's intake instead.
The addendum row also writes a bell-dropdown notification under the new AddendumComplete kind so the user sees the change land on their next page load. Mirrors REST POST /v1/packages/{id}/addenda.
Arguments
| Name | Type | Required | Description |
|---|---|---|---|
| package_id | UUID | yes | The completed package to file the addendum against. |
| title | string | yes | ≤ 200 chars. Short label for the change. |
| description | string | yes | ≤ 4000 chars. Free-text description of the change requested. |
Returns
| Field | Type | Description |
|---|---|---|
| addendum_id | UUID | The new addendum's id. |
| package_id | UUID | The parent package id (echoed). |
| download_url | string | SAS-tokened blob URL for the 5-file markdown zip; valid for one hour. |
| cost_usd | decimal | What the LLM call cost (typically ~$0.40–0.50). |
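Since each request_change call costs real money, a client may want to check the documented length caps before sending. A minimal sketch with a hypothetical helper name; note that Python's `len` counts Unicode code points, and how the server counts characters is not stated here.

```python
def request_change_args(package_id: str, title: str, description: str) -> dict:
    """Check the documented length caps, then return the tool arguments."""
    if len(title) > 200:
        raise ValueError(f"title is {len(title)} chars; the cap is 200")
    if len(description) > 4000:
        raise ValueError(f"description is {len(description)} chars; the cap is 4000")
    return {"package_id": package_id, "title": title, "description": description}
```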
list_audiences
Added 2026-05-18.
Public catalog of audiences understood by explain_package. No arguments. Returns {audiences: [{slug, display_name, description}, ...]} — the V1 set is executive, product-manager, engineering-manager, new-engineer, investor, security. Mirrors REST GET /v1/explain/audiences. Use this to populate a picker before calling explain_package, or to validate a slug before submitting.
explain_package
Added 2026-05-18.
Rewrites a completed package as a short audience-tailored markdown explanation. One LLM round-trip (~10 seconds, ~$0.05) for a cold call; subsequent calls for the same (package, audience) pair return the cached row instantly and at zero cost.
Use this when an agent needs to summarize a package for a specific reader — e.g., "give me the executive cut" or "explain this to a new engineer" — instead of streaming the full bundle.
Arguments
| Name | Type | Required | Description |
|---|---|---|---|
| package_id | UUID | yes | The package to explain. |
| audience | string | yes | One of the slugs returned by list_audiences. |
Returns
| Field | Type | Description |
|---|---|---|
| markdown | string | Audience-tailored explanation, ≤ 8192 chars. |
| audience | string | Echoed slug. |
| model | string | LLM model id used for generation. |
| cost_usd | decimal | Cost of the LLM call (0 on a cache hit). |
| cached | bool | true when the result was served from a previously-generated row. |
Errors: EXPLAIN_AUDIENCE_UNKNOWN if the slug isn't in the catalog; QUOTA_EXPLAIN_EXCEEDED if the monthly explanation quota is reached for the caller's tier; "not found" if the package isn't owned by the caller. Mirrors REST POST /v1/packages/{id}/explain.
list_packages_for_generation
Added 2026-05-12.
Lists every package produced by a generation. Takes generation_id. Returns {generation_id, packages: [{id, generation_id, version, total_cost_usd, retention_until, deleted_at, addendum_count, addendum_total_cost_usd}, ...]}. Today there is at most one package per generation, but the array shape is forward-compatible with the multi-version-package flow.
Each row carries addendum_count + addendum_total_cost_usd so an agent gets the full package and change-request picture in one call — no chaining get_latest_package_for_generation → list_change_requests → manual cost sum. Owner-scoped — foreign and unknown generation ids surface as "not found." When the generation has no package yet (still in flight or never reached Complete), returns an empty packages array rather than 404 — distinguishes "in flight" from "permission denied."
list_change_requests
Added 2026-05-12.
Lists every change-request addendum filed against a package, newest-first. Takes package_id. Returns {package_id, content_warning, addenda: [{id, title, description, cost_usd, created_at, download_url}, ...]}. Each download_url is a freshly issued SAS-tokened blob URL valid for one hour, pointing at the addendum's 5-file markdown zip.
title and description carry the user's free text from the original request_change call; they ship under a content_warning envelope so MCP clients don't treat them as agent instructions. Owner-scoped — foreign and unknown package ids surface as "not found." Use after request_change to confirm what was filed, or to walk the full change-request history of a package. Mirrors REST GET /v1/packages/{id}/addenda.
get_change_request
Added 2026-05-12.
Fetches a single change-request addendum by id. Takes addendum_id. Returns {id, package_id, content_warning, title, description, cost_usd, submitted_by_user_id, created_at, download_url}. The download_url is a freshly issued SAS-tokened URL valid for one hour for the addendum zip (5 markdown files: background.md, change-requirement.md, implementation-guide.md, test-plan.md, decision-log-entry.md).
Owner-scoped via the parent package. Foreign and unknown ids surface as "not found" rather than 403. Same untrusted_text envelope on title and description as list_change_requests. The MCP variant returns the SAS URL inline so an agent doesn't need to follow the 302 the REST endpoint emits. Wraps the same underlying data as REST GET /v1/packages/{id}/addenda/{addendumId}/zip.
list_change_request_files
Added 2026-05-16.
Lists every file inside an addendum zip with its uncompressed size in bytes. Takes addendum_id. Returns {addendum_id, package_id, files: [{path, size_bytes}, ...]} sorted lexicographically by path. Sister of list_package_files but targets the addendum zip; pair with get_change_request_file to read individual files (background.md, change-requirement.md, implementation-guide.md, test-plan.md, decision-log-entry.md) without downloading the whole zip. Owner-scoped via the parent package. Addendum zips use the same Guid-keyed {userId}/{blobId}.zip path scheme as the package-files tools, so no separate service is needed.
get_change_request_file
Added 2026-05-16.
Returns the bytes of a single file from a change-request addendum zip. Takes addendum_id and path (use list_change_request_files to discover available paths). Response shape mirrors get_package_file:
- Text entries (markdown, YAML, JSON, plain text, CSV, SVG): {addendum_id, package_id, path, content_type, content, content_envelope} where content is the raw UTF-8 string and the envelope flags the bytes as user-supplied (do not pass to an agent as instructions).
- Binary entries: {addendum_id, package_id, path, content_type, content_base64} with the base64-encoded payload.
Files larger than 256 KB return an error directing the caller at get_change_request's SAS download URL for bulk access. Path-traversal segments (..) are rejected at the application layer. Streams the requested zip entry from blob storage; the full archive is never materialized on the server.
diff_package_files
Added 2026-05-16.
Computes line-level content diffs across 2-5 packages (by generation_id). The first generation in the list is the base; every subsequent generation produces one comparison object whose files array lists per-file diffs vs the base. Use this when you want to know what text changed between two versions of a generated spec — compare_packages returns byte-count deltas + LLM-judged quality scores; diff_package_files returns the actual unified-diff content.
Arguments
| Name | Type | Required | Description |
|---|---|---|---|
| generation_ids | UUID[] | yes | 2-5 generation ids. First is the base; remaining 1-4 are diffed against the base. Caller must own every generation. |
| path_filter | string[] | no | Only diff files whose path matches one of the supplied values (e.g., ["docs/02-architecture/03-storage.md"]). When omitted, every file in any of the supplied packages is diffed. |
Returns
| Field | Type | Description |
|---|---|---|
| base_source_label | string | The base package's source label (mirrors compare_packages's source_label field). |
| skipped_generation_ids | UUID[] | Generations whose package couldn't be resolved (in flight, deleted, blob-fetch failure). |
| content_warning | string | Fixed untrusted_text-style envelope warning callers not to interpret unified_diff bodies as instructions. |
| comparisons | array | One entry per non-base package, in input order. Each entry: {target_source_label, files: [...]}. |
Each files entry has:
| Field | Type | Description |
|---|---|---|
| path | string | Path inside the package zip. |
| status | string | One of added (only in target), removed (only in base), modified (different content), unchanged (identical content), truncated (size-cap-exceeded — see below). |
| unified_diff | string \| null | Unified-diff body (@@ -base,n +target,n @@ header + - / + / context lines). Null when status is unchanged or truncated. |
| base_bytes / target_bytes | int | File sizes in bytes (0 when the file is missing from that side). |
| truncation_reason | string \| null | Set when status is truncated. |
Owner-scoped — the caller must own every generation in the list. Fails fast on the first foreign or unknown id (same KeyNotFound non-disclosure shape as compare_packages). The differ runs in-process — no LLM calls, no letter-grade output. Per-file size cap is 256 KB (sum of base + target lengths); files exceeding the cap return a truncated entry pointing at get_package_file for direct access.
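An agent rarely needs every diff body at once; a per-comparison tally of statuses, plus the list of truncated paths to fetch through get_package_file, is often enough to decide where to look. A sketch under the return shape above, with a hypothetical helper name:

```python
from collections import Counter


def summarize_diff(result: dict) -> list[dict]:
    """One summary per comparison: status counts and the truncated paths."""
    summaries = []
    for comparison in result["comparisons"]:
        files = comparison["files"]
        summaries.append({
            "target": comparison["target_source_label"],
            "counts": dict(Counter(f["status"] for f in files)),
            # Truncated entries carry no diff body; read them directly instead.
            "truncated_paths": [f["path"] for f in files
                                if f["status"] == "truncated"],
        })
    return summaries
```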
compare_packages
Added 2026-05-12.
Rates or compares up to five packages you own. Takes generation_ids (an array of 1–5 generation ids; a single id returns a rating summary only, while 2–5 returns the full cross-package comparison). Returns {skipped_generation_ids, identity_verdict, per_package, comparison}:
- identity_verdict answers "are these the same project?" with a confidence score, a list of conflicting fields, and an explanation.
- per_package carries each package's build-confidence score (with per-signal contributions) and an LLM-judged quality-confidence score with justification.
- comparison carries the cross-package markdown body plus a structural diff of file lengths per package, gated under a content_warning envelope (the markdown is LLM-authored prose).
Owner-scoped — the caller must own every generation in the list. Fails fast on the first foreign or unknown id so a caller can't burn an LLM-judge call on packages they don't own. The 5-generation cap matches the REST limit and bounds LLM-judge cost. Generations whose package can't be resolved (still in flight, deleted, or a blob-fetch failure) are returned in skipped_generation_ids rather than failing the whole call. Useful when an agent wants to evaluate "which of my packages is best" or "how does my latest run compare to the previous one."
Async is the default (changed 2026-05-19). A real 2–5 package compare runs an LLM-judge pass that typically takes 30–80s — longer than most MCP clients' request timeout. So compare_packages defaults to mode: "async": it enqueues a background job and returns {status: "queued", job_id} within milliseconds. Poll get_compare_packages_status with that job_id for the canonical result. Pass mode: "sync" only when you know the compare fits inside your client's timeout (a single-package rating summary, or two small packages).
Arguments
| Name | Type | Required | Description |
|---|---|---|---|
| generation_ids | UUID[] | yes | 1–5 generation ids. One id returns a rating summary only; 2–5 returns the full cross-package comparison. |
| mode | string | no | async (default) — enqueue a job + return job_id to poll; sync — run inline and return the full result (small compares only, else the MCP client times out). |
Returns
In async mode: {status, job_id} — poll get_compare_packages_status(job_id). In sync mode (and as the result payload of a completed async job):
| Field | Type | Description |
|---|---|---|
| skipped_generation_ids | UUID[] | Generations whose package couldn't be resolved (in flight, deleted, blob-fetch failure). |
| identity_verdict | object | {same_project, confidence, conflicting_fields, explanation} — answers "are these the same project?". |
| per_package | array | One entry per resolved package with {generation_id, build_confidence: {score, signals: [...]}, quality_confidence: {score, justification}}. |
| comparison | object \| null | When ≥ 2 packages resolve: {content_warning, markdown_body, file_length_diff}. The markdown is LLM-authored prose under an untrusted_text envelope. |
get_compare_packages_status
Added 2026-05-19 — the poller for compare_packages(mode: "async").
Fetches the status of a background compare job. Takes the job_id returned by an async compare_packages call. Owner-scoped — only the user who enqueued the job can poll it.
Arguments
| Name | Type | Required | Description |
|---|---|---|---|
| job_id | UUID | yes | The job_id from compare_packages(mode: "async"). |
Returns
| Field | Type | Description |
|---|---|---|
| status | string | queued, running, completed, or failed. |
| result | object \| null | Present when status is completed — the same shape as a sync compare_packages result (above). |
| error_code / error_message / is_retryable | string / string / bool | Present when status is failed. is_retryable tells you whether to re-enqueue. |
Poll on a gentle cadence (2–5s) until status is completed or failed. A 2–5 package compare usually resolves in 30–80s.
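The polling loop described above might be sketched like this. `call_tool` is a hypothetical callable that sends a `tools/call` request and returns the parsed tool output; the 3-second interval and 180-second deadline are illustrative choices within the guidance above, not server requirements.

```python
import time
from typing import Callable


def wait_for_compare(call_tool: Callable[[str, dict], dict], job_id: str,
                     interval_s: float = 3.0, timeout_s: float = 180.0,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll get_compare_packages_status until the job completes or fails."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = call_tool("get_compare_packages_status", {"job_id": job_id})
        if status["status"] == "completed":
            return status["result"]
        if status["status"] == "failed":
            raise RuntimeError(
                f"{status.get('error_code')}: {status.get('error_message')} "
                f"(retryable={status.get('is_retryable')})")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"compare job {job_id} still {status['status']}")
        sleep(interval_s)  # queued or running: wait before the next poll
```

Injecting `sleep` keeps the loop testable without real waiting; production callers can leave the default.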
estimate_change_request_cost
Added 2026-05-12.
Forecasts what a single request_change addendum will cost (USD). Takes no arguments. Returns {has_forecast, estimated_total_usd, p25_usd, p75_usd, sample_size, note} — the rolling 30-day median across completed addenda with p25 / p75 confidence bounds, or a "not enough data" envelope when the sample is below the forecaster's floor.
No profile dimension — every addendum uses the same prompt and model today, so the forecast is a single global median. The p25 / p75 bounds capture per-addendum variance (driven mostly by description length and change complexity). Symmetric with estimate_generation_cost; useful before calling request_change when cost matters.
update_package
The all-in-one mutation tool for packages. Folds three operations into one call (the MCP transport doesn't have a natural HTTP-verb equivalent of DELETE or PATCH, so the operation is encoded as a flag).
Takes package_id plus exactly one of:
- retention_until: <date-time | null> — set or clear the package's retention deadline. Pass an ISO-8601 timestamp to extend retention; pass null to make retention indefinite.
- delete: true — soft-delete the package. Idempotent. The package row drops out of list_packages but stays in the database for audit + recovery.
- restore: true — restore a soft-deleted package. Idempotent on already-live rows. Sister operation to delete: true.
Returns {package_id, action: "deleted" | "restored" | "retention_updated"}. Passing both delete: true and restore: true returns an error.
Why bundled instead of separate delete_package / restore_package tools? Package was the first entity to expose multi-operation mutations through MCP, and bundling them into one tool kept the manifest small. Newer entities (Interview, Generation) use dedicated delete_* / restore_* tools; both styles work.
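Because null is a meaningful value for retention_until, a client-side builder for update_package needs a sentinel to tell "not passed" apart from "passed as null". A sketch of the exactly-one-of rule, with a hypothetical helper name:

```python
_UNSET = object()  # distinguishes "argument omitted" from an explicit None


def update_package_args(package_id: str, *, retention_until=_UNSET,
                        delete: bool = False, restore: bool = False) -> dict:
    """Build update_package arguments, enforcing exactly one operation."""
    chosen = {}
    if retention_until is not _UNSET:
        chosen["retention_until"] = retention_until  # None means indefinite
    if delete:
        chosen["delete"] = True
    if restore:
        chosen["restore"] = True
    if len(chosen) != 1:
        raise ValueError("pass exactly one of retention_until, delete, restore")
    return {"package_id": package_id, **chosen}
```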
Support tools
submit_bug_report
Submits a bug report tied to the calling user. Takes title, description, optional severity, optional related_generation_id, optional current_route, optional caller-supplied user_agent. Persisted to the caller's report history. Server-side enrichment stamps account name/email/plan, build version, and the AI-tool heuristic.
list_my_bug_reports
Lists the caller's bug reports, newest first.
Arguments
| Name | Type | Required | Description |
|---|---|---|---|
| limit | int | no | Default 20, max 200. |
Returns — {bug_reports: [...]} where each entry has:
| Field | Type | Description |
|---|---|---|
| id | UUID | Bug report id. |
| title | string | Short title supplied at submit time. |
| severity | string | Caller-supplied severity (or default if omitted). |
| status | string | Triage state (e.g. Open, Triaged, Closed). |
| created_at | ISO-8601 | When the report was submitted. |
| updated_at | ISO-8601 | Most recent state change. |
| related_generation_id | UUID \| null | Generation the report was filed against (if any). |
| triage_notes | string \| null | Server-side notes added during review. |
get_bug_report
Returns a bug report by id. Takes bug_report_id. Visible to the submitter; foreign callers get "not found".
Agent-driven feedback acquisition
Added 2026-05-17.
SpecStep tools may include a feedback_recommendation field on their response when the server detects a known quality issue (Otto produced malformed JSON, review loop stalled, etc.). MCP clients (Claude Code, Codex, Cursor, Windsurf) are expected to act on the field by calling submit_feedback with the supplied prefill and the recommendation_token. SpecStep dedups on the token: calling submit_feedback twice with the same token bumps an occurrence counter on the existing row rather than creating a new row.
Response shape when the field is present (today only on get_generation for Failed generations with a quality-signal category — LlmContract, ReviewBudgetExhausted, RedraftNoProgress, ReviewLoopStalled):
{
  // ... tool's normal output
  "feedback_recommendation": {
    "action": "submit_feedback",
    "reason": "<one-line rationale>",
    "dedup_token": "<sha256 hex>",
    "prefill": {
      "type": "InterviewQuality | PackageQuality | EndToEndRun | ...",
      "title": "<concise summary>",
      "full_report": "<server-composed markdown body>",
      "severity": "Info | Low | Medium | High | Critical",
      "generation_id": "<uuid or null>",
      "interview_id": "<uuid or null>"
    }
  }
}
To file, normalize the enum casing (PascalCase → snake_case for type; lowercase for severity), leave the remaining prefill fields unchanged, and call submit_feedback with them plus recommendation_token.
The field is omitted when the user has disabled this behavior in Settings → Notifications → Agent integrations (default on for new users). Absence-of-field means "do nothing" — never prompt the user to file feedback manually based on this signal.
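The casing normalization can be sketched as below. The helper names are hypothetical, and the sketch assumes the dedup_token value in the response shape is what submit_feedback expects as recommendation_token; the prose names the argument and the sample shape names the field, and the two differ.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert PascalCase to snake_case, e.g. EndToEndRun -> end_to_end_run."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def feedback_args_from_recommendation(rec: dict) -> dict:
    """Turn a feedback_recommendation object into submit_feedback arguments."""
    args = dict(rec["prefill"])  # copy so the tool response is left untouched
    args["type"] = to_snake_case(args["type"])
    args["severity"] = args["severity"].lower()
    # Assumption: the response's dedup_token is the recommendation_token.
    args["recommendation_token"] = rec["dedup_token"]
    return args
```

A caller would only run this when the field is present on the response; when it is absent, nothing is filed.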
submit_feedback
Added 2026-05-16. Distinct from submit_bug_report — feedback evaluates quality (was the interview good, is the package coherent, what's the build confidence). Bug reports are for broken behavior.
Submits structured quality feedback. Required: type (interview_quality, package_quality, end_to_end_run, tooling_experience, api_doc_quality, website_quality, launch_readiness, other), title, full_report (markdown). Optional: target GUIDs (interview_id, intake_artifact_id, generation_id, package_id) — required for run-bound types (interview_quality, package_quality, end_to_end_run). Scalar scores: interview_quality_score, package_quality_score, build_confidence_percent (0-100), letter_grade (A-F). Optional template_id + rubric_version link to a template from list_feedback_templates; pass rubric_section_responses (section-id → free-text) + rubric_scores (section-id → 0-100) to fill the rubric.
Additional optional submitter context (added 2026-05-16): estimated_output_quality (≤50 char qualitative label, distinct from the numeric build_confidence_percent), project_type and review_profile (≤50 chars each — denormalize the run's project type and review profile at submission time), transcript_evidence and package_evidence (arrays of quoted snippets, each ≤2000 chars, supporting the findings).
Each entry in structured_findings accepts three richer fields (each ≤2000 chars): evidence (quoted text from the transcript or package supporting the finding), expected_behavior (what the caller expected to happen), suggested_fix (caller's proposed remediation). Mirrors the specialist-reviewer finding shape so feedback findings + reviewer findings can be aggregated.
Typed evidence (added 2026-05-21): each finding also accepts an optional typed_evidence array (up to 20 items) for machine-readable signal you'd otherwise flatten into prose. Each item is { "kind": <string>, "payload_json": <string ≤4000 chars> }. The kind is one of free, http_response, route, console_error, mcp_tool_call, transcript_turn, screenshot, json_payload, and payload_json must be a well-formed JSON document. Required keys depend on the kind: http_response needs a numeric status; route needs a string url; console_error needs a string message; mcp_tool_call needs a string tool; transcript_turn needs a numeric turnIndex; screenshot needs a string path; free and json_payload accept any well-formed JSON. The prose evidence string and typed_evidence can coexist on the same finding. Read responses echo typed_evidence back in the same shape.
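The per-kind rules for typed_evidence lend themselves to a local pre-check before submitting. This is a sketch of those rules as stated above, not the server's validator; the helper name and message strings are hypothetical.

```python
import json

# kind -> (required key, accepted Python types); None means any well-formed JSON.
REQUIRED_KEYS = {
    "http_response": ("status", (int, float)),
    "route": ("url", (str,)),
    "console_error": ("message", (str,)),
    "mcp_tool_call": ("tool", (str,)),
    "transcript_turn": ("turnIndex", (int, float)),
    "screenshot": ("path", (str,)),
    "free": None,
    "json_payload": None,
}


def typed_evidence_problems(item: dict) -> list[str]:
    """Return a list of problems with one typed_evidence item (empty if fine)."""
    kind = item.get("kind")
    if kind not in REQUIRED_KEYS:
        return [f"unknown kind: {kind!r}"]
    raw = item.get("payload_json")
    if not isinstance(raw, str) or len(raw) > 4000:
        return ["payload_json must be a string of at most 4000 chars"]
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return ["payload_json is not well-formed JSON"]
    rule = REQUIRED_KEYS[kind]
    if rule is None:
        return []
    key, types = rule
    value = payload.get(key) if isinstance(payload, dict) else None
    # bool is a subclass of int in Python, so exclude it from "numeric".
    if isinstance(value, bool) or not isinstance(value, types):
        return [f"{kind} needs a {key} of the right type"]
    return []
```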
Recurrence threading (added 2026-05-17): pass at most one of recurrence_of_feedback_id or recurrence_of_bug_report_id when filing a row because an earlier feedback or bug report was resolved but the issue came back. Both ids cannot be set on the same submission — the system rejects the call.
Agent-driven dedup (added 2026-05-17): pass recommendation_token when filing in response to a server-emitted feedback_recommendation field (see "Agent-driven feedback acquisition" above). The token (an sha256 hex string) is used to dedup against a 30-day window of open auto-filed rows — a dedup hit bumps an occurrence counter on the existing row instead of creating a new one.
Returns id, type, status, created_at. To avoid spending a submit_feedback call on a validation error, dry-run the shape first with validate_feedback.
validate_feedback
Added 2026-05-19. Pre-flight for submit_feedback.
Validates a feedback submission shape without persisting anything. Takes the same input as submit_feedback (the recommendation_token is the only field it drops — dedup is a write-time concern), and the same validation rules apply: template, cap, and section-id violations all fail here exactly as they would at submit time. Returns { valid, errors[] }, where each error is { code, message, param_name } carrying the canonical FEEDBACK_* code (FEEDBACK_TITLE_REQUIRED, FEEDBACK_FULL_REPORT_REQUIRED, FEEDBACK_INVALID, FEEDBACK_TEMPLATE_VERSION_REQUIRED, FEEDBACK_TEMPLATE_UNKNOWN, FEEDBACK_TEMPLATE_TYPE_MISMATCH, FEEDBACK_TEMPLATE_SECTION_UNKNOWN, FEEDBACK_TEMPLATE_SCORE_UNKNOWN — see errors).
Run this first when you're uncertain about template section ids or free-text caps — it catches the error without consuming a submit_feedback call.
amend_feedback
Added 2026-05-21.
Submitter self-correction. While your feedback row is still Open and within the amend window (10 minutes after submission), you can fix free-form content in place: pass feedback_id (required) plus any of title, summary, full_report, transcript_evidence, package_evidence, tags. Omitted fields are left unchanged. Not amendable: type, severity, target ids, template_id/rubric_version, and structured_findings. Returns the updated id / title / status / updated_at. Errors (surfaced as the tool error message): the row isn't yours, it has already left Open (FEEDBACK_AMEND_NOT_OPEN), or the window has expired (FEEDBACK_AMEND_WINDOW_EXPIRED). Useful for catching a typo right after submit_feedback while the window is still open.
list_my_feedback
Lists the caller's feedback rows newest-first. Takes optional limit (1-200, default 20). Returns id / type / title / severity / status / linked GUIDs / template id + version / triage notes plus checked_at and reviewed_at so a submitter can tell whether the row has been looked at or reviewed yet.
get_feedback
Returns a feedback row by id. Takes feedback_id. Visible to the submitter; foreign callers get "not found".
The output includes the full record: every field set at submit time (including the 2026-05-16 additions — estimated_output_quality, project_type, review_profile, transcript_evidence, package_evidence, plus the richer per-finding evidence / expected_behavior / suggested_fix) and the server-managed lifecycle stamps (checked_at, reviewed_at).
list_feedback_templates
Lists the available code-defined feedback templates (rubrics) so a client can pick one before submitting. Returns id / version / title / description / section_count.
Seven templates ship in v1, each pairing with a FeedbackType:
| Template id | Pairs with type | Scope |
|---|---|---|
| end-to-end-specstep-quality v1.0.0 | end_to_end_run | One full SpecStep run (interview through generated package): 13 sections covering interview quality, package coherence, build confidence, letter grade, top blockers, recommended fixes. |
| interview-quality v1.0.0 | interview_quality | Otto's performance during a single Interview: 7 sections covering pacing, follow-up quality, coverage breadth, rapport, gaps, recommended follow-ups. |
| package-buildability v1.0.0 | package_quality | Whether a generated package is buildable as-is by an AI coder: 8 sections covering coherence, completeness, AI-coder clarity, edge-case coverage, data-shape ambiguities, effort-estimate accuracy, top risks. |
| api-doc-quality v1.0.0 | api_doc_quality | The public /api-docs/* surface: 8 sections covering endpoint coverage, completeness, example clarity, error-handling docs, schema clarity, missing sections, recommended improvements. |
| tooling-experience v1.0.0 | tooling_experience | The SpecStep tooling surfaces: 9 sections covering MCP ergonomics, CLI / IDE integration, error-message clarity, performance, friction points, recommended improvements. |
| website-quality v1.0.0 | website_quality | The public marketing/docs site at specstep.com: 11 sections covering visual polish, copy quality, SEO + sitemap correctness, route correctness, mobile experience, console cleanliness, content sanitization. |
| launch-readiness v1.0.0 | launch_readiness | Cross-cutting pre-launch review: 12 sections covering Priority-0 blockers, public content sanitization, trust posture, API + MCP stability, mobile readiness, accessibility, performance, observability, and a final go / no-go recommendation. |
get_feedback_template
Returns one template's full content (all sections + prompts + optional score scales). Takes template_id + version.
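One way to go from a feedback type to get_feedback_template arguments is sketched below. The type-to-template pairing is taken from the table above; the function name is hypothetical, and the version string is deliberately read from the list_feedback_templates result rather than hard-coded, since the exact format the server expects is not stated here.

```python
from typing import Any

# Pairing of FeedbackType to template id, as listed in the table above.
TEMPLATE_ID_FOR_TYPE = {
    "end_to_end_run": "end-to-end-specstep-quality",
    "interview_quality": "interview-quality",
    "package_quality": "package-buildability",
    "api_doc_quality": "api-doc-quality",
    "tooling_experience": "tooling-experience",
    "website_quality": "website-quality",
    "launch_readiness": "launch-readiness",
}


def template_arguments(feedback_type: str, listed: list[dict[str, Any]]) -> dict[str, str]:
    """Pick get_feedback_template arguments for a feedback type.

    `listed` is the decoded list_feedback_templates result: rows carrying
    at least `id` and `version`.
    """
    if feedback_type not in TEMPLATE_ID_FOR_TYPE:
        raise ValueError(f"no template pairs with type {feedback_type!r}")
    wanted = TEMPLATE_ID_FOR_TYPE[feedback_type]
    for row in listed:
        if row["id"] == wanted:
            return {"template_id": row["id"], "version": row["version"]}
    raise LookupError(f"template {wanted!r} not in the listed templates")
```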
Webhook subscription tools
Added 2026-05-12.
The five tools below mirror the REST webhook-management surface (/v1/api-keys/{apiKeyId}/webhooks). They let a cookie-authenticated agent register, rotate, smoke-test, and revoke webhook subscriptions on its own API keys. The mutating tools (create_webhook, rotate_webhook_secret, test_webhook) refuse API-key principals by design — a compromised key must not be able to redirect, silently re-sign, or spam-fire event payloads. list_my_webhooks and delete_webhook are safe from any context (read-only and revocation, respectively). Programmatic callers that have explicitly accepted the redirect risk can use the REST endpoints directly — see REST Step 7.5 for the bearer-callable surface.
list_my_webhooks
Added 2026-05-12.
Lists every webhook subscription attached to a caller-owned API key. Takes api_key_id. Returns {api_key_id, webhooks: [{id, url, events, created_at, updated_at, last_delivery_at, last_delivery_status, last_delivery_http_status, needs_rotation}, ...]}. The signing secret is never returned by list — the plaintext is shown only once, at create or rotate time. needs_rotation flags subscriptions whose secret was issued under a deprecated scheme and should be rotated. Foreign and unknown API-key ids surface as "not found." Mirrors REST GET /v1/api-keys/{apiKeyId}/webhooks.
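A periodic health check over the list result might look like the sketch below. It relies only on fields named above (`needs_rotation`, `last_delivery_http_status`); the function name is hypothetical, and treating any HTTP status of 400 or higher as a failed delivery is an assumption about how you want to triage, not documented server behavior.

```python
from typing import Any


def webhooks_needing_attention(listing: dict[str, Any]) -> dict[str, list[str]]:
    """Map webhook id to the reasons it should be looked at."""
    flagged: dict[str, list[str]] = {}
    for hook in listing.get("webhooks", []):
        reasons = []
        if hook.get("needs_rotation"):
            reasons.append("rotate secret")
        status = hook.get("last_delivery_http_status")
        # Assumption: a 4xx/5xx on the last delivery is worth investigating.
        if isinstance(status, int) and status >= 400:
            reasons.append(f"last delivery returned HTTP {status}")
        if reasons:
            flagged[hook["id"]] = reasons
    return flagged
```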
create_webhook
Added 2026-05-12.
Registers a new webhook subscription against a caller-owned API key.
The signing_secret is returned once in this response — store it before the response is discarded; list_my_webhooks will not return it. If lost, rotate via rotate_webhook_secret. The URL must point to an externally routable host: loopback, link-local, and internal addresses are rejected to prevent SpecStep from being used as a proxy to probe networks on the receiver's side. Unknown event types are rejected with the offending names listed.
Refuses API-key principals — a compromised key must not be able to redirect future event payloads to an attacker-controlled URL. Cookie-authenticated humans register webhooks for their own keys via this tool; programmatic callers can use the REST endpoint with explicit risk acceptance. Mirrors REST POST /v1/api-keys/{apiKeyId}/webhooks.
Arguments
| Name | Type | Required | Description |
|---|---|---|---|
| api_key_id | UUID | yes | The caller-owned API key to attach the subscription to. |
| url | string | yes | Absolute https:// URL. Loopback / link-local / internal addresses are rejected. |
| events | string[] | yes | At least one event type, e.g. generation.completed, generation.failed. |
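Obvious URL mistakes can be caught locally before calling create_webhook. The sketch below is a rough client-side approximation using Python's standard `ipaddress` module, not the server's actual rule set: the function name is hypothetical, and it cannot judge what a DNS name resolves to, so the server check remains the real gate.

```python
import ipaddress
from urllib.parse import urlsplit


def precheck_webhook_url(url: str) -> list[str]:
    """Return a list of problems found in a candidate webhook URL."""
    problems: list[str] = []
    parts = urlsplit(url)
    if parts.scheme != "https":
        problems.append("URL must be absolute and use https://")
    host = parts.hostname
    if not host:
        problems.append("URL has no host")
        return problems
    if host == "localhost" or host.endswith(".localhost"):
        problems.append("loopback host")
        return problems
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: only the server can decide whether it resolves
        # to an externally routable address.
        return problems
    if ip.is_loopback:
        problems.append("loopback address")
    elif ip.is_link_local:
        problems.append("link-local address")
    elif ip.is_private:
        problems.append("internal (private-range) address")
    return problems
```

An empty list means the URL passed these local checks only; create_webhook can still reject it.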
Returns
| Field | Type | Description |
|---|---|---|
| id | UUID | The new subscription's id. |
| api_key_id | UUID | Echoed. |
| url / events | string / string[] | Echoed. |
| created_at | ISO-8601 | Creation timestamp. |
| signing_secret | string | Returned once. Use to validate HMAC-SHA256 signatures on delivered payloads. |
| signing_secret_note | string | Reminder: this is the only time the plaintext is returned. |
rotate_webhook_secret
Added 2026-05-12.
Issues a fresh signing secret for an existing webhook subscription. Takes api_key_id and webhook_id. Returns {id, api_key_id, updated_at, signing_secret, signing_secret_note}. The new plaintext is returned once — update every consumer that validates payloads against this subscription's signature before discarding the response.
The old secret is invalidated immediately on the dispatcher side. In-flight deliveries already signed with the old secret may still arrive at the receiver for a brief window — if you can, bracket rotations with a tolerance window on the receiver (accept either signature for a short period after rotation).
Refuses API-key principals — a compromised key rotating the signing secret could silently lock the legitimate owner out of validating subsequent payloads. Cookie-authenticated humans rotate via this tool; programmatic callers go through REST with explicit risk acceptance. Foreign and unknown ids surface as "not found." Mirrors REST POST /v1/api-keys/{apiKeyId}/webhooks/{webhookId}/rotate-secret.
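The receiver-side tolerance window suggested above could be sketched like this. Several things here are assumptions rather than documented behavior: that the signature is the lowercase hex HMAC-SHA256 of the raw request body keyed by the signing secret (the exact wire format is not specified in this section), the class and function names, and the 5-minute default grace period.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign(secret: str, body: bytes) -> str:
    """Assumed scheme: hex HMAC-SHA256 over the raw body."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


class RotatingVerifier:
    """Accepts the current secret, plus the previous one for a grace period."""

    def __init__(self, current_secret: str, grace_seconds: float = 300.0) -> None:
        self._current = current_secret
        self._previous: Optional[str] = None
        self._previous_expires = 0.0
        self._grace = grace_seconds

    def rotate(self, new_secret: str, now: Optional[float] = None) -> None:
        """Install the secret returned by rotate_webhook_secret."""
        now = time.time() if now is None else now
        self._previous = self._current
        self._previous_expires = now + self._grace
        self._current = new_secret

    def verify(self, body: bytes, signature: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        candidates = [self._current]
        if self._previous is not None and now < self._previous_expires:
            candidates.append(self._previous)
        # compare_digest avoids leaking match position through timing.
        return any(
            hmac.compare_digest(sign(s, body).encode(), signature.encode())
            for s in candidates
        )
```

Keeping the grace period short limits how long a leaked old secret stays useful, which is the usual reason for rotating in the first place.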
test_webhook
Added 2026-05-12.
Fires a synthetic webhook.test event against a registered subscription and returns the live delivery outcome. Takes api_key_id and webhook_id. Returns {success, http_status, failure_reason, latency_ms, delivery_id} — lets the owner verify reachability and signature validation without waiting for a real generation event. Useful right after create_webhook or rotate_webhook_secret to confirm the receiver is healthy.
Refuses API-key principals — the dispatcher already enforces externally-routable and DNS-rebinding guards, but a compromised key shouldn't be able to spam owner-initiated POSTs at attacker-controlled URLs. Cookie-authenticated humans test from the management UI or via this tool; programmatic callers go through REST with explicit risk acceptance. Mirrors REST POST /v1/api-keys/{apiKeyId}/webhooks/{webhookId}/test.
delete_webhook
Added 2026-05-12.
Removes a webhook subscription from a caller-owned API key. Takes api_key_id and webhook_id. Returns {api_key_id, webhook_id, deleted: true}. Idempotent — unknown, foreign, and already-removed webhooks surface as "not found" (the subscription is gone either way).
Allowed for both cookie and API-key callers — revocation is always safe. The worst case is an API key disabling its own webhook, which is the legitimate use case for self-managed scriptable infrastructure. Contrast create_webhook, rotate_webhook_secret, and test_webhook, which refuse API-key callers because those operations could redirect or silence event delivery. Mirrors REST DELETE /v1/api-keys/{apiKeyId}/webhooks/{webhookId}.
Webhooks instead of polling
For long-running automations or external systems where polling is awkward, register a webhook subscription on your API key and let SpecStep POST state changes to you. Subscriptions are managed through the webhook subscription tools above or through the REST API (see the step 7.5 walkthrough). The same JSON projection that comes back from get_generation / wait_for_generation is delivered in the webhook body, with HMAC-SHA256 signatures (X-SpecStep-Webhook-Signature) and a delivery id (X-SpecStep-Webhook-Delivery) for dedup. v1 delivery is best-effort with bounded retry, so the canonical source of state remains wait_for_generation.
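Because delivery is retried, a receiver should expect the same delivery id more than once. The sketch below shows one way to combine the signature check with id-based dedup. The header names come from the text above; everything else is an assumption: the hex HMAC-SHA256-over-raw-body signature format, the helper names, and the in-memory bounded cache (a real deployment would likely use shared storage).

```python
import hashlib
import hmac
from collections import OrderedDict
from typing import Mapping


class SeenDeliveries:
    """Bounded, insertion-ordered record of delivery ids already processed."""

    def __init__(self, capacity: int = 10_000) -> None:
        self._ids: "OrderedDict[str, None]" = OrderedDict()
        self._capacity = capacity

    def first_time(self, delivery_id: str) -> bool:
        if delivery_id in self._ids:
            return False
        self._ids[delivery_id] = None
        if len(self._ids) > self._capacity:
            self._ids.popitem(last=False)  # evict the oldest id
        return True


def handle_delivery(headers: Mapping[str, str], body: bytes,
                    secret: str, seen: SeenDeliveries) -> str:
    """Classify a delivery as 'rejected', 'duplicate', or 'accepted'."""
    lowered = {k.lower(): v for k, v in headers.items()}  # HTTP headers are case-insensitive
    signature = lowered.get("x-specstep-webhook-signature", "")
    delivery_id = lowered.get("x-specstep-webhook-delivery", "")
    expected = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return "rejected"
    if not delivery_id:
        return "rejected"
    if not seen.first_time(delivery_id):
        return "duplicate"
    return "accepted"
```

Verifying the signature before recording the id keeps forged requests from poisoning the dedup cache. Duplicates should still be acknowledged with a success status so the sender stops retrying.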