Execution
This document defines how a conforming implementation should execute a playbook. It covers the step execution lifecycle, context accumulation strategy, state machine, pause/resume semantics, and streaming.
Overview
Playbook execution is sequential and single-agent. Steps run one at a time, in order, each producing output that flows into subsequent steps. There is no parallel execution, no multi-agent coordination, and no conversation-style multi-turn within a step.
This design makes playbooks:
- Predictable — token costs scale linearly with step count
- Debuggable — each step can be inspected independently
- Testable — the execution core doesn’t depend on streaming or HTTP
Step Execution Lifecycle
For each step in sequence:
1. Check for @tool directive -> if present, execute tool, skip to step 72. Check for @elicit directive -> if present and no response yet, PAUSE (awaiting_input)3. Evaluate branch conditions -> select matching branch, or skip step if no match4. Resolve @prompt reference -> prepend referenced prompt content5. Render template variables -> replace {{name}} placeholders with values6. Call AI -> send prompt with system message + accumulated context7. Process @output directive -> store response as named variable if declared8. Check breakpoints -> if step is a breakpoint, PAUSE (paused)9. Record step result -> store output, tokens, latency, status10. Advance to next stepContext Accumulation
Each step receives the outputs of all previous steps. The implementation builds a context payload injected into the system message (or equivalent mechanism).
What’s Accumulated
| Source | Key | Value |
|---|---|---|
| Step output | Step number (integer) | Full AI response text |
| Named output | Variable name (string) | Captured value (full response or extracted field) |
How It’s Used
- Step outputs (by number) provide the full conversation history
- Named outputs (by variable name) provide targeted data access via
{{name}}interpolation
Both are available to downstream steps. The implementation should include accumulated context in a way that the AI can reference prior results without explicit prompting from the playbook author.
Design Rationale
Steps are not a conversation. Each step is a fresh AI call with accumulated context injected into the system prompt area. This differs from a chat model where messages build on each other in a conversation thread.
Benefits:
- Token usage per step is predictable (system prompt + accumulated context + step prompt)
- Steps can be retried independently
- The execution engine is stateless between steps (all state is in the context maps)
State Machine
A playbook execution moves through these states:
+----------+ | pending | +----+-----+ | start v +----------+ +-----| running |-----+ | +----+-----+ | | | | breakpoint complete error | | | v v v +--------+ +---------+ +--------+ | paused | |completed| | failed | +---+----+ +---------+ +--------+ | resume | v +----------+ | running | (continues from paused step) +----------+States
| State | Description |
|---|---|
pending | Queued, not yet started |
running | Actively executing steps |
completed | All steps executed successfully |
failed | An error occurred during execution |
cancelled | User cancelled the execution |
paused | Execution paused at a breakpoint |
awaiting_input | Execution paused, waiting for elicitation response |
skipped | Step was skipped (branch condition didn’t match) |
tool_call | Intermediate: step requires external tool execution |
expired | Paused execution that exceeded the timeout |
Per-Step States
Individual steps within an execution also have states:
| Step State | Meaning |
|---|---|
pending | Not yet reached |
running | Currently executing |
completed | Finished successfully |
failed | Error during this step |
skipped | Branch condition didn’t match |
paused | Breakpoint hit after this step |
awaiting_input | Waiting for elicitation response |
Pause and Resume
Execution can pause for two reasons:
Breakpoints
The caller specifies breakpoint step numbers before execution starts. After a breakpoint step completes, execution pauses with status paused.
- The full execution state (context maps, step outputs, current position) must be preserved
- Resume continues from the next step after the breakpoint
- Optional: the caller may override variable values when resuming
Elicitation
When a step has an @elicit directive and no response has been provided, execution pauses with status awaiting_input.
- The elicitation prompt and type are communicated to the caller
- The caller provides a response
- Execution resumes:
- If the step is elicit-only (no prompt content), the response becomes the step output
- If the step has prompt content, the AI call runs with the response available as context
State Persistence
Implementations should serialize the full execution state for pause/resume:
{ "execution_id": "...", "playbook_id": "...", "model_id": "...", "system_prompt": "...", "input_values": {"topic": "AI safety", "depth": "thorough"}, "step_outputs": {"1": "...", "2": "..."}, "named_outputs": {"analysis": "...", "severity": "critical"}, "breakpoints": {"3": true, "5": true}, "status": "paused", "paused_at_step": 3}Expiration
Implementations should define a timeout for paused executions. The reference implementation uses 24 hours, after which the execution transitions to expired status.
Streaming
Implementations that support real-time output should emit events during step execution:
| Event Type | Payload | When |
|---|---|---|
content | Text chunk | As AI tokens arrive |
done | Step result (tokens, latency, status) | Step completes |
error | Error message | Step fails |
The streaming protocol is not prescribed — implementations may use Server-Sent Events (SSE), WebSockets, or any other streaming mechanism. The spec defines the event types and their semantics, not the transport.
AI Provider Interface
The spec does not prescribe which AI provider or model to use. The implementation must support:
- System message — for system prompt + accumulated context
- User message — for the rendered step prompt
- Max tokens — implementations should set a reasonable default (reference: 16,384)
- Temperature — implementations may expose this as a playbook-level or step-level option
Error Handling
| Scenario | Behavior |
|---|---|
| AI call fails | Step status: failed. Execution status: failed. Error message recorded. |
| Tool call fails | Step status: failed. Execution status: failed. Error message recorded. |
| Referenced prompt not found | Step executes with its own content only (no error). |
| Unresolved variable | Placeholder remains as literal text (no error). |
| Branch variable undefined | Evaluates as empty string (no match unless compared to ""). |
Implementations should record detailed error information (message, step number, timing) for debugging.
Rate Limiting
Implementations should consider rate limiting per user or per playbook to prevent resource exhaustion. The spec does not prescribe specific limits — this is an implementation concern.
Idempotency
Playbook executions are not idempotent. The same playbook with the same inputs may produce different outputs due to AI model non-determinism. Each execution should be recorded as a distinct run with its own ID, timestamps, and results.