Prompt Chains

The execution engine that turns events into agent responses.

What Is a Chain

A prompt chain is one complete agent activation: an inbound event arrives, the chain runner constructs a prompt, calls the LLM, processes the response (including tool calls), and produces a final output. Chains are the core execution loop of Animus.

Chain Lifecycle

1. Event arrives: An inbound message from a channel (IRC, Telegram, etc.), a scheduled trigger, or a system event.
2. Session resolution: The session router maps the event to an existing session (or creates a new one). Sessions persist in SQLite and survive restarts.
3. Prompt assembly: The PromptAssembler builds the full prompt: system prompt + active memory + session history + incoming message. Reasoning effort and thinking tokens are injected based on agent config.
4. LLM call: The prompt is sent to the configured provider. Streaming responses are processed token-by-token with real-time delivery to the admin UI chat panel.
5. Tool execution: If the LLM response contains tool calls, each tool is executed with JSON I/O. Results are appended to the conversation and the LLM is called again (step loop).
6. Response: The final assistant text is sent back through the originating channel as an auto-reply, or routed to the appropriate output (file write, diary entry, etc.).

Step Loop and Budgets

The chain runner executes in a loop: LLM response → tool calls → tool results → LLM response. Each iteration is a "step." The loop continues until the LLM produces a response with no tool calls, or until the budget is exhausted.

Budgets are configured per-agent:

max_chain_steps — maximum iterations of the LLM→tool→LLM loop
max_tool_calls_per_chain — total tool calls allowed across all steps
timeout_seconds — wall-clock timeout for the entire chain
token_budget_per_prompt — maximum tokens in the assembled prompt

Reasoning

Animus uses a unified reasoning model: thinking content appears on the same SessionTurn as the assistant reply (not a separate turn). Effort levels (low, medium, high, xhigh) map to provider-native parameters. Streaming thinking deltas are delivered alongside content in the admin UI.

Implementation note: Thinking tokens are injected into the prompt as a closed block between system content and the assistant response. The exact format depends on the provider (OpenAI reasoning tokens vs. GLM thinking blocks).

Streaming

All providers support streaming. Tokens are delivered to the admin UI chat panel in real time via WebSocket. The chat panel renders thinking content and assistant text separately, with visual indicators for each.