
Assistant Execution

How the AI assistant engine processes queries — from conversation context resolution through RAG retrieval, LLM generation, and response streaming.

The assistant engine is the component that powers agent conversations. It handles the full pipeline from receiving a user message to generating and streaming an AI response — including context resolution, query rephrasing, knowledge base retrieval, LLM calls, and tool use.

Execution pipeline

When an assistant step runs (typically triggered by a message in a conversation), the engine executes the following pipeline (a high-level sketch follows the list):

1. Resolve conversation context
2. Build message history
3. (Optional) Rephrase query for better retrieval
4. Retrieve documents from knowledge base
5. Construct LLM prompt with context + retrieved content
6. Call LLM to generate response
7. Stream response to conversation
8. Log execution details
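
At a high level, these stages thread into a single function. A minimal Python sketch, where each entry in steps is a hypothetical callable standing in for one numbered stage (several of them are expanded with their own sketches in the step sections below):

  from typing import Callable

  def run_assistant_step(conversation_id: str, steps: dict[str, Callable]) -> None:
      # Each steps[...] callable is a stand-in for the engine's internals.
      context = steps["resolve_context"](conversation_id)           # step 1
      history = steps["build_history"](context)                     # step 2
      query = steps["rephrase"](context)                            # step 3 (optional)
      chunks = steps["retrieve"](query)                             # step 4
      prompt = steps["construct_prompt"](history, chunks, query)    # step 5
      stream = steps["call_llm"](prompt)                            # step 6
      response = steps["stream_response"](conversation_id, stream)  # step 7
      steps["log"](query, chunks, response)                         # step 8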

1. Resolve conversation context

The engine loads the current conversation and its recent message history. This provides the AI model with the context of the ongoing discussion.
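
A hedged sketch of this step, assuming a hypothetical store data-access layer with get_conversation and list_messages methods:

  def resolve_context(store, conversation_id: str, limit: int = 20) -> dict:
      # store is an assumed data-access layer; method names are illustrative.
      conversation = store.get_conversation(conversation_id)
      messages = store.list_messages(conversation_id, limit=limit)  # most recent first
      return {"conversation": conversation,
              "messages": list(reversed(messages))}  # oldest-to-newest for the prompt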

2. Build message history

Recent messages are formatted and included in the prompt. The engine applies a truncation strategy to keep the prompt within the model's context window — typically keeping the most recent messages and trimming older ones if the total exceeds the limit.
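
One simple version of such a strategy, written as a self-contained Python function; the token estimate here (characters divided by four) is a rough stand-in for the model's real tokenizer:

  def truncate_history(messages: list[dict], max_tokens: int) -> list[dict]:
      """Keep the most recent messages whose combined size fits the budget."""
      kept: list[dict] = []
      total = 0
      for msg in reversed(messages):               # walk newest-first
          cost = len(msg["content"]) // 4 + 4      # rough token estimate + overhead
          if total + cost > max_tokens:
              break                                # everything older is trimmed
          kept.append(msg)
          total += cost
      return list(reversed(kept))                  # restore chronological order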

3. Rephrase query (optional)

If enabled, the engine rephrases the user's latest message into a self-contained search query. This improves retrieval quality by resolving pronouns and references from conversation context.

For example, if the conversation is:

  • User: "How do I set up integrations?"
  • Agent: "You can create integrations in the blueprint settings..."
  • User: "What about authentication?"

The rephrased query might be: "How does authentication work for integrations?" — which retrieves more relevant results than "What about authentication?" alone.
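
Rephrasing is itself a small LLM call. A sketch, assuming a generic llm_complete(prompt) helper that wraps whichever provider is configured; the prompt wording is illustrative:

  REPHRASE_PROMPT = """Given the conversation below, rewrite the user's last
  message as a self-contained search query. Resolve pronouns and references.

  Conversation:
  {history}

  Last message: {message}

  Search query:"""

  def rephrase(history: str, message: str, llm_complete) -> str:
      prompt = REPHRASE_PROMPT.format(history=history, message=message)
      return llm_complete(prompt).strip()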

4. Retrieve from knowledge base

If the agent has a knowledge base configured, the engine performs a semantic search (see the sketch after this list):

  • The query (original or rephrased) is converted to an embedding vector.
  • The vector database returns the most similar document chunks.
  • Retrieved chunks are formatted and included in the prompt as context.
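
Conceptually this is nearest-neighbor search over embedding vectors. An illustrative, self-contained version using plain cosine similarity (a real deployment would delegate this to the vector database, and the query vector would come from the configured embedding model):

  import math

  def cosine(a: list[float], b: list[float]) -> float:
      dot = sum(x * y for x, y in zip(a, b))
      norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
      return dot / norm if norm else 0.0

  def retrieve(query_vector: list[float], chunks: list[dict], top_k: int = 5) -> list[dict]:
      # chunks are assumed to carry a precomputed "embedding" alongside their text
      scored = [(cosine(query_vector, c["embedding"]), c) for c in chunks]
      scored.sort(key=lambda pair: pair[0], reverse=True)
      return [c for _, c in scored[:top_k]]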

5. Construct prompt

The engine assembles the full prompt from the following parts (see the assembly sketch after this list):

  • System instructions — the agent's description, which defines its role and behavior
  • Retrieved documents — relevant knowledge base content (if available)
  • Message history — recent conversation messages
  • User query — the latest message
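
One plausible shape for the assembly, producing a chat-style message list; the section label for retrieved content is illustrative, not a fixed format:

  def construct_prompt(instructions: str, chunks: list[str],
                       history: list[dict], query: str) -> list[dict]:
      system = instructions                        # the agent's role and behavior
      if chunks:                                   # retrieved knowledge base content
          system += "\n\nRelevant documents:\n" + "\n---\n".join(chunks)
      return [{"role": "system", "content": system},
              *history,                            # recent conversation messages
              {"role": "user", "content": query}]  # the latest message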

6. Call LLM

The engine sends the constructed prompt to the configured model (providers are covered under Multi-provider support below) with parameters including the following; an illustrative request payload appears after the list:

  • Model — which LLM to use
  • Temperature — controls response randomness
  • Response format — text, JSON, or structured output
  • Tools — available actions the model can call (tool use / function calling)
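
In practice these parameters map onto a chat-completion request. An illustrative payload, with field names following common provider conventions; the exact shape varies by provider, and the model id and tool here are examples:

  request = {
      "model": "gpt-4o",                     # example model id
      "temperature": 0.2,                    # lower = more deterministic
      "response_format": {"type": "text"},   # or JSON / structured output
      "stream": True,                        # enable token streaming (step 7)
      "tools": [{                            # hypothetical action exposed as a tool
          "type": "function",
          "function": {
              "name": "lookup_record",
              "description": "Fetch a record from the database by id.",
              "parameters": {
                  "type": "object",
                  "properties": {"id": {"type": "string"}},
                  "required": ["id"],
              },
          },
      }],
  }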

7. Stream response

Responses are streamed in real time. As the LLM generates tokens, they are sent to the conversation as a streaming message. Users see the response appearing progressively rather than waiting for the full generation.
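
The streaming loop itself is small. A sketch, assuming the LLM client yields text deltas and send_delta is a hypothetical conversation API that appends each chunk to the visible message:

  def stream_response(deltas, send_delta) -> str:
      parts = []
      for delta in deltas:        # each delta is a chunk of generated text
          send_delta(delta)       # the user sees the message grow progressively
          parts.append(delta)
      return "".join(parts)       # full response, kept for logging (step 8)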

8. Log execution

The engine logs detailed execution data:

  • Original and rephrased queries
  • Retrieved documents and their relevance scores
  • Model used and token counts (input, output)
  • Response content
  • Timing information

These logs are useful for debugging agent behavior and improving knowledge base content.
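
An example of what one log entry might contain; field names and values here are illustrative, not the engine's actual schema:

  import json, time

  log_entry = {
      "timestamp": time.time(),
      "query": "What about authentication?",
      "rephrased_query": "How does authentication work for integrations?",
      "retrieved": [{"chunk_id": "doc-42#3", "score": 0.87}],   # relevance scores
      "model": "gpt-4o",
      "tokens": {"input": 1432, "output": 256},
      "response": "...",
      "duration_ms": 1840,
  }
  print(json.dumps(log_entry, indent=2))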

Tool use

Agents can use tools — actions that the LLM can decide to call during response generation. When the model determines it needs to call a tool:

  1. The model returns a tool call request instead of text.
  2. The engine executes the requested action via CallAction.
  3. The result is sent back to the model.
  4. The model uses the result to continue generating its response.

This cycle can repeat multiple times (up to a configured maximum) for complex queries that require multiple tool calls.
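
A sketch of that loop, assuming a call_llm helper whose response carries either text or tool calls, and an execute_action helper that dispatches to CallAction; the message shapes follow common function-calling conventions and are illustrative:

  def generate_with_tools(call_llm, execute_action,
                          messages: list[dict], max_rounds: int = 5) -> str:
      for _ in range(max_rounds):
          response = call_llm(messages)
          tool_calls = response.get("tool_calls")
          if not tool_calls:
              return response["content"]             # plain text: generation is done
          messages.append({"role": "assistant",      # record the model's tool request
                           "content": response.get("content"),
                           "tool_calls": tool_calls})
          for call in tool_calls:
              result = execute_action(call["name"], call["arguments"])
              messages.append({"role": "tool",       # feed the result back to the model
                               "tool_call_id": call["id"],
                               "content": result})
      raise RuntimeError("tool-call limit reached without a final answer")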

Tool use enables agents to perform actions like querying databases, calling external APIs, or looking up specific data as part of generating a response.

Multi-provider support

The platform supports multiple LLM providers through a unified interface. Models are configured at the platform level and can be selected per assistant or per flow step. The engine handles provider-specific differences in API format, streaming protocol, and tool calling conventions.
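
One common way to hide those differences is a narrow interface that every provider adapter implements; a sketch (the engine's actual abstraction may differ):

  from typing import Iterator, Protocol

  class LLMProvider(Protocol):
      def stream_chat(self, messages: list[dict], *, temperature: float,
                      tools: list[dict] | None = None) -> Iterator[str]:
          """Yield normalized text deltas regardless of the provider's wire format."""
          ...

Adapters then translate this call into each provider's request format and normalize its streaming events back into plain text deltas.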

Related

  • Agents — configuring the agents that the assistant engine powers
  • Knowledge Base — the document retrieval layer
  • Flows — workflows that contain assistant steps
  • Actions — the tool use mechanism