Assistant Execution
How the AI assistant engine processes queries — from conversation context resolution through RAG retrieval, LLM generation, and response streaming.
The assistant engine is the component that powers agent conversations. It handles the full pipeline from receiving a user message to generating and streaming an AI response — including context resolution, query rephrasing, knowledge base retrieval, LLM calls, and tool use.
Execution pipeline
When an assistant step runs (typically triggered by a message in a conversation), the engine executes the following pipeline:
1. Resolve conversation context
2. Build message history
3. (Optional) Rephrase query for better retrieval
4. Retrieve documents from knowledge base
5. Construct LLM prompt with context + retrieved content
6. Call LLM to generate response
7. Stream response to conversation
8. Log execution details

1. Resolve conversation context
The engine loads the current conversation and its recent message history. This provides the AI model with the context of the ongoing discussion.
2. Build message history
Recent messages are formatted and included in the prompt. The engine applies a truncation strategy to keep the prompt within the model's context window — typically keeping the most recent messages and trimming older ones if the total exceeds the limit.
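The recency-based truncation described above can be sketched as follows. This is a minimal illustration, not the engine's actual implementation; the function and parameter names are assumptions, and `count_tokens` stands in for whatever tokenizer-based estimate the platform uses.

```python
def truncate_history(messages, max_tokens, count_tokens):
    """Keep the most recent messages whose combined size fits the budget.

    `messages` is ordered oldest-first; `count_tokens` estimates the
    token cost of a single message (assumed helper).
    """
    kept = []
    total = 0
    for msg in reversed(messages):  # walk newest-first
        cost = count_tokens(msg)
        if total + cost > max_tokens:
            break  # older messages beyond this point are trimmed
        kept.append(msg)
        total += cost
    return list(reversed(kept))  # restore chronological order
```

Walking newest-first guarantees the latest messages survive, which matches the strategy of trimming older ones when the total exceeds the limit.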
3. Rephrase query (optional)
If enabled, the engine rephrases the user's latest message into a self-contained search query. This improves retrieval quality by resolving pronouns and references from conversation context.
For example, if the conversation is:
- User: "How do I set up integrations?"
- Agent: "You can create integrations in the blueprint settings..."
- User: "What about authentication?"
The rephrased query might be: "How does authentication work for integrations?" — which retrieves more relevant results than "What about authentication?" alone.
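One way to implement this step is to ask the model itself to rewrite the message, given the transcript. The prompt wording below is purely illustrative, an assumption about how such a rephrasing prompt might look:

```python
def build_rephrase_prompt(history, latest_message):
    """Assemble a prompt asking the model to rewrite the latest message
    as a self-contained search query (illustrative wording)."""
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    return (
        "Rewrite the user's latest message as a standalone search query, "
        "resolving any pronouns or references using the conversation.\n\n"
        f"Conversation:\n{transcript}\n\n"
        f"Latest message: {latest_message}\n"
        "Standalone query:"
    )
```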
4. Retrieve from knowledge base
If the agent has a knowledge base configured, the engine performs a semantic search:
- The query (original or rephrased) is converted to an embedding vector.
- The vector database returns the most similar document chunks.
- Retrieved chunks are formatted and included in the prompt as context.
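The ranking half of this search can be sketched with cosine similarity over embeddings. In practice the vector database performs this server-side over an index; the in-memory version below only illustrates the scoring logic, and the function names are assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, chunks, top_k=3):
    """Rank stored chunks by similarity to the query embedding.

    `chunks` is a list of (text, embedding) pairs. Returns the top_k
    (score, text) pairs, highest similarity first.
    """
    scored = [(cosine(query_vec, emb), text) for text, emb in chunks]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:top_k]
```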
5. Construct prompt
The engine assembles the full prompt from:
- System instructions — the agent's description, which defines its role and behavior
- Retrieved documents — relevant knowledge base content (if available)
- Message history — recent conversation messages
- User query — the latest message
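Assembled as a chat-style message list, the four parts above might look like this. The role names follow common chat-API conventions and the context wording is an assumption:

```python
def build_messages(system_instructions, retrieved_docs, history, user_query):
    """Assemble the prompt in the order described above: system
    instructions (plus retrieved context), message history, user query."""
    system = system_instructions
    if retrieved_docs:
        context = "\n\n".join(retrieved_docs)
        system += f"\n\nUse the following context when relevant:\n{context}"
    return (
        [{"role": "system", "content": system}]
        + history
        + [{"role": "user", "content": user_query}]
    )
```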
6. Call LLM
The engine sends the constructed prompt to the configured model (see Multi-provider support below) with parameters including:
- Model — which LLM to use
- Temperature — controls response randomness
- Response format — text, JSON, or structured output
- Tools — available actions the model can call (tool use / function calling)
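A request carrying these parameters might be shaped like the payload below. Field names vary by provider, so treat this as an illustrative sketch rather than any provider's actual API:

```python
def build_llm_request(model, prompt_messages, tools=None,
                      temperature=0.7, response_format="text"):
    """Illustrative request payload combining the parameters above.
    The default temperature is an arbitrary example value."""
    request = {
        "model": model,
        "messages": prompt_messages,
        "temperature": temperature,
        "response_format": response_format,
    }
    if tools:
        request["tools"] = tools  # omitted entirely when no tools are configured
    return request
```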
7. Stream response
Responses are streamed in real time. As the LLM generates tokens, they are sent to the conversation as a streaming message. Users see the response appearing progressively rather than waiting for the full generation.
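The streaming behavior reduces to forwarding each token as it arrives while accumulating the full text. A minimal sketch, where `send_chunk` stands in for whatever transport (e.g. WebSocket or SSE) delivers chunks to the conversation:

```python
def stream_response(token_iter, send_chunk):
    """Forward tokens to the conversation as they arrive and return
    the complete response once generation finishes."""
    parts = []
    for token in token_iter:
        send_chunk(token)   # user sees the response appear progressively
        parts.append(token)
    return "".join(parts)
```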
8. Log execution
The engine logs detailed execution data:
- Original and rephrased queries
- Retrieved documents and their relevance scores
- Model used and token counts (input, output)
- Response content
- Timing information
These logs are useful for debugging agent behavior and improving knowledge base content.
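A structured log record covering the fields listed above might look like this; the exact schema and field names are assumptions:

```python
def build_execution_log(original_query, rephrased_query, retrieved,
                        model, input_tokens, output_tokens,
                        response, started_at, finished_at):
    """Build a structured execution record.

    `retrieved` is a list of (score, text) pairs; `started_at` and
    `finished_at` are timestamps in seconds (e.g. from time.time()).
    """
    return {
        "query": {"original": original_query, "rephrased": rephrased_query},
        "retrieval": [{"text": t, "score": s} for s, t in retrieved],
        "model": model,
        "tokens": {"input": input_tokens, "output": output_tokens},
        "response": response,
        "duration_ms": round((finished_at - started_at) * 1000),
    }
```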
Tool use
Agents can use tools — actions that the LLM can decide to call during response generation. When the model determines it needs to call a tool:
- The model returns a tool call request instead of text.
- The engine executes the requested action via CallAction.
- The result is sent back to the model.
- The model uses the result to continue generating its response.
This cycle can repeat multiple times (up to a configured maximum) for complex queries that require multiple tool calls.
Tool use enables agents to perform actions like querying databases, calling external APIs, or looking up specific data as part of generating a response.
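The cycle above can be sketched as a bounded loop. The reply shapes (`"text"` vs. `"tool_call"`) and role names here are assumptions for illustration; `execute_action` stands in for the engine's CallAction dispatch:

```python
def run_with_tools(call_llm, execute_action, messages, max_iterations=5):
    """Minimal tool-use loop: call the model, execute any requested
    action and feed the result back, until the model returns plain
    text or the configured iteration cap is reached.

    In this sketch, call_llm(messages) returns either {"text": ...}
    or {"tool_call": {"name": ..., "args": ...}}.
    """
    for _ in range(max_iterations):
        reply = call_llm(messages)
        if "text" in reply:
            return reply["text"]  # model produced its final answer
        call = reply["tool_call"]
        result = execute_action(call["name"], call["args"])
        messages = messages + [
            {"role": "assistant", "tool_call": call},
            {"role": "tool", "content": result},
        ]
    raise RuntimeError("tool-call limit reached without a final answer")
```

The cap on iterations is what prevents a model that keeps requesting tools from looping indefinitely.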
Multi-provider support
The platform supports multiple LLM providers through a unified interface. Models are configured at the platform level and can be selected per assistant or per flow step. The engine handles provider-specific differences in API format, streaming protocol, and tool calling conventions.
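A unified provider interface is commonly modeled as an abstract base class that concrete adapters implement. This is a generic sketch of the pattern, not the platform's actual class hierarchy:

```python
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """Unified interface; concrete subclasses adapt provider-specific
    API formats, streaming protocols, and tool-calling conventions."""

    @abstractmethod
    def generate(self, messages, **params):
        """Return the model's response text for a chat-style message list."""

class EchoProvider(LLMProvider):
    """Stand-in provider for local testing: echoes the last message."""

    def generate(self, messages, **params):
        return messages[-1]["content"]
```

Because callers depend only on `LLMProvider`, swapping the model per assistant or per flow step is a configuration change rather than a code change.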
Related concepts
- Agents — configuring the agents that the assistant engine powers
- Knowledge Base — the document retrieval layer
- Flows — workflows that contain assistant steps
- Actions — the tool use mechanism