The server logic is organized into modular services found in app/services/. This design pattern separates business logic from API routes, making the code testable and reusable.
Key Services
Chat Service
File: app/services/chat_service.py
This is the heart of the conversational capability. It handles the following responsibilities, sketched in code after the list:
- Session Management: Creating and retrieving chat sessions.
- Prompt Engineering: Constructing the system prompt with context.
- Token Management: Ensuring prompts stay within model limits.
- Provider Resolution: Determining which LLM provider and model to use for a specific workspace.
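A minimal sketch of how these responsibilities might fit together. The class shape, method names, and the four-characters-per-token heuristic are illustrative assumptions, not the actual contents of chat_service.py:

```python
from dataclasses import dataclass, field


@dataclass
class ChatSession:
    session_id: str
    messages: list[dict] = field(default_factory=list)


class ChatService:
    def __init__(self, max_prompt_tokens: int = 4096):
        self.sessions: dict[str, ChatSession] = {}   # session management
        self.max_prompt_tokens = max_prompt_tokens   # token budget

    def get_or_create_session(self, session_id: str) -> ChatSession:
        return self.sessions.setdefault(session_id, ChatSession(session_id))

    def build_prompt(self, session: ChatSession, context: str, user_msg: str) -> list[dict]:
        # Prompt engineering: the system prompt carries the retrieved context.
        prompt = [{"role": "system", "content": f"Answer using this context:\n{context}"}]
        prompt += session.messages + [{"role": "user", "content": user_msg}]
        # Token management: drop the oldest history turns until under budget.
        while self._estimate_tokens(prompt) > self.max_prompt_tokens and len(prompt) > 2:
            prompt.pop(1)
        return prompt

    @staticmethod
    def _estimate_tokens(prompt: list[dict]) -> int:
        # Crude heuristic: roughly four characters per token.
        return sum(len(m["content"]) for m in prompt) // 4
```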
Embedding Service
File: app/services/embedding_service.py
Responsible for everything related to RAG (Retrieval-Augmented Generation); each stage is sketched in code after the list:
- Chunking: Splitting large documents into manageable text chunks.
- Embedding: Calling embedding models (e.g., OpenAI text-embedding-3-small) to vectorize text.
- Indexing: Storing vectors in the configured Vector DB.
- Retrieval: Performing cosine similarity searches to find relevant context.
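Each stage reduces to a small amount of code. The sketch below uses a plain in-memory list as the "index" and assumes the embedding vectors are produced elsewhere; none of these function names come from embedding_service.py:

```python
import math


def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    # Chunking: fixed-size windows with overlap, so sentences that straddle
    # a boundary still appear intact in at least one chunk.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / denom if denom else 0.0


def retrieve(query_vec: list[float],
             index: list[tuple[str, list[float]]],
             k: int = 3) -> list[tuple[str, list[float]]]:
    # Retrieval: rank stored (chunk, vector) pairs by cosine similarity.
    return sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)[:k]
```

In practice the ranking is delegated to the configured vector DB rather than computed in Python, but the similarity logic is the same.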
Agent Service
File: app/services/agent_service.py
Manages autonomous behaviors and tool use. It allows the LLM to “act” rather than just “speak” by executing defined tools (like web search or file operations).
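At its core, tool use is a dispatch table plus a loop that feeds results back to the model. A stripped-down sketch, with hypothetical tool names:

```python
from pathlib import Path
from typing import Callable

# Whitelist of callable tools the model is allowed to invoke.
TOOLS: dict[str, Callable[[str], str]] = {
    "web_search": lambda query: f"results for {query!r}",   # stand-in stub
    "read_file": lambda path: Path(path).read_text(),
}


def execute_tool_call(name: str, argument: str) -> str:
    # The LLM emits a tool name plus an argument; the agent validates the
    # name against the whitelist, runs the tool, and returns the output so
    # it can be appended to the conversation as an observation.
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    return TOOLS[name](argument)
```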
Pluggable Layers
The server is designed to be agnostic to specific vendors for key components.
LLM Providers
Location: app/services/llm/
The server uses an adapter pattern to support multiple LLM providers; a condensed sketch of the pattern follows the list.
- Factory: app/services/llm/factory.py instantiates the correct provider based on configuration.
- Base Class: All providers inherit from a common base class, ensuring a consistent interface.
- Supported: OpenAI, Anthropic, Ollama, Google Gemini, Groq, Azure.
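The pattern condenses to an abstract base class plus a registry keyed by provider name. The classes below are illustrative stubs, not the actual contents of app/services/llm/:

```python
from abc import ABC, abstractmethod


class BaseLLMProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str:
        """Every provider exposes the same completion interface."""


class OpenAIProvider(BaseLLMProvider):
    def complete(self, prompt: str) -> str:
        return "..."  # would call the OpenAI API here


class OllamaProvider(BaseLLMProvider):
    def complete(self, prompt: str) -> str:
        return "..."  # would call a local Ollama server here


_PROVIDERS: dict[str, type[BaseLLMProvider]] = {
    "openai": OpenAIProvider,
    "ollama": OllamaProvider,
}


def create_provider(name: str) -> BaseLLMProvider:
    # Factory: map the configured provider name to a concrete adapter.
    try:
        return _PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unsupported LLM provider: {name}") from None
```

Because every call site depends only on the base class, adding a new vendor means adding one adapter class and one registry entry.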
Vector Databases
Location: app/services/vector_db/
As with LLM providers, vector database support is modular (a short factory sketch follows the list).
- Factory: app/services/vector_db/factory.py.
- Supported:
  - LanceDB: Embedded, serverless vector DB (default).
  - ChromaDB: Open-source embedding database.
  - Pinecone: Managed cloud vector database.
  - Qdrant: High-performance vector search engine.
  - Weaviate: AI-native vector database.
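The vector DB factory presumably follows the same registry idiom as the LLM factory; the stub below is an assumption modeled on that pattern, not the real module:

```python
class LanceDBStore:
    """Would wrap an embedded LanceDB table (the default backend)."""


class ChromaStore:
    """Would wrap a ChromaDB collection."""


_BACKENDS = {"lancedb": LanceDBStore, "chroma": ChromaStore}


def create_vector_db(name: str = "lancedb"):
    # Resolve the configured backend, defaulting to embedded LanceDB.
    return _BACKENDS[name]()
```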
Authentication System
Location: app/core/security.py
Authentication is flexible and controlled by the MULTI_USER_MODE setting; a sketch of the multi-user path follows the list.
- Single User: Validates a simple static AUTH_TOKEN. Ideal for personal use.
- Multi User: Full JWT implementation.
  - Login: /api/v1/auth/login returns an access token.
  - Protection: Routes are protected by the get_current_user dependency.
  - Hashing: Passwords are hashed using bcrypt.
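A sketch of the multi-user path as a FastAPI dependency, assuming python-jose for JWTs and passlib for bcrypt; the project's actual secret handling, claim names, and libraries may differ:

```python
from fastapi import Depends, HTTPException, status
from fastapi.security import OAuth2PasswordBearer
from jose import JWTError, jwt
from passlib.context import CryptContext

SECRET_KEY = "change-me"   # assumed: loaded from settings in practice
ALGORITHM = "HS256"

oauth2_scheme = OAuth2PasswordBearer(tokenUrl="/api/v1/auth/login")
pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")


def verify_password(plain: str, hashed: str) -> bool:
    # Hashing: bcrypt comparison via passlib.
    return pwd_context.verify(plain, hashed)


def get_current_user(token: str = Depends(oauth2_scheme)) -> str:
    # Protection: decode the bearer token issued by /api/v1/auth/login and
    # reject anything that fails signature or expiry checks.
    try:
        payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
        return payload["sub"]
    except (JWTError, KeyError):
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid authentication token",
        ) from None
```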