# Architecture

## High-Level Design
The ai_term application is designed as a modular, event-driven Terminal User Interface (TUI) that integrates local AI capabilities. It separates concerns between the user interface, core application logic, audio processing, and data persistence.

```mermaid
graph TD
    User[User] <--> TUI[Textual CLI App]
    TUI <--> Agent[Chat Agent]
    TUI <--> Audio[Audio Client]
    Agent <--> LLM[LLM Provider]
    Audio <--> STT[STT Service]
    Audio <--> TTS[TTS Service]
    TUI <--> DB[(SQLite DB)]
```
## Component Details

### 1. User Interface (UI) Layer
The UI is built using Textual, a TUI framework for Python. It follows a screen-based architecture where the ChatApp manages navigation between different screens.
- ChatScreen: The main interface for user interaction, handling message display, input, and voice controls.
- SettingsScreen: Configuration management for AI models, audio devices, and themes.
- HelpScreen: Static information and key references.

```mermaid
graph TB
    direction TB
    App[ChatApp]
    App -- Mounts --> Chat[ChatScreen]
    App -- "Switches to" --> Settings[SettingsScreen]
    Chat -- "Ctrl+S" --> Settings
```
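The screen-stack pattern above can be sketched in plain Python. This is an illustration of the navigation model, not the real Textual API; the class names mirror the screens listed earlier, and the key-binding table is hypothetical.

```python
# Minimal sketch (not the real Textual API) of screen-stack navigation:
# one app object owns a stack of screens, and key bindings push new
# screens onto it. All binding keys here are illustrative.

class Screen:
    name = "base"

class ChatScreen(Screen):
    name = "chat"

class SettingsScreen(Screen):
    name = "settings"

class HelpScreen(Screen):
    name = "help"

class ChatApp:
    def __init__(self):
        # ChatScreen is mounted first, mirroring the diagram above.
        self.stack = [ChatScreen()]
        # Hypothetical binding table; the real app binds Ctrl+S to settings.
        self.bindings = {"ctrl+s": SettingsScreen, "f1": HelpScreen}

    @property
    def current(self):
        return self.stack[-1]

    def press(self, key):
        screen_cls = self.bindings.get(key)
        if screen_cls:
            self.stack.append(screen_cls())

    def pop(self):
        # Never pop the root chat screen.
        if len(self.stack) > 1:
            self.stack.pop()

app = ChatApp()
app.press("ctrl+s")
print(app.current.name)  # settings
app.pop()
print(app.current.name)  # chat
```

In Textual itself this corresponds to pushing and popping `Screen` instances on the app's screen stack; the stack guarantees that closing Settings always returns the user to the chat.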
### 2. Core Logic & Agent
The core logic resides in the ai_term.cli.core package. The ChatAgent is the brain of the application, orchestrating interactions between the user, the LLM (via Ollama), and external tools (via MCP).
- ChatAgent: initialized with system prompts and capable of function calling.
- MCP Manager: Discovers and connects to Model Context Protocol servers to extend agent capabilities.
- Config Manager: Handles persistent configuration (models, API keys, themes).

```mermaid
graph LR
    direction LR
    Config[Config Manager]
    Agent[ChatAgent]
    MCP[MCP Manager]
    Config -- "Load Settings" --> Agent
    Agent -- "Tool execution" --> MCP
    MCP -- "JSON-RPC" --> Tools[External Tools]
    Agent -- "Chat/Stream" --> Ollama[Ollama LLM]
```
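The orchestration implied by the diagram can be sketched as a loop: the agent sends the conversation to the LLM, dispatches any requested tool calls, feeds the results back, and repeats until the LLM returns plain content. All names below are illustrative stand-ins, not the real implementation.

```python
# Hedged sketch of the agent loop: LLM -> tool calls -> results -> LLM.
# `llm` and `tools` are stand-ins; the real ChatAgent talks to Ollama
# and dispatches tool calls through the MCP manager.

def run_turn(llm, tools, messages):
    """One agent turn: call the LLM, execute any requested tools, repeat."""
    while True:
        response = llm(messages)  # e.g. {"content": ...} or {"tool_calls": [...]}
        calls = response.get("tool_calls")
        if not calls:
            return response["content"]
        for call in calls:
            result = tools[call["name"]](**call["args"])
            # Feed the tool result back into the conversation.
            messages.append({"role": "tool", "content": str(result)})

# Stub LLM: first asks for a tool, then answers using its result.
def fake_llm(messages):
    if messages[-1]["role"] == "tool":
        return {"content": f"The answer is {messages[-1]['content']}."}
    return {"tool_calls": [{"name": "add", "args": {"a": 2, "b": 3}}]}

tools = {"add": lambda a, b: a + b}
print(run_turn(fake_llm, tools, [{"role": "user", "content": "2+3?"}]))
# The answer is 5.
```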
### ai_term.cli.core.agent.ChatAgent

Chat agent using LangChain and Ollama.

Source code in src/ai_term/cli/core/agent.py
#### add_tool(tool)

#### chat(messages, speech_mode=False) *(async)*

Send messages to the LLM and get a response.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `messages` | `list[dict]` | List of message dicts with 'role' and 'content'. | *required* |
| `speech_mode` | `bool` | Whether to optimize output for text-to-speech. | `False` |

Returns:

| Type | Description |
|---|---|
| `dict` | Response dict with 'content' and optionally 'tool_calls'. |
Source code in src/ai_term/cli/core/agent.py
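The call shape documented above can be shown with a stub in place of the real agent (which talks to Ollama), so the example stays self-contained; only the signature and return shape mirror the documentation.

```python
import asyncio

# Illustrative call shape for ChatAgent.chat(). StubAgent is a stand-in;
# the real agent returns {'content': ..., 'tool_calls': [...]}.

class StubAgent:
    async def chat(self, messages, speech_mode=False):
        last = messages[-1]["content"]
        return {"content": f"echo: {last}"}

async def main():
    agent = StubAgent()
    messages = [
        {"role": "system", "content": "You are a helpful terminal assistant."},
        {"role": "user", "content": "hello"},
    ]
    reply = await agent.chat(messages, speech_mode=False)
    return reply["content"]

print(asyncio.run(main()))  # echo: hello
```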
#### generate_title(user_message) *(async)*

Generate a short title (3-5 words) for a chat session.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `user_message` | `str` | The user's first message. | *required* |

Returns:

| Type | Description |
|---|---|
| `str` | A short descriptive title. |
Source code in src/ai_term/cli/core/agent.py
#### stream_chat(messages) *(async)*

Stream the response from the LLM.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `messages` | `list[dict]` | List of message dicts. | *required* |

Yields:

| Name | Type | Description |
|---|---|---|
| `str` | `AsyncGenerator[str, None]` | Chunks of response content. |
Source code in src/ai_term/cli/core/agent.py
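A caller consumes the `AsyncGenerator[str, None]` with `async for`. The stub generator below yields canned chunks in place of the real Ollama stream; the consumption pattern is the point.

```python
import asyncio

# How a caller consumes stream_chat(): iterate the async generator and
# accumulate chunks. This stub yields canned chunks instead of the real
# Ollama stream.

async def stream_chat(messages):
    for chunk in ["Hel", "lo ", "world"]:
        yield chunk

async def collect(messages):
    parts = []
    async for chunk in stream_chat(messages):
        parts.append(chunk)  # in the TUI each chunk is appended to the screen
    return "".join(parts)

print(asyncio.run(collect([{"role": "user", "content": "hi"}])))  # Hello world
```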
### ai_term.cli.core.mcp_manager.MCPManager

Manages MCP server connections and tool discovery.
Source code in src/ai_term/cli/core/mcp_manager.py
#### get_server_names()

#### get_tools()

Get tools from all connected MCP servers.
Source code in src/ai_term/cli/core/mcp_manager.py
#### start_server(name) *(async)*

Start an MCP server by name.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Server name from config. | *required* |

Returns:

| Type | Description |
|---|---|
| `bool` | True if started successfully. |
Source code in src/ai_term/cli/core/mcp_manager.py
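The lifecycle documented above (start by config name, aggregate tools, tear down) can be sketched with a stub that mimics only the documented signatures; the tool names and config shape are invented for illustration.

```python
import asyncio

# Stub mirroring the documented MCPManager lifecycle: start_server()
# returns True on success, get_tools() aggregates across servers,
# stop_all() tears everything down. Config shape is hypothetical.

class StubMCPManager:
    def __init__(self, config):
        self._config = config   # name -> list of tool names (illustrative)
        self._running = {}

    async def start_server(self, name):
        if name not in self._config:
            return False        # unknown server name: fail to start
        self._running[name] = list(self._config[name])
        return True

    def get_server_names(self):
        return list(self._running)

    def get_tools(self):
        return [t for tools in self._running.values() for t in tools]

    async def stop_all(self):
        self._running.clear()

async def demo():
    mgr = StubMCPManager({"filesystem": ["read_file", "write_file"]})
    ok = await mgr.start_server("filesystem")
    tools = mgr.get_tools()
    await mgr.stop_all()
    return ok, tools

print(asyncio.run(demo()))  # (True, ['read_file', 'write_file'])
```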
#### stop_all() *(async)*

#### stop_server(name) *(async)*

Stop an MCP server.
Source code in src/ai_term/cli/core/mcp_manager.py
### 3. Audio Pipeline
The audio system is designed for real-time interaction. It uses a client-server architecture where heavy processing (Speech-to-Text and Text-to-Speech) is offloaded to local Docker containers to ensure the CLI remains responsive.
- AudioRecorder: Captures raw audio from the microphone using sounddevice.
- AudioClient: Sends audio data to the STT service and text to the TTS service via HTTP.
- AudioPlayer: Plays back the synthesized audio.

```mermaid
graph LR
    Mic[Microphone] --> Recorder[Audio Recorder]
    Recorder -- "Raw Bytes" --> App[ChatApp]
    App -- "Transcribe" --> Client[Audio Client]
    Client -- "POST /transcribe" --> STT[STT Service]
    STT -- "Text" --> Client
    App -- "Speak (Text)" --> Client
    Client -- "POST /generate" --> TTS[TTS Service]
    TTS -- "Audio Bytes" --> Client
    Client --> Player[Audio Player]
    Player --> Speaker[Speakers]
```
### ai_term.cli.core.audio_client.AudioClient

Client for interacting with STT and TTS servers.
Source code in src/ai_term/cli/core/audio_client.py
#### speak(ai_response, user_query=None) *(async)*

Send text to TTS server and get audio bytes.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `ai_response` | `str` | AI response text to synthesize. | *required* |
| `user_query` | `Optional[str]` | User query text to synthesize (optional). | `None` |

Returns:

| Type | Description |
|---|---|
| `bytes` | Audio bytes (WAV or MP3 format depending on provider). |
Source code in src/ai_term/cli/core/audio_client.py
#### transcribe(audio_bytes) *(async)*

Send audio to STT server and get transcription.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `audio_bytes` | `bytes` | Raw audio data (WAV format). | *required* |

Returns:

| Type | Description |
|---|---|
| `str` | Transcribed text. |
Source code in src/ai_term/cli/core/audio_client.py
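A full voice turn built from the two methods documented above can be sketched as follows. The stub stands in for the real client (which POSTs to the local STT/TTS containers); only the method signatures mirror the documentation, and the reply text is a stand-in for the ChatAgent.

```python
import asyncio

# Shape of one voice turn: mic bytes -> transcribe() -> reply -> speak().
# StubAudioClient replaces the real HTTP client; returned values are fake.

class StubAudioClient:
    async def transcribe(self, audio_bytes):
        # Real client: POST /transcribe with WAV bytes, returns text.
        return "what time is it"

    async def speak(self, ai_response, user_query=None):
        # Real client: POST /generate, returns WAV/MP3 bytes per provider.
        return b"fake-audio:" + ai_response.encode()

async def voice_turn(client, mic_bytes):
    text = await client.transcribe(mic_bytes)           # speech -> text
    reply = f"You asked: {text}"                        # stand-in for ChatAgent
    audio = await client.speak(reply, user_query=text)  # text -> speech
    return text, audio

text, audio = asyncio.run(voice_turn(StubAudioClient(), b"\x00\x01"))
print(text)  # what time is it
```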
### 4. Data Persistence
Application state, chat history, and sessions are persisted in a local SQLite database using SQLAlchemy (asyncio).
- Session: Represents a conversation thread.
- Message: Individual user or assistant messages, including tool calls.

```mermaid
graph LR
    App[ChatApp]
    Engine[DB Engine]
    DB[(SQLite File)]
    App -- "Save/Load" --> Engine
    Engine -- "SQLAlchemy (Async)" --> DB
```
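An illustrative schema for the Session/Message models described above, written against stdlib sqlite3 for brevity; the application itself maps these models with SQLAlchemy's asyncio engine on top of the same SQLite file, and the exact column names here are assumptions.

```python
import sqlite3

# Hypothetical schema sketch for the Session and Message models: a session
# is a conversation thread, and each message belongs to a session and may
# carry tool calls. Column names are illustrative, not the real models.

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sessions (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES sessions(id),
    role TEXT NOT NULL,            -- 'user', 'assistant', or 'tool'
    content TEXT NOT NULL,
    tool_calls TEXT                -- JSON-encoded tool calls, if any
);
""")

cur = conn.execute("INSERT INTO sessions (title) VALUES (?)", ("Demo chat",))
sid = cur.lastrowid
conn.execute(
    "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
    (sid, "user", "hello"),
)
rows = conn.execute(
    "SELECT role, content FROM messages WHERE session_id = ?", (sid,)
).fetchall()
print(rows)  # [('user', 'hello')]
```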