Architecture#

High-Level Design#

The ai_term application is designed as a modular, event-driven Terminal User Interface (TUI) that integrates local AI capabilities. It separates concerns between the user interface, core application logic, audio processing, and data persistence.

graph TD
    User[User] <--> TUI[Textual CLI App]
    TUI <--> Agent[Chat Agent]
    TUI <--> Audio[Audio Client]

    Agent <--> LLM[LLM Provider]
    Audio <--> STT[STT Service]
    Audio <--> TTS[TTS Service]

    TUI <--> DB[(SQLite DB)]


Component Details#

1. User Interface (UI) Layer#

The UI is built using Textual, a TUI framework for Python. It follows a screen-based architecture where the ChatApp manages navigation between different screens.

  • ChatScreen: The main interface for user interaction, handling message display, input, and voice controls.
  • SettingsScreen: Configuration management for AI models, audio devices, and themes.
  • HelpScreen: Static help content and a keybinding reference.
graph TB
    direction TB
    App[ChatApp]

    App -- Mounts --> Chat
    App -- Switches to --> Settings

    Chat -- "Ctrl+S" --> Settings
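The screen-switching pattern above can be sketched without any framework. This is an illustrative, framework-free model of the navigation flow, not Textual's actual `App`/`Screen` API; the screen names and the Ctrl+S binding come from the description above.

```python
# Framework-free sketch of ChatApp's screen-based navigation.
# Textual's real Screen/App classes work differently; this only
# models the "mount chat, switch to settings on Ctrl+S" flow.

class Screen:
    name = "base"

class ChatScreen(Screen):
    name = "chat"

class SettingsScreen(Screen):
    name = "settings"

class ChatApp:
    def __init__(self):
        self.screens = {cls.name: cls() for cls in (ChatScreen, SettingsScreen)}
        self.current = self.screens["chat"]   # ChatScreen is mounted first

    def switch(self, name: str) -> None:
        self.current = self.screens[name]

    def on_key(self, key: str) -> None:
        if key == "ctrl+s":                   # mirrors the Ctrl+S binding
            self.switch("settings")

app = ChatApp()
app.on_key("ctrl+s")
```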

2. Core Logic & Agent#

The core logic resides in the ai_term.cli.core package. The ChatAgent is the brain of the application, orchestrating interactions between the user, the LLM (via Ollama), and external tools (via MCP).

  • ChatAgent: Initialized with a system prompt and capable of tool (function) calling.
  • MCP Manager: Discovers and connects to Model Context Protocol servers to extend agent capabilities.
  • Config Manager: Handles persistent configuration (models, API keys, themes).
graph LR
    direction LR
    Config[Config Manager]
    Agent[ChatAgent]
    MCP[MCP Manager]

    Config -- "Load Settings" --> Agent
    Agent -- "Tool execution" --> MCP
    MCP -- "JSON-RPC" --> Tools[External Tools]
    Agent -- "Chat/Stream" --> Ollama[Ollama LLM]
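The agent consumes plain message dicts and converts them to LangChain message objects before invoking the LLM. The sketch below mirrors that role-to-class conversion from `ChatAgent.chat` as a standalone function; the lightweight dataclasses stand in for LangChain's `SystemMessage`/`HumanMessage`/`AIMessage`.

```python
# Standalone mirror of the role -> message-class conversion that
# ChatAgent.chat performs before calling the LLM.
from dataclasses import dataclass

@dataclass
class SystemMessage:
    content: str

@dataclass
class HumanMessage:
    content: str

@dataclass
class AIMessage:
    content: str

ROLE_TO_CLASS = {
    "user": HumanMessage,
    "assistant": AIMessage,
    "system": SystemMessage,
}

def to_lc_messages(messages: list[dict], system_prompt: str) -> list:
    """Prepend the system prompt, then map each dict by its 'role'."""
    lc = [SystemMessage(content=system_prompt)]
    for msg in messages:
        cls = ROLE_TO_CLASS.get(msg.get("role", "user"))
        if cls:
            lc.append(cls(content=msg.get("content", "")))
    return lc

history = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there!"},
]
lc = to_lc_messages(history, "You are a helpful AI assistant.")
```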

ai_term.cli.core.agent.ChatAgent #

Chat agent using LangChain and Ollama.

Source code in src/ai_term/cli/core/agent.py
class ChatAgent:
    """Chat agent using LangChain and Ollama."""

    SPEECH_SYSTEM_PROMPT = (
        "You are a helpful AI assistant. Your responses will be read aloud by a "
        "text-to-speech system.\n\n"
        "CRITICAL OUTPUT RULES:\n"
        "- NO markdown: Avoid **, *, `, #, >, -, |, or any formatting symbols\n"
        "- NO emojis or special unicode characters\n"
        "- NO bullet points or numbered lists with symbols; use natural prose instead\n"
        "- NO code blocks; describe code concepts verbally\n"
        "- NO tables; present tabular data as flowing sentences\n\n"
        "SPEECH OPTIMIZATION:\n"
        "- Use short, clear sentences that are easy to follow when heard\n"
        "- Add natural pauses with commas and periods\n"
        "IMPORTANT: Previous messages in this conversation may contain markdown or "
        "formatting. Ignore that formatting in your response. Your output must be "
        "plain, speakable text only."
    )

    def __init__(self, system_prompt: str = "You are a helpful AI assistant."):
        config = get_app_config()
        self.llm = ChatOllama(
            model=config.llm.model,
            base_url=config.llm.base_url,
            temperature=0.7,
        )
        self.system_prompt = system_prompt
        self.tools = []

    def add_tool(self, tool):
        """Add a tool to the agent."""
        self.tools.append(tool)
        self.llm = self.llm.bind_tools(self.tools)

    async def chat(self, messages: list[dict], speech_mode: bool = False) -> dict:
        """
        Send messages to the LLM and get a response.

        Args:
            messages: List of message dicts with 'role' and 'content'.
            speech_mode: Whether to optimize output for text-to-speech.

        Returns:
            Response dict with 'content' and optionally 'tool_calls'.
        """
        # Convert to LangChain message format
        system_prompt = self.SPEECH_SYSTEM_PROMPT if speech_mode else self.system_prompt
        lc_messages = [SystemMessage(content=system_prompt)]

        for msg in messages:
            role = msg.get("role", "user")
            content = msg.get("content", "")

            if role == "user":
                lc_messages.append(HumanMessage(content=content))
            elif role == "assistant":
                lc_messages.append(AIMessage(content=content))
            elif role == "system":
                lc_messages.append(SystemMessage(content=content))

        # Get response from LLM
        response = await self.llm.ainvoke(lc_messages)

        result = {"content": response.content, "tool_calls": None}

        # Check for tool calls
        if hasattr(response, "tool_calls") and response.tool_calls:
            result["tool_calls"] = [
                {"name": tc["name"], "args": tc["args"], "id": tc.get("id")}
                for tc in response.tool_calls
            ]

        return result

    async def stream_chat(self, messages: list[dict]) -> AsyncGenerator[str, None]:
        """
        Stream response from LLM.

        Args:
            messages: List of message dicts.

        Yields:
            str: Chunks of response content.
        """
        lc_messages = [SystemMessage(content=self.system_prompt)]

        for msg in messages:
            role = msg.get("role", "user")
            content = msg.get("content", "")

            if role == "user":
                lc_messages.append(HumanMessage(content=content))
            elif role == "assistant":
                lc_messages.append(AIMessage(content=content))

        async for chunk in self.llm.astream(lc_messages):
            if chunk.content:
                yield str(chunk.content)

    async def generate_title(self, user_message: str) -> str:
        """
        Generate a short title (3-5 words) for a chat session.

        Args:
            user_message: The user's first message.

        Returns:
            A short descriptive title.
        """
        system_prompt = (
            "Generate a very short title (3-5 words max) for a chat session "
            "based on the user's message. Output ONLY the title, nothing else. "
            "No quotes, no punctuation at the end, no explanation."
        )

        lc_messages = [
            SystemMessage(content=system_prompt),
            HumanMessage(content=user_message),
        ]

        try:
            response = await self.llm.ainvoke(lc_messages)
            title = str(response.content).strip()
            # Ensure it's not too long
            words = title.split()
            if len(words) > 6:
                title = " ".join(words[:5])
            return title or "New Chat"
        except Exception:
            return "New Chat"

add_tool(tool) #

Add a tool to the agent.

Source code in src/ai_term/cli/core/agent.py
def add_tool(self, tool):
    """Add a tool to the agent."""
    self.tools.append(tool)
    self.llm = self.llm.bind_tools(self.tools)

chat(messages, speech_mode=False) async #

Send messages to the LLM and get a response.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `messages` | `list[dict]` | List of message dicts with 'role' and 'content'. | *required* |
| `speech_mode` | `bool` | Whether to optimize output for text-to-speech. | `False` |

Returns:

| Type | Description |
| --- | --- |
| `dict` | Response dict with 'content' and optionally 'tool_calls'. |

Source code in src/ai_term/cli/core/agent.py
async def chat(self, messages: list[dict], speech_mode: bool = False) -> dict:
    """
    Send messages to the LLM and get a response.

    Args:
        messages: List of message dicts with 'role' and 'content'.
        speech_mode: Whether to optimize output for text-to-speech.

    Returns:
        Response dict with 'content' and optionally 'tool_calls'.
    """
    # Convert to LangChain message format
    system_prompt = self.SPEECH_SYSTEM_PROMPT if speech_mode else self.system_prompt
    lc_messages = [SystemMessage(content=system_prompt)]

    for msg in messages:
        role = msg.get("role", "user")
        content = msg.get("content", "")

        if role == "user":
            lc_messages.append(HumanMessage(content=content))
        elif role == "assistant":
            lc_messages.append(AIMessage(content=content))
        elif role == "system":
            lc_messages.append(SystemMessage(content=content))

    # Get response from LLM
    response = await self.llm.ainvoke(lc_messages)

    result = {"content": response.content, "tool_calls": None}

    # Check for tool calls
    if hasattr(response, "tool_calls") and response.tool_calls:
        result["tool_calls"] = [
            {"name": tc["name"], "args": tc["args"], "id": tc.get("id")}
            for tc in response.tool_calls
        ]

    return result

generate_title(user_message) async #

Generate a short title (3-5 words) for a chat session.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `user_message` | `str` | The user's first message. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `str` | A short descriptive title. |

Source code in src/ai_term/cli/core/agent.py
async def generate_title(self, user_message: str) -> str:
    """
    Generate a short title (3-5 words) for a chat session.

    Args:
        user_message: The user's first message.

    Returns:
        A short descriptive title.
    """
    system_prompt = (
        "Generate a very short title (3-5 words max) for a chat session "
        "based on the user's message. Output ONLY the title, nothing else. "
        "No quotes, no punctuation at the end, no explanation."
    )

    lc_messages = [
        SystemMessage(content=system_prompt),
        HumanMessage(content=user_message),
    ]

    try:
        response = await self.llm.ainvoke(lc_messages)
        title = str(response.content).strip()
        # Ensure it's not too long
        words = title.split()
        if len(words) > 6:
            title = " ".join(words[:5])
        return title or "New Chat"
    except Exception:
        return "New Chat"

stream_chat(messages) async #

Stream response from LLM.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `messages` | `list[dict]` | List of message dicts. | *required* |

Yields:

| Name | Type | Description |
| --- | --- | --- |
| `str` | `AsyncGenerator[str, None]` | Chunks of response content. |

Source code in src/ai_term/cli/core/agent.py
async def stream_chat(self, messages: list[dict]) -> AsyncGenerator[str, None]:
    """
    Stream response from LLM.

    Args:
        messages: List of message dicts.

    Yields:
        str: Chunks of response content.
    """
    lc_messages = [SystemMessage(content=self.system_prompt)]

    for msg in messages:
        role = msg.get("role", "user")
        content = msg.get("content", "")

        if role == "user":
            lc_messages.append(HumanMessage(content=content))
        elif role == "assistant":
            lc_messages.append(AIMessage(content=content))

    async for chunk in self.llm.astream(lc_messages):
        if chunk.content:
            yield str(chunk.content)

ai_term.cli.core.mcp_manager.MCPManager #

Manages MCP server connections and tool discovery.

Source code in src/ai_term/cli/core/mcp_manager.py
class MCPManager:
    """Manages MCP server connections and tool discovery."""

    def __init__(self):
        self.config = get_mcp_config()
        self.processes: dict[str, subprocess.Popen] = {}
        self.tools: list[Any] = []

    def get_server_names(self) -> list[str]:
        """Get list of configured MCP server names."""
        return list(self.config.mcpServers.keys())

    async def start_server(self, name: str) -> bool:
        """
        Start an MCP server by name.

        Args:
            name: Server name from config.

        Returns:
            True if started successfully.
        """
        if name not in self.config.mcpServers:
            return False

        if name in self.processes and self.processes[name].poll() is None:
            return True  # Already running

        server_config = self.config.mcpServers[name]

        env = dict(server_config.env)

        try:
            process = subprocess.Popen(
                [server_config.command] + server_config.args,
                env=env,
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
            )
            self.processes[name] = process
            return True
        except Exception as e:
            print(f"Failed to start MCP server {name}: {e}")
            return False

    async def stop_server(self, name: str):
        """Stop an MCP server."""
        if name in self.processes:
            proc = self.processes[name]
            if proc.poll() is None:
                proc.terminate()
                try:
                    proc.wait(timeout=5)
                except subprocess.TimeoutExpired:
                    proc.kill()
            del self.processes[name]

    async def stop_all(self):
        """Stop all running MCP servers."""
        for name in list(self.processes.keys()):
            await self.stop_server(name)

    def get_tools(self) -> list[Any]:
        """Get tools from all connected MCP servers."""
        # TODO: Implement MCP protocol communication to discover tools
        # For now, return empty list. Will be populated when servers are connected.
        return self.tools
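The manager loads its server definitions via `get_mcp_config()`. The field names below (`mcpServers`, `command`, `args`, `env`) are taken from the code above; the server name and values are hypothetical examples of what a configured entry might look like.

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
      "env": {}
    }
  }
}
```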

get_server_names() #

Get list of configured MCP server names.

Source code in src/ai_term/cli/core/mcp_manager.py
def get_server_names(self) -> list[str]:
    """Get list of configured MCP server names."""
    return list(self.config.mcpServers.keys())

get_tools() #

Get tools from all connected MCP servers.

Source code in src/ai_term/cli/core/mcp_manager.py
def get_tools(self) -> list[Any]:
    """Get tools from all connected MCP servers."""
    # TODO: Implement MCP protocol communication to discover tools
    # For now, return empty list. Will be populated when servers are connected.
    return self.tools

start_server(name) async #

Start an MCP server by name.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `name` | `str` | Server name from config. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `bool` | True if started successfully. |

Source code in src/ai_term/cli/core/mcp_manager.py
async def start_server(self, name: str) -> bool:
    """
    Start an MCP server by name.

    Args:
        name: Server name from config.

    Returns:
        True if started successfully.
    """
    if name not in self.config.mcpServers:
        return False

    if name in self.processes and self.processes[name].poll() is None:
        return True  # Already running

    server_config = self.config.mcpServers[name]

    env = dict(server_config.env)

    try:
        process = subprocess.Popen(
            [server_config.command] + server_config.args,
            env=env,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        self.processes[name] = process
        return True
    except Exception as e:
        print(f"Failed to start MCP server {name}: {e}")
        return False

stop_all() async #

Stop all running MCP servers.

Source code in src/ai_term/cli/core/mcp_manager.py
async def stop_all(self):
    """Stop all running MCP servers."""
    for name in list(self.processes.keys()):
        await self.stop_server(name)

stop_server(name) async #

Stop an MCP server.

Source code in src/ai_term/cli/core/mcp_manager.py
async def stop_server(self, name: str):
    """Stop an MCP server."""
    if name in self.processes:
        proc = self.processes[name]
        if proc.poll() is None:
            proc.terminate()
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()
        del self.processes[name]
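The terminate-then-kill shutdown used by `stop_server` can be demonstrated with a throwaway child process standing in for a real MCP server:

```python
# Demonstrates the terminate -> wait(timeout) -> kill escalation from
# MCPManager.stop_server, using a disposable Python child process.
import subprocess
import sys

proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

def stop(proc: subprocess.Popen, timeout: float = 5.0) -> None:
    if proc.poll() is None:          # still running
        proc.terminate()             # polite SIGTERM first
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()              # escalate if it ignores SIGTERM
            proc.wait()

stop(proc)
```

The 5-second grace period matches the `proc.wait(timeout=5)` in the source; after `kill()` a final `wait()` reaps the process so no zombie is left behind.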

3. Audio Pipeline#

The audio system is designed for real-time interaction. It uses a client-server architecture where heavy processing (Speech-to-Text and Text-to-Speech) is offloaded to local Docker containers to ensure the CLI remains responsive.

  • AudioRecorder: Captures raw audio from the microphone using sounddevice.
  • AudioClient: Sends audio data to the STT service and text to the TTS service via HTTP.
  • AudioPlayer: Plays back the synthesized audio.
graph LR
    Mic[Microphone] --> Recorder[Audio Recorder]
    Recorder -- "Raw Bytes" --> App[ChatApp]
    App -- "Transcribe" --> Client[Audio Client]
    Client -- "POST /transcribe" --> STT[STT Service]
    STT -- "Text" --> Client

    App -- "Speak (Text)" --> Client
    Client -- "POST /generate" --> TTS[TTS Service]
    TTS -- "Audio Bytes" --> Client
    Client --> Player[Audio Player]
    Player --> Speaker[Speakers]
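The STT endpoint expects WAV bytes. The sketch below shows how raw 16-bit PCM frames (as `sounddevice` would capture them) can be wrapped in a WAV container using only the standard library; the 16 kHz mono format is an assumption for illustration, not a value taken from the code.

```python
# Wrap raw PCM frames in a WAV container, producing the kind of bytes
# AudioClient.transcribe uploads to POST /transcribe.
import io
import wave

def pcm_to_wav(pcm: bytes, sample_rate: int = 16000, channels: int = 1) -> bytes:
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)            # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
    return buf.getvalue()

silence = b"\x00\x00" * 16000         # one second of silent mono audio
wav_bytes = pcm_to_wav(silence)
```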

ai_term.cli.core.audio_client.AudioClient #

Client for interacting with STT and TTS servers.

Source code in src/ai_term/cli/core/audio_client.py
class AudioClient:
    """Client for interacting with STT and TTS servers."""

    def __init__(self):
        config = get_app_config()
        self.stt_url = config.audio.stt_url
        self.tts_url = config.audio.tts_url

    async def transcribe(self, audio_bytes: bytes) -> str:
        """
        Send audio to STT server and get transcription.

        Args:
            audio_bytes: Raw audio data (WAV format).

        Returns:
            Transcribed text.
        """
        async with httpx.AsyncClient() as client:
            files = {"file": ("audio.wav", audio_bytes, "audio/wav")}
            response = await client.post(
                f"{self.stt_url}/transcribe", files=files, timeout=30.0
            )
            response.raise_for_status()
            return response.json().get("text", "")

    async def speak(self, ai_response: str, user_query: Optional[str] = None) -> bytes:
        """
        Send text to TTS server and get audio bytes.

        Args:
            ai_response: AI response text to synthesize.
            user_query: User query text to synthesize (optional).

        Returns:
            Audio bytes (WAV or MP3 format depending on provider).
        """
        config = get_app_config()
        tts_config = config.audio.tts

        # Build request payload
        payload: dict = {"text": ai_response, "previous_text": user_query}

        if tts_config:
            # Resolve the api_key from env var name to actual value
            from ai_term.cli.config import resolve_secret

            payload["provider_config"] = {
                "provider": tts_config.provider,
                "api_key": resolve_secret(tts_config.api_key),
                "voice_id": tts_config.voice_id,
                "model_id": tts_config.model_id,
            }

        async with httpx.AsyncClient() as client:
            response = await client.post(
                f"{self.tts_url}/generate", json=payload, timeout=60.0
            )
            response.raise_for_status()
            return response.content

speak(ai_response, user_query=None) async #

Send text to TTS server and get audio bytes.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `ai_response` | `str` | AI response text to synthesize. | *required* |
| `user_query` | `Optional[str]` | User query text to synthesize (optional). | `None` |

Returns:

| Type | Description |
| --- | --- |
| `bytes` | Audio bytes (WAV or MP3 format depending on provider). |

Source code in src/ai_term/cli/core/audio_client.py
async def speak(self, ai_response: str, user_query: Optional[str] = None) -> bytes:
    """
    Send text to TTS server and get audio bytes.

    Args:
        ai_response: AI response text to synthesize.
        user_query: User query text to synthesize (optional).

    Returns:
        Audio bytes (WAV or MP3 format depending on provider).
    """
    config = get_app_config()
    tts_config = config.audio.tts

    # Build request payload
    payload: dict = {"text": ai_response, "previous_text": user_query}

    if tts_config:
        # Resolve the api_key from env var name to actual value
        from ai_term.cli.config import resolve_secret

        payload["provider_config"] = {
            "provider": tts_config.provider,
            "api_key": resolve_secret(tts_config.api_key),
            "voice_id": tts_config.voice_id,
            "model_id": tts_config.model_id,
        }

    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"{self.tts_url}/generate", json=payload, timeout=60.0
        )
        response.raise_for_status()
        return response.content

transcribe(audio_bytes) async #

Send audio to STT server and get transcription.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `audio_bytes` | `bytes` | Raw audio data (WAV format). | *required* |

Returns:

| Type | Description |
| --- | --- |
| `str` | Transcribed text. |

Source code in src/ai_term/cli/core/audio_client.py
async def transcribe(self, audio_bytes: bytes) -> str:
    """
    Send audio to STT server and get transcription.

    Args:
        audio_bytes: Raw audio data (WAV format).

    Returns:
        Transcribed text.
    """
    async with httpx.AsyncClient() as client:
        files = {"file": ("audio.wav", audio_bytes, "audio/wav")}
        response = await client.post(
            f"{self.stt_url}/transcribe", files=files, timeout=30.0
        )
        response.raise_for_status()
        return response.json().get("text", "")
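The JSON body `speak` sends to `POST /generate` can be built as a pure function, which makes the payload shape easy to see and test. The field names mirror the code above; the provider values here are hypothetical.

```python
# Standalone mirror of the request payload AudioClient.speak builds
# for the TTS server's POST /generate endpoint.
from typing import Optional

def build_tts_payload(ai_response: str,
                      user_query: Optional[str] = None,
                      tts_config: Optional[dict] = None) -> dict:
    payload: dict = {"text": ai_response, "previous_text": user_query}
    if tts_config:
        # In the real client, api_key is resolved from an env var name
        # via resolve_secret(); here it is passed through as-is.
        payload["provider_config"] = {
            "provider": tts_config["provider"],
            "api_key": tts_config.get("api_key"),
            "voice_id": tts_config.get("voice_id"),
            "model_id": tts_config.get("model_id"),
        }
    return payload

payload = build_tts_payload(
    "Hello there.",
    user_query="Say hi",
    tts_config={"provider": "elevenlabs",      # hypothetical example values
                "api_key": None,
                "voice_id": "voice-1",
                "model_id": "model-1"},
)
```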

4. Data Persistence#

Application state, chat history, and sessions are persisted in a local SQLite database using SQLAlchemy (asyncio).

  • Session: Represents a conversation thread.
  • Message: Individual user or assistant messages, including tool calls.
graph LR
    App[ChatApp]
    Engine[DB Engine]
    DB[(SQLite File)]

    App -- "Save/Load" --> Engine
    Engine -- "SQLAlchemy (Async)" --> DB
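The Session/Message model can be sketched with the stdlib `sqlite3` driver in place of async SQLAlchemy. The column names below are assumptions inferred from the description above (a session owns messages, each with a role and content), not the project's actual schema.

```python
# Minimal sketch of the Session -> Message persistence model using an
# in-memory SQLite database. Schema is illustrative, not the real one.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sessions (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE messages (
    id         INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES sessions(id),
    role       TEXT NOT NULL,          -- 'user' or 'assistant'
    content    TEXT NOT NULL
);
""")
conn.execute("INSERT INTO sessions (id, title) VALUES (1, 'New Chat')")
conn.execute(
    "INSERT INTO messages (session_id, role, content) "
    "VALUES (1, 'user', 'Hello')"
)
rows = conn.execute(
    "SELECT role, content FROM messages WHERE session_id = 1"
).fetchall()
```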