# Architecture Guide

This document is for developers who want to understand, modify, or extend nano-claude-code. For user-facing docs, see [README.md](../README.md).

---

## Overview

Nano-claude-code is a ~3.4K-line Python CLI that lets LLMs (GPT, Gemini, etc.) operate as coding agents with tool use, memory, sub-agents, and skills. The architecture is a flat module layout designed for readability and future migration to a package structure.

```
User Input
    │
    ▼
nano_claude.py ── REPL, slash commands, rendering
    │
    ├──► agent.py ── multi-turn loop, permission gates
    │        │
    │        ├──► providers.py ── API streaming (Anthropic / OpenAI-compat)
    │        ├──► tool_registry.py ──► tools.py ── 13 tools
    │        ├──► compaction.py ── context window management
    │        └──► subagent.py ── threaded sub-agent lifecycle
    │
    ├──► context.py ── system prompt (git, CLAUDE.md, memory)
    │        └──► memory.py ── persistent file-based memory
    │
    ├──► skills.py ── markdown skill loading + execution
    └──► config.py ── configuration persistence
```

**Key invariant:** Dependencies flow downward. No circular imports at the module level (subagent.py uses lazy imports to call agent.py).

---

## Module Reference

### `tool_registry.py` — Tool Plugin System

The central registry that all tools register into. This is the foundation for extensibility.

**Data model:**

```python
@dataclass
class ToolDef:
    name: str              # unique identifier (e.g. "Read", "MemorySave")
    schema: dict           # JSON schema sent to the LLM API
    func: Callable         # (params: dict, config: dict) -> str
    read_only: bool        # True = auto-approve in 'auto' permission mode
    concurrent_safe: bool  # True = safe to run in parallel (for sub-agents)
```

**Public API:**

| Function | Description |
|---|---|
| `register_tool(tool_def)` | Add a tool to the registry (overwrites by name) |
| `get_tool(name)` | Look up by name, returns `None` if not found |
| `get_all_tools()` | List all registered tools |
| `get_tool_schemas()` | Return schemas for API calls |
| `execute_tool(name, params, config, max_output=32000)` | Execute with output truncation |
| `clear_registry()` | Reset — for testing only |

**Output truncation:** If a tool returns more than `max_output` chars, the result is truncated to `first_half + [... N chars truncated ...] + last_quarter`. This prevents a single tool call (e.g. reading a huge file) from blowing up the context window.
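For illustration, one plausible reading of that rule splits the `max_output` budget like the following sketch (`truncate_output` is a hypothetical name; the actual logic lives inside `execute_tool`):

```python
def truncate_output(text: str, max_output: int = 32000) -> str:
    # Sketch of the first-half + last-quarter truncation rule described above.
    if len(text) <= max_output:
        return text
    head = text[: max_output // 2]      # keep the first half of the budget
    tail = text[-(max_output // 4):]    # keep the last quarter of the budget
    cut = len(text) - len(head) - len(tail)
    return f"{head}\n[... {cut} chars truncated ...]\n{tail}"
```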
**Registering a custom tool:**

```python
from tool_registry import ToolDef, register_tool

def my_tool(params, config):
    return f"Hello, {params['name']}!"

register_tool(ToolDef(
    name="MyTool",
    schema={
        "name": "MyTool",
        "description": "A greeting tool",
        "input_schema": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
        },
    },
    func=my_tool,
    read_only=True,
    concurrent_safe=True,
))
```

### `tools.py` — Built-in Tool Implementations

Contains the 8 core tools (Read, Write, Edit, Bash, Glob, Grep, WebFetch, WebSearch) plus memory tools (MemorySave, MemoryDelete) and sub-agent tools (Agent, CheckAgentResult, ListAgentTasks). All register themselves via `tool_registry` at import time.

**Key internals:**

- `_is_safe_bash(cmd)` — whitelist of safe shell commands for auto-approval
- `generate_unified_diff(old, new, filename)` — diff generation for Edit/Write
- `maybe_truncate_diff(diff_text, max_lines=80)` — truncate large diffs for display
- `_get_agent_manager()` — lazy singleton for SubAgentManager
- Backward-compatible `execute_tool(name, inputs, permission_mode, ask_permission)` wrapper

### `agent.py` — Core Agent Loop

The heart of the system. `run()` is a generator that yields events as they happen.

```python
def run(user_message, state, config, system_prompt, depth=0, cancel_check=None) -> Generator:
```

**Loop logic:**

```
1. Append user message
2. Inject depth into config (for sub-agent depth tracking)
3. While True:
   a. Check cancel_check() — cooperative cancellation for sub-agents
   b. maybe_compact(state, config) — compress if near context limit
   c. Stream from provider → yield TextChunk / ThinkingChunk
   d. Record assistant message
   e. If no tool_calls → break
   f. For each tool_call:
      - Permission check (_check_permission)
      - If denied → yield PermissionRequest → user decides
      - Execute tool → yield ToolStart / ToolEnd
      - Append tool result
   g. Loop (model sees tool results and responds)
```

**Event types:**

| Event | Fields | When |
|---|---|---|
| `TextChunk` | `text` | Streaming text delta |
| `ThinkingChunk` | `text` | Extended thinking block |
| `ToolStart` | `name, inputs` | Before tool execution |
| `ToolEnd` | `name, result, permitted` | After tool execution |
| `PermissionRequest` | `description, granted` | Needs user approval |
| `TurnDone` | `input_tokens, output_tokens` | End of one API turn |
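A caller drives the loop by iterating over these events. A minimal sketch, assuming the event classes are importable from `agent` and that the caller grants permission by setting `granted` on the yielded request; `ask_user` is a hypothetical y/n prompt callback, and `nano_claude.py`'s real rendering loop is richer:

```python
import agent  # flat-layout module, as in the diagram above

def render_turn(user_message, state, config, system_prompt, ask_user):
    """Consume one agent turn and render events as they stream in."""
    for event in agent.run(user_message, state, config, system_prompt):
        if isinstance(event, agent.TextChunk):
            print(event.text, end="", flush=True)        # stream text deltas
        elif isinstance(event, agent.ToolStart):
            print(f"\n→ {event.name} {event.inputs}")
        elif isinstance(event, agent.PermissionRequest):
            event.granted = ask_user(event.description)  # assumed: caller sets granted
        elif isinstance(event, agent.TurnDone):
            print(f"\n[{event.input_tokens} in / {event.output_tokens} out]")
```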
### `compaction.py` — Context Window Management

Keeps conversations within model context limits using two layers.

**Layer 1: Snip** (`snip_old_tool_results`)
- Rule-based, no API cost
- Truncates tool-role messages older than `preserve_last_n_turns` (default 6)
- Keeps first half + last quarter of the content

**Layer 2: Auto-Compact** (`compact_messages`)
- Model-driven: calls the current model to summarize old messages
- Splits messages into [old | recent] at a ~70/30 ratio
- Replaces old messages with a summary + acknowledgment

**Trigger:** `maybe_compact()` checks `estimate_tokens(messages) > context_limit * 0.7`. Runs snip first (cheap), then auto-compact if still over.

**Token estimation:** `len(content) / 3.5` — simple heuristic. Works for most models. `get_context_limit(model)` reads from the provider registry.

### `memory.py` — Persistent Memory

File-based memory system stored in `~/.nano_claude/memory/`.

**Storage format:**

```
~/.nano_claude/memory/
├── MEMORY.md            # Index: one line per memory
├── user_preferences.md  # Individual memory file
└── project_auth.md
```

Each memory file uses markdown with YAML frontmatter:

```markdown
---
name: user preferences
description: coding style preferences
type: feedback
created: 2026-04-02
---

User prefers 4-space indentation and type hints.
```

**How it integrates:**

- `get_memory_context()` returns the MEMORY.md index text
- `context.py` injects this into the system prompt
- The model reads the index, then uses the `Read` tool to access full memory content
- The model uses the `MemorySave` / `MemoryDelete` tools to manage memories

### `subagent.py` — Threaded Sub-Agents

Sub-agents run in background threads via `ThreadPoolExecutor`.

**Key design decisions:**

1. **Fresh context** — each sub-agent starts with an empty message history plus the task prompt
2. **Depth limiting** — `max_depth=3`, checked at spawn time. The model gets an error message (not silent tool removal) so it can adapt.
3. **Cooperative cancellation** — a `cancel_check` callable is checked each loop iteration. Python threads can't be killed safely, so we set a flag.
4. **Threading, not asyncio** — the entire codebase is synchronous generators. Threading via `concurrent.futures` keeps things simple. The SubAgentManager API is designed to be compatible with a future async migration.

**Lifecycle:**

```
spawn(prompt, config, system_prompt, depth)
  → Creates SubAgentTask
  → Submits _run to ThreadPoolExecutor
  → _run calls agent.run() with depth+1

wait(task_id, timeout)  → blocks until complete
cancel(task_id)         → sets _cancel_flag
get_result(task_id)     → returns result string
```

### `skills.py` — Reusable Prompt Templates

Skills are markdown files with frontmatter. They are **not code** — just structured prompts that get injected into the agent loop.

**Skill file format:**

```markdown
---
name: commit
description: Create a conventional commit
triggers: ["/commit"]
tools: [Bash, Read]
---

Your prompt instructions here...
```

**Execution:** `execute_skill()` wraps the skill prompt as a user message and calls `agent.run()`. The skill runs through the exact same agent loop as a normal query.

**Search order:** Project-level (`./.nano_claude/skills/`) overrides user-level (`~/.nano_claude/skills/`) when skill names collide.

### `providers.py` — Multi-Provider Abstraction

Two streaming adapters cover all providers:

| Adapter | Providers |
|---|---|
| `stream_anthropic()` | Anthropic (native SDK) |
| `stream_openai_compat()` | OpenAI, Gemini, Kimi, Qwen, Zhipu, DeepSeek, Ollama, LM Studio, Custom |

**Neutral message format** (provider-independent):

```python
{"role": "user", "content": "..."}
{"role": "assistant", "content": "...", "tool_calls": [{"id": "...", "name": "...", "input": {...}}]}
{"role": "tool", "tool_call_id": "...", "name": "...", "content": "..."}
```

Conversion functions: `messages_to_anthropic()`, `messages_to_openai()`, `tools_to_openai()`.

**Provider-specific handling:**

- Gemini 3 models require `thought_signature` in tool call responses — this is transparently captured and passed through via `extra_content` on tool_call dicts.

### `context.py` — System Prompt Builder

Assembles the system prompt from:

1. Base template (role, date, cwd, platform)
2. Git info (branch, status, recent commits)
3. CLAUDE.md content (project-level + global)
4. Memory index (from `memory.get_memory_context()`)

### `config.py` — Configuration

Defaults are stored in `~/.nano_claude/config.json`. Key settings:

| Key | Default | Description |
|---|---|---|
| `model` | `claude-opus-4-6` | Active model |
| `max_tokens` | `8192` | Max output tokens |
| `permission_mode` | `auto` | Permission mode |
| `max_tool_output` | `32000` | Tool output truncation limit |
| `max_agent_depth` | `3` | Max sub-agent nesting |
| `max_concurrent_agents` | `3` | Thread pool size |

---

## Data Flow Example

A user asks "Read config.py and change max_tokens to 16384":

```
1. nano_claude.py captures input
2. agent.run() appends user message, calls maybe_compact()
3. providers.stream() sends to Gemini API with 13 tool schemas
4. Model responds: text + tool_call[Read(config.py)]
5. agent.py checks permission (Read = read_only → auto-approve)
6. tool_registry.execute_tool("Read", ...) → file content (truncated if >32K)
7. Tool result appended to messages, loop back to step 3
8. Model responds: text + tool_call[Edit(config.py, "8192", "16384")]
9. agent.py checks permission (Edit = not read_only → ask user)
10. User approves → tools.py._edit() runs, generates diff
11. nano_claude.py renders diff with ANSI colors (red/green)
12. Tool result appended, loop back to step 3
13. Model responds: "Done, max_tokens changed to 16384"
14. No tool_calls → loop ends, TurnDone yielded
```

---

## Testing

```bash
# Run all 78 tests
python -m pytest tests/ -v

# Run specific module tests
python -m pytest tests/test_tool_registry.py -v
python -m pytest tests/test_compaction.py -v
python -m pytest tests/test_memory.py -v
python -m pytest tests/test_subagent.py -v
python -m pytest tests/test_skills.py -v
python -m pytest tests/test_diff_view.py -v
```

Tests use `monkeypatch` and `tmp_path` fixtures to avoid side effects. Sub-agent tests mock `_agent_run` to avoid real API calls.
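A minimal sketch of that mocking pattern, assuming `_agent_run` is a module-level hook in `subagent`, that `SubAgentManager` constructs with defaults, and that `get_result()` returns a string; the real tests are more thorough:

```python
import subagent

def test_spawn_and_wait(monkeypatch):
    # Replace the agent-loop hook with a stub generator: no API calls are made.
    def fake_agent_run(*args, **kwargs):
        yield from ()  # emit no events, so the sub-agent finishes immediately

    monkeypatch.setattr(subagent, "_agent_run", fake_agent_run)

    mgr = subagent.SubAgentManager()  # assumed default constructor
    task_id = mgr.spawn("trivial task", config={}, system_prompt="", depth=0)
    mgr.wait(task_id, timeout=5)
    assert isinstance(mgr.get_result(task_id), str)
```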
---

## Future: Package Refactoring

When `tools.py` or `agent.py` grows too large, the flat layout can be migrated to:

```
ncc/
├── __init__.py
├── repl.py              # from nano_claude.py
├── agent/
│   ├── loop.py          # from agent.py
│   ├── subagent.py      # from subagent.py
│   └── compaction.py    # from compaction.py
├── providers/
│   ├── base.py
│   ├── openai_compat.py
│   └── registry.py
├── tools/
│   ├── registry.py      # from tool_registry.py
│   ├── builtin.py       # core 8 tools from tools.py
│   ├── memory.py        # MemorySave/MemoryDelete from tools.py
│   └── subagent.py      # Agent/Check/List from tools.py
├── memory/
│   └── store.py         # from memory.py
├── skills/
│   └── loader.py        # from skills.py
└── config.py
```

The current code is structured to make this migration straightforward:

- Modules communicate via function parameters, not globals
- Each module has a small public API surface
- Dependencies are unidirectional (see the sketch below)
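The lazy-import pattern from the Key invariant is what keeps that last point cheap to preserve. A sketch of the idea (the wrapper body is illustrative, not the actual `subagent.py` source):

```python
# subagent.py style: call back into agent.run() without a module-level circular import
def _agent_run(*args, **kwargs):
    from agent import run  # deferred, since agent.py already imports subagent.py at load time
    return run(*args, **kwargs)
```

After the migration, only the deferred import line changes (e.g. to `from ncc.agent.loop import run`); no call sites move.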