# Architecture Guide

This document is for developers who want to understand, modify, or extend nano-claude-code.
For user-facing docs, see [README.md](../README.md).

---

## Overview

Nano-claude-code is a ~3.4K-line Python CLI that lets LLMs (GPT, Gemini, etc.) operate as
coding agents with tool use, memory, sub-agents, and skills. The architecture is a flat
module layout designed for readability and future migration to a package structure.

```
User Input
    │
    ▼
nano_claude.py  ── REPL, slash commands, rendering
    │
    ├──► agent.py  ── multi-turn loop, permission gates
    │       │
    │       ├──► providers.py      ── API streaming (Anthropic / OpenAI-compat)
    │       ├──► tool_registry.py ──► tools.py  ── 13 tools
    │       ├──► compaction.py     ── context window management
    │       └──► subagent.py       ── threaded sub-agent lifecycle
    │
    ├──► context.py  ── system prompt (git, CLAUDE.md, memory)
    │       └──► memory.py  ── persistent file-based memory
    │
    ├──► skills.py  ── markdown skill loading + execution
    └──► config.py  ── configuration persistence
```

**Key invariant:** Dependencies flow downward, and there are no circular imports at the
module level. The one upward call, from `subagent.py` into `agent.py`, uses a lazy import.

---

## Module Reference

### `tool_registry.py` — Tool Plugin System

The central registry that all tools register into. This is the foundation for extensibility.

**Data model:**

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolDef:
    name: str              # unique identifier (e.g. "Read", "MemorySave")
    schema: dict           # JSON schema sent to the LLM API
    func: Callable         # (params: dict, config: dict) -> str
    read_only: bool        # True = auto-approve in 'auto' permission mode
    concurrent_safe: bool  # True = safe to run in parallel (for sub-agents)
```

**Public API:**

| Function | Description |
|---|---|
| `register_tool(tool_def)` | Add a tool to the registry (overwrites by name) |
| `get_tool(name)` | Look up by name, returns `None` if not found |
| `get_all_tools()` | List all registered tools |
| `get_tool_schemas()` | Return schemas for API calls |
| `execute_tool(name, params, config, max_output=32000)` | Execute with output truncation |
| `clear_registry()` | Reset — for testing only |

**Output truncation:** If a tool returns more than `max_output` chars, the result is
truncated to `first_half + [... N chars truncated ...] + last_quarter`. This prevents
a single tool call (e.g. reading a huge file) from blowing up the context window.

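
The truncation rule can be sketched as follows. This is a minimal illustration, not the registry's actual code: it assumes "first half" and "last quarter" are measured against `max_output`, and the helper name `truncate_output` is hypothetical.

```python
def truncate_output(text: str, max_output: int = 32000) -> str:
    """Keep the head and tail of an oversized tool result (illustrative sketch)."""
    if len(text) <= max_output:
        return text
    head_len = max_output // 2   # assumption: half of the budget goes to the head
    tail_len = max_output // 4   # assumption: a quarter of the budget goes to the tail
    head = text[:head_len]
    tail = text[-tail_len:] if tail_len else ""
    dropped = len(text) - len(head) - len(tail)
    return f"{head}\n[... {dropped} chars truncated ...]\n{tail}"
```

Keeping both ends means the model still sees how a file or command output begins and how it ends, which is usually where errors and summaries appear.
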
**Registering a custom tool:**

```python
from tool_registry import ToolDef, register_tool

def my_tool(params, config):
    return f"Hello, {params['name']}!"

register_tool(ToolDef(
    name="MyTool",
    schema={
        "name": "MyTool",
        "description": "A greeting tool",
        "input_schema": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
        },
    },
    func=my_tool,
    read_only=True,
    concurrent_safe=True,
))
```

### `tools.py` — Built-in Tool Implementations

Contains the 8 core tools (Read, Write, Edit, Bash, Glob, Grep, WebFetch, WebSearch)
plus memory tools (MemorySave, MemoryDelete) and sub-agent tools (Agent, CheckAgentResult,
ListAgentTasks). All register themselves via `tool_registry` at import time.

**Key internals:**

- `_is_safe_bash(cmd)` — whitelist of safe shell commands for auto-approval
- `generate_unified_diff(old, new, filename)` — diff generation for Edit/Write
- `maybe_truncate_diff(diff_text, max_lines=80)` — truncate large diffs for display
- `_get_agent_manager()` — lazy singleton for SubAgentManager
- Backward-compatible `execute_tool(name, inputs, permission_mode, ask_permission)` wrapper

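
The two diff helpers could look roughly like this. The function names and signatures come from the list above, but the bodies are an assumption built on the standard library's `difflib`; the `a/` and `b/` path prefixes and the wording of the truncation marker are illustrative.

```python
import difflib

def generate_unified_diff(old: str, new: str, filename: str) -> str:
    """Return a unified diff between two versions of a file's text."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{filename}",   # assumed git-style prefixes
        tofile=f"b/{filename}",
    )
    return "".join(lines)

def maybe_truncate_diff(diff_text: str, max_lines: int = 80) -> str:
    """Cut a diff to max_lines and append a marker (marker text is illustrative)."""
    lines = diff_text.splitlines()
    if len(lines) <= max_lines:
        return diff_text
    hidden = len(lines) - max_lines
    return "\n".join(lines[:max_lines]) + f"\n[... {hidden} more lines ...]"
```
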
### `agent.py` — Core Agent Loop

The heart of the system. `run()` is a generator that yields events as they happen.

```python
def run(user_message, state, config, system_prompt,
        depth=0, cancel_check=None) -> Generator:
    ...
```

**Loop logic:**

```
1. Append user message
2. Inject depth into config (for sub-agent depth tracking)
3. While True:
   a. Check cancel_check() — cooperative cancellation for sub-agents
   b. maybe_compact(state, config) — compress if near context limit
   c. Stream from provider → yield TextChunk / ThinkingChunk
   d. Record assistant message
   e. If no tool_calls → break
   f. For each tool_call:
      - Permission check (_check_permission)
      - If not auto-approved → yield PermissionRequest → user decides
      - Execute tool → yield ToolStart / ToolEnd
      - Append tool result
   g. Loop (model sees tool results and responds)
```

**Event types:**

| Event | Fields | When |
|---|---|---|
| `TextChunk` | `text` | Streaming text delta |
| `ThinkingChunk` | `text` | Extended thinking block |
| `ToolStart` | `name, inputs` | Before tool execution |
| `ToolEnd` | `name, result, permitted` | After tool execution |
| `PermissionRequest` | `description, granted` | Needs user approval |
| `TurnDone` | `input_tokens, output_tokens` | End of one API turn |

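
To make the generator contract concrete, here is a stripped-down sketch of the loop. It is not the real `agent.run()`: permission checks, compaction, cancellation, and thinking chunks are left out, token counts are placeholders, and `call_model` is a hypothetical stand-in for the provider stream that returns one complete assistant message.

```python
from dataclasses import dataclass
from typing import Callable, Iterator

@dataclass
class TextChunk:
    text: str

@dataclass
class ToolStart:
    name: str
    inputs: dict

@dataclass
class ToolEnd:
    name: str
    result: str
    permitted: bool = True

@dataclass
class TurnDone:
    input_tokens: int
    output_tokens: int

def run(user_message: str, messages: list, call_model: Callable, tools: dict) -> Iterator:
    """Call the model repeatedly until it stops requesting tools."""
    messages.append({"role": "user", "content": user_message})
    while True:
        reply = call_model(messages)          # hypothetical provider call
        yield TextChunk(reply["content"])
        messages.append(reply)
        yield TurnDone(0, 0)                  # token accounting omitted here
        calls = reply.get("tool_calls", [])
        if not calls:
            break
        for call in calls:
            yield ToolStart(call["name"], call["input"])
            result = tools[call["name"]](call["input"])
            yield ToolEnd(call["name"], result)
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "name": call["name"], "content": result})

# Usage with a scripted "model" that asks for one tool call, then finishes.
scripted = iter([
    {"role": "assistant", "content": "Reading.",
     "tool_calls": [{"id": "1", "name": "Read", "input": {"file_path": "config.py"}}]},
    {"role": "assistant", "content": "Done."},
])
history: list = []
events = list(run("Read config.py", history, lambda msgs: next(scripted),
                  {"Read": lambda params: "max_tokens = 8192"}))
```

Because the loop is a generator, the REPL can render each event as it arrives instead of waiting for the whole turn to finish.
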
### `compaction.py` — Context Window Management

Keeps conversations within model context limits using two layers.

**Layer 1: Snip** (`snip_old_tool_results`)
- Rule-based, no API cost
- Truncates tool-role messages older than `preserve_last_n_turns` (default 6)
- Keeps first half + last quarter of the content

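
One possible reading of the snip layer is sketched below. Several details are assumptions, not taken from the source: a "turn" is counted as one assistant message, only tool results longer than the hypothetical `max_chars` threshold are snipped, and the marker text is illustrative.

```python
def snip_old_tool_results(messages: list, preserve_last_n_turns: int = 6,
                          max_chars: int = 2000) -> list:
    """Return a copy of messages with old, large tool results shortened."""
    assistant_idx = [i for i, m in enumerate(messages) if m["role"] == "assistant"]
    if len(assistant_idx) <= preserve_last_n_turns:
        return list(messages)                      # nothing is old enough to snip
    cutoff = assistant_idx[-preserve_last_n_turns]  # first message of the kept window
    out = []
    for i, msg in enumerate(messages):
        content = msg.get("content", "")
        if i < cutoff and msg["role"] == "tool" and len(content) > max_chars:
            head = content[: len(content) // 2]     # first half of the content
            tail_len = len(content) // 4            # last quarter of the content
            tail = content[-tail_len:] if tail_len else ""
            snipped = len(content) - len(head) - len(tail)
            msg = {**msg, "content": f"{head}\n[... {snipped} chars snipped ...]\n{tail}"}
        out.append(msg)
    return out
```
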
**Layer 2: Auto-Compact** (`compact_messages`)
- Model-driven: calls the current model to summarize old messages
- Splits messages into [old | recent] at ~70/30 ratio
- Replaces old messages with a summary + acknowledgment

**Trigger:** `maybe_compact()` checks `estimate_tokens(messages) > context_limit * 0.7`.
It runs snip first (cheap), then auto-compact only if the estimate is still over the threshold.

**Token estimation:** `len(content) / 3.5`, a simple characters-per-token heuristic that is
close enough for most models. `get_context_limit(model)` reads from the provider registry.

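
The trigger arithmetic and the old/recent split can be sketched in a few lines. The helper names `needs_compaction` and `split_old_recent` are hypothetical, and splitting by message count (rather than by token count) is an assumption.

```python
def estimate_tokens(messages: list) -> int:
    """Rough token count: total characters divided by 3.5."""
    total_chars = sum(len(str(m.get("content", ""))) for m in messages)
    return int(total_chars / 3.5)

def needs_compaction(messages: list, context_limit: int, threshold: float = 0.7) -> bool:
    """True once the estimate exceeds 70% of the model's context limit."""
    return estimate_tokens(messages) > context_limit * threshold

def split_old_recent(messages: list, old_ratio: float = 0.7) -> tuple:
    """Split into (old, recent); only the old part would be summarized."""
    split = round(len(messages) * old_ratio)
    return messages[:split], messages[split:]
```

For example, a 700-character history is estimated at 200 tokens, which is over the threshold for a 250-token limit (175) but far under it for a 1000-token limit (700).
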
### `memory.py` — Persistent Memory

File-based memory system stored in `~/.nano_claude/memory/`.

**Storage format:**

```
~/.nano_claude/memory/
├── MEMORY.md              # Index: one line per memory
├── user_preferences.md    # Individual memory file
└── project_auth.md
```

Each memory file uses markdown with YAML frontmatter:

```markdown
---
name: user preferences
description: coding style preferences
type: feedback
created: 2026-04-02
---

User prefers 4-space indentation and type hints.
```

**How it integrates:**
- `get_memory_context()` returns the MEMORY.md index text
- `context.py` injects this into the system prompt
- The model reads the index, then uses the `Read` tool to access full memory content
- The model uses the `MemorySave` / `MemoryDelete` tools to manage memories

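
As an illustration of the storage format, the sketch below parses a memory file and rebuilds a one-line-per-memory index. Both helper names are hypothetical, the parser handles only flat `key: value` frontmatter (no full YAML), and the index line format is an assumption.

```python
from pathlib import Path

def parse_memory_file(text: str) -> tuple:
    """Split '---' delimited frontmatter from the body. Returns (meta, body)."""
    parts = text.split("---", 2)
    if not text.startswith("---") or len(parts) < 3:
        return {}, text.strip()
    meta = {}
    for line in parts[1].strip().splitlines():
        key, _, value = line.partition(":")   # split on the first colon only
        meta[key.strip()] = value.strip()
    return meta, parts[2].strip()

def build_index(memory_dir: Path) -> str:
    """One line per memory file, skipping the index file itself."""
    lines = []
    for path in sorted(memory_dir.glob("*.md")):
        if path.name == "MEMORY.md":
            continue
        meta, _ = parse_memory_file(path.read_text(encoding="utf-8"))
        name = meta.get("name", path.stem)
        lines.append(f"- [{name}]({path.name}): {meta.get('description', '')}")
    return "\n".join(lines)
```

Only the index goes into the system prompt, so the cost of memory stays at roughly one line per entry until the model decides to read a file in full.
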
### `subagent.py` — Threaded Sub-Agents

Sub-agents run in background threads via `ThreadPoolExecutor`.

**Key design decisions:**

1. **Fresh context** — each sub-agent starts with empty message history + task prompt
2. **Depth limiting** — `max_depth=3`, checked at spawn time. Model gets an error message
   (not silent tool removal) so it can adapt.
3. **Cooperative cancellation** — `cancel_check` callable checked each loop iteration.
   Python threads can't be killed safely, so we set a flag.
4. **Threading, not asyncio** — the entire codebase is built on synchronous generators.
   Threading via `concurrent.futures` keeps things simple. The SubAgentManager API is
   designed to be compatible with a future async migration.

**Lifecycle:**

```
spawn(prompt, config, system_prompt, depth)
  → Creates SubAgentTask
  → Submits _run to ThreadPoolExecutor
  → _run calls agent.run() with depth+1

wait(task_id, timeout)  → blocks until complete
cancel(task_id)         → sets _cancel_flag
get_result(task_id)     → returns result string
```

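
The lifecycle above can be approximated with the standard library alone. This is a sketch under assumptions, not the real `SubAgentManager`: the `worker` argument stands in for the `_run` wrapper around `agent.run()`, the flag is a `threading.Event`, and the depth check raises an exception that a tool wrapper would presumably turn into an error string for the model.

```python
import threading
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SubAgentTask:
    task_id: str
    future: Optional[Future] = None
    cancel_flag: threading.Event = field(default_factory=threading.Event)

class SubAgentManager:
    def __init__(self, max_concurrent: int = 3, max_depth: int = 3):
        self._pool = ThreadPoolExecutor(max_workers=max_concurrent)
        self._tasks = {}
        self.max_depth = max_depth

    def spawn(self, worker, prompt: str, depth: int = 0) -> str:
        if depth >= self.max_depth:               # checked at spawn time
            raise RuntimeError(f"max sub-agent depth {self.max_depth} reached")
        task = SubAgentTask(task_id=uuid.uuid4().hex[:8])
        # The worker receives a cancel_check callable it must poll itself.
        task.future = self._pool.submit(worker, prompt, depth + 1, task.cancel_flag.is_set)
        self._tasks[task.task_id] = task
        return task.task_id

    def wait(self, task_id: str, timeout: Optional[float] = None) -> str:
        return self._tasks[task_id].future.result(timeout=timeout)

    def cancel(self, task_id: str) -> None:
        self._tasks[task_id].cancel_flag.set()    # cooperative: only sets a flag

def slow_worker(prompt: str, depth: int, cancel_check) -> str:
    """Hypothetical worker that polls cancel_check once per loop iteration."""
    for _ in range(200):
        if cancel_check():
            return "cancelled"
        time.sleep(0.01)
    return f"done: {prompt}"
```

Passing `cancel_flag.is_set` as the `cancel_check` callable keeps the worker unaware of how cancellation is stored, which is what would let the same interface survive a later move to asyncio.
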
### `skills.py` — Reusable Prompt Templates

Skills are markdown files with frontmatter. They are **not code** — just structured prompts
that get injected into the agent loop.

**Skill file format:**

```markdown
---
name: commit
description: Create a conventional commit
triggers: ["/commit"]
tools: [Bash, Read]
---

Your prompt instructions here...
```

**Execution:** `execute_skill()` wraps the skill prompt as a user message and calls
`agent.run()`. The skill runs through the exact same agent loop as a normal query.

**Search order:** Project-level (`./.nano_claude/skills/`) overrides user-level
(`~/.nano_claude/skills/`) when skill names collide.

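
The override rule amounts to loading user-level skills first and letting project-level skills replace them by name. In the sketch below the function name `discover_skills` is hypothetical, and the skill name is taken from the file stem for brevity; the real loader presumably reads `name` from the frontmatter.

```python
from pathlib import Path

def discover_skills(project_dir: Path, user_dir: Path) -> dict:
    """Map skill name -> file path, with project-level files shadowing user-level ones."""
    found = {}
    for directory in (user_dir, project_dir):   # later directories win on collision
        if not directory.is_dir():
            continue
        for path in sorted(directory.glob("*.md")):
            found[path.stem] = path              # assumption: name == file stem
    return found
```

A call such as `discover_skills(Path(".nano_claude/skills"), Path.home() / ".nano_claude" / "skills")` would then return the project's `commit.md` even if the user directory has one too.
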
### `providers.py` — Multi-Provider Abstraction

Two streaming adapters cover all providers:

| Adapter | Providers |
|---|---|
| `stream_anthropic()` | Anthropic (native SDK) |
| `stream_openai_compat()` | OpenAI, Gemini, Kimi, Qwen, Zhipu, DeepSeek, Ollama, LM Studio, Custom |

**Neutral message format** (provider-independent):

```python
{"role": "user", "content": "..."}
{"role": "assistant", "content": "...", "tool_calls": [{"id": "...", "name": "...", "input": {...}}]}
{"role": "tool", "tool_call_id": "...", "name": "...", "content": "..."}
```

Conversion functions: `messages_to_anthropic()`, `messages_to_openai()`, `tools_to_openai()`.

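
As an example of what such a conversion involves, here is a sketch of the neutral-to-OpenAI direction. The function name matches the one listed above, but the body is an assumption based on the public Chat Completions message shape, where each tool call is wrapped in a `function` object and its arguments are a JSON string.

```python
import json

def messages_to_openai(messages: list) -> list:
    """Convert neutral messages to Chat Completions style dicts (illustrative)."""
    out = []
    for msg in messages:
        if msg["role"] == "assistant" and msg.get("tool_calls"):
            out.append({
                "role": "assistant",
                "content": msg.get("content") or None,
                "tool_calls": [
                    {
                        "id": tc["id"],
                        "type": "function",
                        "function": {
                            "name": tc["name"],
                            "arguments": json.dumps(tc["input"]),  # dict -> JSON string
                        },
                    }
                    for tc in msg["tool_calls"]
                ],
            })
        elif msg["role"] == "tool":
            # The neutral "name" field is dropped; the call id links result to request.
            out.append({"role": "tool", "tool_call_id": msg["tool_call_id"],
                        "content": msg["content"]})
        else:
            out.append({"role": msg["role"], "content": msg["content"]})
    return out
```
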
**Provider-specific handling:**
- Gemini 3 models require `thought_signature` in tool call responses — this is transparently
  captured and passed through via `extra_content` on tool_call dicts.

### `context.py` — System Prompt Builder

Assembles the system prompt from:
1. Base template (role, date, cwd, platform)
2. Git info (branch, status, recent commits)
3. CLAUDE.md content (project-level + global)
4. Memory index (from `memory.get_memory_context()`)

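
The four parts can be thought of as optional sections joined in order. The sketch below is illustrative only: the function names, section headings, and base template wording are assumptions, and the git part is reduced to the current branch.

```python
import platform
import subprocess
from datetime import date
from pathlib import Path

def git_info(cwd: Path) -> str:
    """Return the current branch, or '' outside a repo or without git installed."""
    try:
        proc = subprocess.run(
            ["git", "rev-parse", "--abbrev-ref", "HEAD"],
            cwd=cwd, capture_output=True, text=True, timeout=5,
        )
    except (OSError, subprocess.SubprocessError):
        return ""
    branch = proc.stdout.strip()
    return f"Git branch: {branch}" if branch else ""

def build_system_prompt(cwd: Path, claude_md: str = "", memory_index: str = "") -> str:
    """Join the non-empty sections with blank lines between them."""
    sections = [
        (f"You are a coding agent.\nDate: {date.today().isoformat()}\n"
         f"Cwd: {cwd}\nPlatform: {platform.system()}"),
        git_info(cwd),
        f"# CLAUDE.md\n{claude_md}" if claude_md else "",
        f"# Memory\n{memory_index}" if memory_index else "",
    ]
    return "\n\n".join(s for s in sections if s)
```
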
### `config.py` — Configuration

Configuration is persisted in `~/.nano_claude/config.json`. Key settings and their defaults:

| Key | Default | Description |
|---|---|---|
| `model` | `claude-opus-4-6` | Active model |
| `max_tokens` | `8192` | Max output tokens |
| `permission_mode` | `auto` | Permission mode |
| `max_tool_output` | `32000` | Tool output truncation limit |
| `max_agent_depth` | `3` | Max sub-agent nesting |
| `max_concurrent_agents` | `3` | Thread pool size |

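
A common way to implement this kind of persistence is to overlay the saved file on top of the defaults, so that keys added in newer versions still get a value. The sketch below assumes that approach; `load_config` and `save_config` are hypothetical names, and falling back to defaults on a corrupt file is a design choice of the sketch.

```python
import json
from pathlib import Path

DEFAULTS = {
    "model": "claude-opus-4-6",
    "max_tokens": 8192,
    "permission_mode": "auto",
    "max_tool_output": 32000,
    "max_agent_depth": 3,
    "max_concurrent_agents": 3,
}

def load_config(path: Path) -> dict:
    """Start from DEFAULTS and overlay whatever the file on disk contains."""
    config = dict(DEFAULTS)
    if path.exists():
        try:
            config.update(json.loads(path.read_text(encoding="utf-8")))
        except json.JSONDecodeError:
            pass  # corrupt file: keep the defaults
    return config

def save_config(config: dict, path: Path) -> None:
    """Write the config as pretty-printed JSON, creating the directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
```
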
---

## Data Flow Example

A user asks "Read config.py and change max_tokens to 16384":

```
1. nano_claude.py captures input
2. agent.run() appends user message, calls maybe_compact()
3. providers.stream() sends to Gemini API with 13 tool schemas
4. Model responds: text + tool_call[Read(config.py)]
5. agent.py checks permission (Read = read_only → auto-approve)
6. tool_registry.execute_tool("Read", ...) → file content (truncated if >32K)
7. Tool result appended to messages, loop back to step 3
8. Model responds: text + tool_call[Edit(config.py, "8192", "16384")]
9. agent.py checks permission (Edit = not read_only → ask user)
10. User approves → tools.py._edit() runs, generates diff
11. nano_claude.py renders diff with ANSI colors (red/green)
12. Tool result appended, loop back to step 3
13. Model responds: "Done, max_tokens changed to 16384"
14. No tool_calls → loop ends, TurnDone yielded
```

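
Steps 5 and 9 hinge on the permission check. A minimal sketch of that decision follows; it is not the real `_check_permission`. The command list, the rejected shell metacharacters, the `command` input key, and the behaviour outside `auto` mode are all assumptions made for illustration.

```python
SAFE_COMMANDS = ("ls", "pwd", "cat", "git status", "git diff", "git log")  # illustrative

def is_safe_bash(cmd: str) -> bool:
    """Allow only whitelisted commands with no chaining or redirection."""
    cmd = cmd.strip()
    if any(ch in cmd for ch in ";|&><`$"):
        return False
    return any(cmd == safe or cmd.startswith(safe + " ") for safe in SAFE_COMMANDS)

def auto_approved(name: str, read_only: bool, inputs: dict,
                  permission_mode: str = "auto") -> bool:
    """True if the tool call may run without asking the user."""
    if permission_mode != "auto":
        return False                     # assumption: other modes always ask
    if read_only:
        return True                      # e.g. Read, Glob, Grep
    return name == "Bash" and is_safe_bash(inputs.get("command", ""))
```

With these rules, `Read(config.py)` runs immediately, while `Edit(config.py, ...)` falls through to a `PermissionRequest`.
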
---

## Testing

```bash
# Run all 78 tests
python -m pytest tests/ -v

# Run specific module tests
python -m pytest tests/test_tool_registry.py -v
python -m pytest tests/test_compaction.py -v
python -m pytest tests/test_memory.py -v
python -m pytest tests/test_subagent.py -v
python -m pytest tests/test_skills.py -v
python -m pytest tests/test_diff_view.py -v
```

Tests use `monkeypatch` and `tmp_path` fixtures to avoid side effects.
Sub-agent tests mock `_agent_run` to avoid real API calls.

---

## Future: Package Refactoring

When `tools.py` or `agent.py` grow too large, the flat layout can be migrated to:

```
ncc/
├── __init__.py
├── repl.py                # from nano_claude.py
├── agent/
│   ├── loop.py            # from agent.py
│   ├── subagent.py        # from subagent.py
│   └── compaction.py      # from compaction.py
├── providers/
│   ├── base.py
│   ├── openai_compat.py
│   └── registry.py
├── tools/
│   ├── registry.py        # from tool_registry.py
│   ├── builtin.py         # core 8 tools from tools.py
│   ├── memory.py          # MemorySave/MemoryDelete from tools.py
│   └── subagent.py        # Agent/Check/List from tools.py
├── memory/
│   └── store.py           # from memory.py
├── skills/
│   └── loader.py          # from skills.py
└── config.py
```

The current code is structured to make this migration straightforward:
- Modules communicate via function parameters, not globals
- Each module has a small public API surface
- Dependencies are unidirectional