# Architecture Guide
This document is for developers who want to understand, modify, or extend nano-claude-code. For user-facing docs, see README.md.
## Overview
Nano-claude-code is a ~3.4K-line Python CLI that lets LLMs (GPT, Gemini, etc.) operate as coding agents with tool use, memory, sub-agents, and skills. The architecture is a flat module layout designed for readability and future migration to a package structure.
```
User Input
    │
    ▼
nano_claude.py ── REPL, slash commands, rendering
    │
    ├──► agent.py ── multi-turn loop, permission gates
    │       │
    │       ├──► providers.py ── API streaming (Anthropic / OpenAI-compat)
    │       ├──► tool_registry.py ──► tools.py ── 13 tools
    │       ├──► compaction.py ── context window management
    │       └──► subagent.py ── threaded sub-agent lifecycle
    │
    ├──► context.py ── system prompt (git, CLAUDE.md, memory)
    │       └──► memory.py ── persistent file-based memory
    │
    ├──► skills.py ── markdown skill loading + execution
    └──► config.py ── configuration persistence
```
**Key invariant:** Dependencies flow downward. There are no circular imports at the module level (`subagent.py` uses lazy imports to call `agent.py`).
## Module Reference

### tool_registry.py — Tool Plugin System
The central registry that all tools register into. This is the foundation for extensibility.
Data model:
```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolDef:
    name: str              # unique identifier (e.g. "Read", "MemorySave")
    schema: dict           # JSON schema sent to the LLM API
    func: Callable         # (params: dict, config: dict) -> str
    read_only: bool        # True = auto-approve in 'auto' permission mode
    concurrent_safe: bool  # True = safe to run in parallel (for sub-agents)
```
Public API:
| Function | Description |
|---|---|
| `register_tool(tool_def)` | Add a tool to the registry (overwrites by name) |
| `get_tool(name)` | Look up by name, returns `None` if not found |
| `get_all_tools()` | List all registered tools |
| `get_tool_schemas()` | Return schemas for API calls |
| `execute_tool(name, params, config, max_output=32000)` | Execute with output truncation |
| `clear_registry()` | Reset — for testing only |
**Output truncation:** If a tool returns more than `max_output` chars, the result is truncated to `first_half + [... N chars truncated ...] + last_quarter`. This prevents a single tool call (e.g. reading a huge file) from blowing up the context window.
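A minimal sketch of that scheme (the helper name is hypothetical, and reading "first half / last quarter" as fractions of the `max_output` budget is an assumption):

```python
def truncate_output(text: str, max_output: int = 32000) -> str:
    """Keep the head and tail of an oversized tool result, drop the middle."""
    if len(text) <= max_output:
        return text
    head = text[: max_output // 2]        # first half of the budget
    tail = text[-(max_output // 4):]      # last quarter of the budget
    snipped = len(text) - len(head) - len(tail)
    return f"{head}\n[... {snipped} chars truncated ...]\n{tail}"
```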
Registering a custom tool:
```python
from tool_registry import ToolDef, register_tool

def my_tool(params, config):
    return f"Hello, {params['name']}!"

register_tool(ToolDef(
    name="MyTool",
    schema={
        "name": "MyTool",
        "description": "A greeting tool",
        "input_schema": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
        },
    },
    func=my_tool,
    read_only=True,
    concurrent_safe=True,
))
```
### tools.py — Built-in Tool Implementations
Contains the 8 core tools (Read, Write, Edit, Bash, Glob, Grep, WebFetch, WebSearch)
plus memory tools (MemorySave, MemoryDelete) and sub-agent tools (Agent, CheckAgentResult,
ListAgentTasks). All register themselves via tool_registry at import time.
Key internals:
- `_is_safe_bash(cmd)` — whitelist of safe shell commands for auto-approval
- `generate_unified_diff(old, new, filename)` — diff generation for Edit/Write
- `maybe_truncate_diff(diff_text, max_lines=80)` — truncate large diffs for display
- `_get_agent_manager()` — lazy singleton for SubAgentManager
- Backward-compatible `execute_tool(name, inputs, permission_mode, ask_permission)` wrapper
### agent.py — Core Agent Loop
The heart of the system. `run()` is a generator that yields events as they happen.

```python
def run(user_message, state, config, system_prompt,
        depth=0, cancel_check=None) -> Generator:
    ...
```
Loop logic:

```
1. Append user message
2. Inject depth into config (for sub-agent depth tracking)
3. While True:
   a. Check cancel_check() — cooperative cancellation for sub-agents
   b. maybe_compact(state, config) — compress if near context limit
   c. Stream from provider → yield TextChunk / ThinkingChunk
   d. Record assistant message
   e. If no tool_calls → break
   f. For each tool_call:
      - Permission check (_check_permission)
      - If denied → yield PermissionRequest → user decides
      - Execute tool → yield ToolStart / ToolEnd
      - Append tool result
   g. Loop (model sees tool results and responds)
```
Event types:
| Event | Fields | When |
|---|---|---|
| `TextChunk` | `text` | Streaming text delta |
| `ThinkingChunk` | `text` | Extended thinking block |
| `ToolStart` | `name`, `inputs` | Before tool execution |
| `ToolEnd` | `name`, `result`, `permitted` | After tool execution |
| `PermissionRequest` | `description`, `granted` | Needs user approval |
| `TurnDone` | `input_tokens`, `output_tokens` | End of one API turn |
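For illustration, this is roughly how a caller such as the REPL might drive the generator. The dispatch follows the event table above; the approval protocol shown here (mutating `granted` on the yielded event) is an assumption:

```python
# Illustrative consumer; event classes are the ones in the table above.
for event in run(user_message, state, config, system_prompt):
    if isinstance(event, TextChunk):
        print(event.text, end="", flush=True)    # stream text deltas as they arrive
    elif isinstance(event, ToolStart):
        print(f"\n[tool] {event.name} {event.inputs}")
    elif isinstance(event, PermissionRequest):
        # Assumption: the loop reads .granted after yielding the request.
        event.granted = input(f"Allow {event.description}? [y/N] ").lower() == "y"
    elif isinstance(event, TurnDone):
        print(f"\n({event.input_tokens} in / {event.output_tokens} out tokens)")
```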
### compaction.py — Context Window Management
Keeps conversations within model context limits using two layers.
**Layer 1: Snip (`snip_old_tool_results`)**

- Rule-based, no API cost
- Truncates tool-role messages older than `preserve_last_n_turns` (default 6)
- Keeps the first half + last quarter of the content

**Layer 2: Auto-Compact (`compact_messages`)**

- Model-driven: calls the current model to summarize old messages
- Splits messages into `[old | recent]` at a ~70/30 ratio
- Replaces the old messages with a summary + acknowledgment
**Trigger:** `maybe_compact()` checks `estimate_tokens(messages) > context_limit * 0.7`. It runs snip first (cheap), then auto-compact if the conversation is still over the threshold.

**Token estimation:** `len(content) / 3.5` — a simple heuristic that works for most models. `get_context_limit(model)` reads from the provider registry.
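A sketch of that trigger logic, using the function names from this section (the `state.messages` field name is an assumption):

```python
def estimate_tokens(messages: list[dict]) -> int:
    """Crude heuristic: roughly 3.5 characters per token."""
    return int(sum(len(str(m.get("content", ""))) for m in messages) / 3.5)

def maybe_compact(state, config) -> None:
    limit = get_context_limit(config["model"])
    if estimate_tokens(state.messages) <= limit * 0.7:
        return
    snip_old_tool_results(state.messages)            # Layer 1: cheap, rule-based
    if estimate_tokens(state.messages) > limit * 0.7:
        compact_messages(state, config)              # Layer 2: model-driven summary
```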
### memory.py — Persistent Memory
File-based memory system stored in ~/.nano_claude/memory/.
Storage format:
```
~/.nano_claude/memory/
├── MEMORY.md             # Index: one line per memory
├── user_preferences.md   # Individual memory file
└── project_auth.md
```
Each memory file uses markdown with YAML frontmatter:
```markdown
---
name: user preferences
description: coding style preferences
type: feedback
created: 2026-04-02
---
User prefers 4-space indentation and type hints.
```
How it integrates:
- `get_memory_context()` returns the MEMORY.md index text
- `context.py` injects this index into the system prompt
- The model reads the index, then uses the `Read` tool to access full memory content
- The model uses the `MemorySave` / `MemoryDelete` tools to manage memories
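To make the storage concrete, here is a sketch of what a `MemorySave` implementation might do, following the frontmatter layout above (the filename slug scheme and the index-line format are assumptions):

```python
from datetime import date
from pathlib import Path

MEMORY_DIR = Path.home() / ".nano_claude" / "memory"

def save_memory(name: str, description: str, mem_type: str, body: str) -> Path:
    """Write one memory file and append a one-line entry to the MEMORY.md index."""
    MEMORY_DIR.mkdir(parents=True, exist_ok=True)
    path = MEMORY_DIR / (name.lower().replace(" ", "_") + ".md")
    path.write_text(
        f"---\nname: {name}\ndescription: {description}\n"
        f"type: {mem_type}\ncreated: {date.today()}\n---\n{body}\n"
    )
    with (MEMORY_DIR / "MEMORY.md").open("a") as index:
        index.write(f"- {name} ({path.name}): {description}\n")  # assumed format
    return path
```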
### subagent.py — Threaded Sub-Agents

Sub-agents run in background threads via `ThreadPoolExecutor`.
Key design decisions:
- **Fresh context** — each sub-agent starts with an empty message history plus the task prompt
- **Depth limiting** — `max_depth=3`, checked at spawn time. The model gets an error message (not silent tool removal) so it can adapt.
- **Cooperative cancellation** — a `cancel_check` callable is checked on each loop iteration. Python threads can't be killed safely, so we set a flag instead.
- **Threading, not asyncio** — the entire codebase is synchronous generators, and threading via `concurrent.futures` keeps things simple. The SubAgentManager API is designed to be compatible with a future async migration.
Lifecycle:
```
spawn(prompt, config, system_prompt, depth)
    → Creates SubAgentTask
    → Submits _run to ThreadPoolExecutor
    → _run calls agent.run() with depth+1
wait(task_id, timeout)  → blocks until complete
cancel(task_id)         → sets _cancel_flag
get_result(task_id)     → returns result string
```
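A condensed sketch of a manager implementing this lifecycle; `_fresh_state()` is a hypothetical stand-in for building the empty message history, and error handling is omitted:

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor

class SubAgentManager:
    def __init__(self, max_workers: int = 3):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._tasks = {}                       # task_id -> (future, cancel_flag)
        self._ids = itertools.count(1)

    def spawn(self, prompt, config, system_prompt, depth) -> str:
        task_id = f"task-{next(self._ids)}"
        flag = threading.Event()
        future = self._pool.submit(self._run, prompt, config,
                                   system_prompt, depth, flag)
        self._tasks[task_id] = (future, flag)
        return task_id

    def _run(self, prompt, config, system_prompt, depth, flag) -> str:
        import agent                           # lazy import: avoids circular dependency
        chunks = []
        for event in agent.run(prompt, _fresh_state(), config, system_prompt,
                               depth=depth + 1, cancel_check=flag.is_set):
            if isinstance(event, agent.TextChunk):  # collect the sub-agent's text
                chunks.append(event.text)
        return "".join(chunks)

    def cancel(self, task_id: str) -> None:
        self._tasks[task_id][1].set()          # cooperative: checked each iteration

    def get_result(self, task_id: str, timeout: float | None = None) -> str:
        return self._tasks[task_id][0].result(timeout=timeout)
```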
### skills.py — Reusable Prompt Templates
Skills are markdown files with frontmatter. They are not code — just structured prompts that get injected into the agent loop.
Skill file format:
```markdown
---
name: commit
description: Create a conventional commit
triggers: ["/commit"]
tools: [Bash, Read]
---
Your prompt instructions here...
```
**Execution:** `execute_skill()` wraps the skill prompt as a user message and calls `agent.run()`. The skill runs through the exact same agent loop as a normal query.

**Search order:** Project-level skills (`./.nano_claude/skills/`) override user-level skills (`~/.nano_claude/skills/`) when skill names collide.
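A sketch of that precedence rule (`parse_frontmatter` is a hypothetical helper; the real loader in skills.py may differ):

```python
from pathlib import Path

SKILL_DIRS = [
    Path.home() / ".nano_claude" / "skills",   # user-level: loaded first
    Path("./.nano_claude/skills"),             # project-level: overwrites on collision
]

def load_skills() -> dict[str, dict]:
    """Return {skill_name: skill}; later directories shadow earlier ones."""
    skills: dict[str, dict] = {}
    for directory in SKILL_DIRS:
        if not directory.is_dir():
            continue
        for path in sorted(directory.glob("*.md")):
            meta, prompt = parse_frontmatter(path.read_text())  # hypothetical helper
            skills[meta["name"]] = {"meta": meta, "prompt": prompt, "path": path}
    return skills
```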
### providers.py — Multi-Provider Abstraction
Two streaming adapters cover all providers:
| Adapter | Providers |
|---|---|
| `stream_anthropic()` | Anthropic (native SDK) |
| `stream_openai_compat()` | OpenAI, Gemini, Kimi, Qwen, Zhipu, DeepSeek, Ollama, LM Studio, Custom |
Neutral message format (provider-independent):
{"role": "user", "content": "..."}
{"role": "assistant", "content": "...", "tool_calls": [{"id": "...", "name": "...", "input": {...}}]}
{"role": "tool", "tool_call_id": "...", "name": "...", "content": "..."}
Conversion functions: `messages_to_anthropic()`, `messages_to_openai()`, `tools_to_openai()`.
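As an example of one direction, a sketch of how `messages_to_openai()` might map the neutral format onto OpenAI's chat schema (edge cases such as thinking blocks and `extra_content` are omitted):

```python
import json

def messages_to_openai(messages: list[dict]) -> list[dict]:
    """Convert the neutral message format to OpenAI-style chat messages."""
    out = []
    for m in messages:
        if m["role"] == "tool":
            out.append({"role": "tool", "tool_call_id": m["tool_call_id"],
                        "content": m["content"]})
        elif m["role"] == "assistant" and m.get("tool_calls"):
            out.append({
                "role": "assistant",
                "content": m.get("content") or None,
                "tool_calls": [{
                    "id": tc["id"],
                    "type": "function",
                    "function": {"name": tc["name"],
                                 "arguments": json.dumps(tc["input"])},
                } for tc in m["tool_calls"]],
            })
        else:
            out.append({"role": m["role"], "content": m["content"]})
    return out
```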
Provider-specific handling:
- Gemini 3 models require a `thought_signature` in tool call responses — this is transparently captured and passed through via `extra_content` on tool_call dicts.
### context.py — System Prompt Builder
Assembles the system prompt from:
- Base template (role, date, cwd, platform)
- Git info (branch, status, recent commits)
- CLAUDE.md content (project-level + global)
- Memory index (from `memory.get_memory_context()`)
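A sketch of that assembly; the section headers and the `get_git_info` / `read_claude_md` helpers are assumptions, only `memory.get_memory_context()` is named in this document:

```python
import os
import platform
from datetime import date

import memory

def build_system_prompt() -> str:
    parts = [
        f"You are a coding agent. Date: {date.today()}, "
        f"cwd: {os.getcwd()}, platform: {platform.system()}."
    ]
    if git := get_git_info():           # hypothetical: branch, status, recent commits
        parts.append(f"## Git\n{git}")
    if claude_md := read_claude_md():   # hypothetical: project-level + global CLAUDE.md
        parts.append(f"## Project instructions\n{claude_md}")
    if index := memory.get_memory_context():
        parts.append(f"## Memory index\n{index}")
    return "\n\n".join(parts)
```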
### config.py — Configuration

Defaults stored in `~/.nano_claude/config.json`. Key settings:
| Key | Default | Description |
|---|---|---|
| `model` | `claude-opus-4-6` | Active model |
| `max_tokens` | `8192` | Max output tokens |
| `permission_mode` | `auto` | Permission mode |
| `max_tool_output` | `32000` | Tool output truncation limit |
| `max_agent_depth` | `3` | Max sub-agent nesting |
| `max_concurrent_agents` | `3` | Thread pool size |
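A sketch of the persistence layer, merging the on-disk file over the defaults above (the function names are assumptions):

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".nano_claude" / "config.json"

DEFAULTS = {
    "model": "claude-opus-4-6",
    "max_tokens": 8192,
    "permission_mode": "auto",
    "max_tool_output": 32000,
    "max_agent_depth": 3,
    "max_concurrent_agents": 3,
}

def load_config() -> dict:
    """Return the defaults overlaid with whatever is on disk."""
    config = dict(DEFAULTS)
    if CONFIG_PATH.exists():
        config.update(json.loads(CONFIG_PATH.read_text()))
    return config

def save_config(config: dict) -> None:
    CONFIG_PATH.parent.mkdir(parents=True, exist_ok=True)
    CONFIG_PATH.write_text(json.dumps(config, indent=2))
```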
## Data Flow Example
A user asks "Read config.py and change max_tokens to 16384":
1. nano_claude.py captures input
2. agent.run() appends user message, calls maybe_compact()
3. providers.stream() sends to Gemini API with 13 tool schemas
4. Model responds: text + tool_call[Read(config.py)]
5. agent.py checks permission (Read = read_only → auto-approve)
6. tool_registry.execute_tool("Read", ...) → file content (truncated if >32K)
7. Tool result appended to messages, loop back to step 3
8. Model responds: text + tool_call[Edit(config.py, "8192", "16384")]
9. agent.py checks permission (Edit = not read_only → ask user)
10. User approves → tools.py._edit() runs, generates diff
11. nano_claude.py renders diff with ANSI colors (red/green)
12. Tool result appended, loop back to step 3
13. Model responds: "Done, max_tokens changed to 16384"
14. No tool_calls → loop ends, TurnDone yielded
## Testing

```bash
# Run all 78 tests
python -m pytest tests/ -v

# Run specific module tests
python -m pytest tests/test_tool_registry.py -v
python -m pytest tests/test_compaction.py -v
python -m pytest tests/test_memory.py -v
python -m pytest tests/test_subagent.py -v
python -m pytest tests/test_skills.py -v
python -m pytest tests/test_diff_view.py -v
```
Tests use `monkeypatch` and `tmp_path` fixtures to avoid side effects. Sub-agent tests mock `_agent_run` to avoid real API calls.
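An illustrative test in that style (not taken from the suite; it assumes the registry API and the truncation marker described earlier):

```python
import tool_registry
from tool_registry import ToolDef, register_tool, execute_tool

def test_execute_tool_truncates_long_output():
    tool_registry.clear_registry()        # test-only reset, as noted above
    register_tool(ToolDef(
        name="Noisy",
        schema={"name": "Noisy", "description": "", "input_schema": {"type": "object"}},
        func=lambda params, config: "x" * 100_000,
        read_only=True,
        concurrent_safe=True,
    ))
    result = execute_tool("Noisy", {}, {}, max_output=1_000)
    assert "chars truncated" in result    # marker from the truncation scheme
    assert len(result) < 100_000
```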
## Future: Package Refactoring

When `tools.py` or `agent.py` grows too large, the flat layout can be migrated to:
```
ncc/
├── __init__.py
├── repl.py              # from nano_claude.py
├── agent/
│   ├── loop.py          # from agent.py
│   ├── subagent.py      # from subagent.py
│   └── compaction.py    # from compaction.py
├── providers/
│   ├── base.py
│   ├── openai_compat.py
│   └── registry.py
├── tools/
│   ├── registry.py      # from tool_registry.py
│   ├── builtin.py       # core 8 tools from tools.py
│   ├── memory.py        # MemorySave/MemoryDelete from tools.py
│   └── subagent.py      # Agent/Check/List from tools.py
├── memory/
│   └── store.py         # from memory.py
├── skills/
│   └── loader.py        # from skills.py
└── config.py
```
The current code is structured to make this migration straightforward:
- Modules communicate via function parameters, not globals
- Each module has a small public API surface
- Dependencies are unidirectional