Architecture Guide

This document is for developers who want to understand, modify, or extend nano-claude-code. For user-facing docs, see README.md.


Overview

Nano-claude-code is a ~3.4K-line Python CLI that lets LLMs (GPT, Gemini, etc.) operate as coding agents with tool use, memory, sub-agents, and skills. The architecture is a flat module layout designed for readability and future migration to a package structure.

User Input
    │
    ▼
nano_claude.py  ── REPL, slash commands, rendering
    │
    ├──► agent.py  ── multi-turn loop, permission gates
    │       │
    │       ├──► providers.py  ── API streaming (Anthropic / OpenAI-compat)
    │       ├──► tool_registry.py ──► tools.py  ── 13 tools
    │       ├──► compaction.py  ── context window management
    │       └──► subagent.py  ── threaded sub-agent lifecycle
    │
    ├──► context.py  ── system prompt (git, CLAUDE.md, memory)
    │       └──► memory.py  ── persistent file-based memory
    │
    ├──► skills.py  ── markdown skill loading + execution
    └──► config.py  ── configuration persistence

Key invariant: Dependencies flow downward. No circular imports at the module level (subagent.py uses lazy imports to call agent.py).
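
For example, the lazy import can hide behind a small wrapper. A sketch (illustrative, not the verbatim source, though the sub-agent tests mock an _agent_run hook of roughly this shape):

def _agent_run(*args, **kwargs):
    # Importing agent at module import time would close the cycle
    # agent.py -> tools.py -> subagent.py -> agent.py, so defer it
    # until a sub-agent thread actually needs to run.
    import agent
    return agent.run(*args, **kwargs)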


Module Reference

tool_registry.py — Tool Plugin System

The central registry that all tools register into. This is the foundation for extensibility.

Data model:

@dataclass
class ToolDef:
    name: str               # unique identifier (e.g. "Read", "MemorySave")
    schema: dict            # JSON schema sent to the LLM API
    func: Callable          # (params: dict, config: dict) -> str
    read_only: bool         # True = auto-approve in 'auto' permission mode
    concurrent_safe: bool   # True = safe to run in parallel (for sub-agents)

Public API:

  • register_tool(tool_def) — add a tool to the registry (overwrites by name)
  • get_tool(name) — look up by name; returns None if not found
  • get_all_tools() — list all registered tools
  • get_tool_schemas() — return schemas for API calls
  • execute_tool(name, params, config, max_output=32000) — execute with output truncation
  • clear_registry() — reset the registry (testing only)

Output truncation: If a tool returns more than max_output chars, the result is truncated to first_half + [... N chars truncated ...] + last_quarter. This prevents a single tool call (e.g. reading a huge file) from blowing up the context window.
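
A minimal sketch of that truncation (split points per the description above; the helper name is illustrative):

def _truncate(text, max_output=32000):
    if len(text) <= max_output:
        return text
    head = text[: max_output // 2]           # first half of the budget
    tail = text[-(max_output // 4):]         # last quarter of the budget
    cut = len(text) - len(head) - len(tail)
    return f"{head}\n[... {cut} chars truncated ...]\n{tail}"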

Registering a custom tool:

from tool_registry import ToolDef, register_tool

def my_tool(params, config):
    return f"Hello, {params['name']}!"

register_tool(ToolDef(
    name="MyTool",
    schema={
        "name": "MyTool",
        "description": "A greeting tool",
        "input_schema": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
        },
    },
    func=my_tool,
    read_only=True,
    concurrent_safe=True,
))

tools.py — Built-in Tool Implementations

Contains the 8 core tools (Read, Write, Edit, Bash, Glob, Grep, WebFetch, WebSearch) plus memory tools (MemorySave, MemoryDelete) and sub-agent tools (Agent, CheckAgentResult, ListAgentTasks). All register themselves via tool_registry at import time.

Key internals:

  • _is_safe_bash(cmd) — whitelist of safe shell commands for auto-approval
  • generate_unified_diff(old, new, filename) — diff generation for Edit/Write (sketched below)
  • maybe_truncate_diff(diff_text, max_lines=80) — truncate large diffs for display
  • _get_agent_manager() — lazy singleton for SubAgentManager
  • Backward-compatible execute_tool(name, inputs, permission_mode, ask_permission) wrapper
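
generate_unified_diff is most likely a thin wrapper over the standard library. A sketch assuming difflib (not the verbatim source):

import difflib

def generate_unified_diff(old, new, filename):
    # difflib expects line lists; keepends preserves exact file content
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{filename}",
        tofile=f"b/{filename}",
    )
    return "".join(lines)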

agent.py — Core Agent Loop

The heart of the system. run() is a generator that yields events as they happen.

def run(user_message, state, config, system_prompt,
        depth=0, cancel_check=None) -> Generator:

Loop logic:

1. Append user message
2. Inject depth into config (for sub-agent depth tracking)
3. While True:
   a. Check cancel_check() — cooperative cancellation for sub-agents
   b. maybe_compact(state, config) — compress if near context limit
   c. Stream from provider → yield TextChunk / ThinkingChunk
   d. Record assistant message
   e. If no tool_calls → break
   f. For each tool_call:
      - Permission check (_check_permission)
      - If denied → yield PermissionRequest → user decides
      - Execute tool → yield ToolStart / ToolEnd
      - Append tool result
   g. Loop (model sees tool results and responds)

Event types:

  • TextChunk(text) — streaming text delta
  • ThinkingChunk(text) — extended thinking block
  • ToolStart(name, inputs) — emitted before tool execution
  • ToolEnd(name, result, permitted) — emitted after tool execution
  • PermissionRequest(description, granted) — needs user approval
  • TurnDone(input_tokens, output_tokens) — end of one API turn
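
A caller drives this as an ordinary generator loop. A minimal consumer sketch (rendering details are illustrative; it assumes the REPL answers a PermissionRequest by setting its granted field before resuming the generator):

from agent import run, TextChunk, ToolStart, PermissionRequest, TurnDone

def render(user_message, state, config, system_prompt):
    for event in run(user_message, state, config, system_prompt):
        if isinstance(event, TextChunk):
            print(event.text, end="", flush=True)
        elif isinstance(event, ToolStart):
            print(f"\n[tool] {event.name} {event.inputs}")
        elif isinstance(event, PermissionRequest):
            # the generator is paused here; our answer is read on resume
            event.granted = input(f"Allow {event.description}? [y/N] ").strip() == "y"
        elif isinstance(event, TurnDone):
            print(f"\n({event.input_tokens} in / {event.output_tokens} out)")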

compaction.py — Context Window Management

Keeps conversations within model context limits using two layers.

Layer 1: Snip (snip_old_tool_results)

  • Rule-based, no API cost
  • Truncates tool-role messages older than preserve_last_n_turns (default 6)
  • Keeps first half + last quarter of the content

Layer 2: Auto-Compact (compact_messages)

  • Model-driven: calls the current model to summarize old messages
  • Splits messages into [old | recent] at ~70/30 ratio
  • Replaces old messages with a summary + acknowledgment

Trigger: maybe_compact() checks estimate_tokens(messages) > context_limit * 0.7. Runs snip first (cheap), then auto-compact if still over.

Token estimation: len(content) / 3.5 — simple heuristic. Works for most models. get_context_limit(model) reads from the provider registry.
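
Put together, the trigger is roughly this (a sketch: the constants match the description above, but the state/message field names are assumptions):

def estimate_tokens(messages):
    # crude chars-per-token heuristic, as described above
    return sum(len(str(m.get("content", ""))) for m in messages) / 3.5

def maybe_compact(state, config):
    limit = get_context_limit(config["model"])
    if estimate_tokens(state.messages) <= limit * 0.7:
        return
    snip_old_tool_results(state.messages)        # layer 1: free, rule-based
    if estimate_tokens(state.messages) > limit * 0.7:
        compact_messages(state, config)          # layer 2: one model call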

memory.py — Persistent Memory

File-based memory system stored in ~/.nano_claude/memory/.

Storage format:

~/.nano_claude/memory/
├── MEMORY.md              # Index: one line per memory
├── user_preferences.md    # Individual memory file
└── project_auth.md

Each memory file uses markdown with YAML frontmatter:

---
name: user preferences
description: coding style preferences
type: feedback
created: 2026-04-02
---

User prefers 4-space indentation and type hints.

How it integrates:

  • get_memory_context() returns the MEMORY.md index text (sketched below)
  • context.py injects this into the system prompt
  • The model reads the index, then uses Read tool to access full memory content
  • The model uses MemorySave / MemoryDelete tools to manage memories
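
get_memory_context itself can be a few lines; a sketch (the directory constant is an assumption):

from pathlib import Path

MEMORY_DIR = Path.home() / ".nano_claude" / "memory"

def get_memory_context():
    # only the index goes into the system prompt; full memories are
    # fetched on demand via the Read tool
    index = MEMORY_DIR / "MEMORY.md"
    return index.read_text() if index.exists() else ""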

subagent.py — Threaded Sub-Agents

Sub-agents run in background threads via ThreadPoolExecutor.

Key design decisions:

  1. Fresh context — each sub-agent starts with empty message history + task prompt
  2. Depth limiting — max_depth=3, checked at spawn time. The model gets an error message (not silent tool removal) so it can adapt.
  3. Cooperative cancellation — a cancel_check callable is polled on each loop iteration (sketched after the lifecycle below). Python threads can't be killed safely, so we set a flag instead.
  4. Threading, not asyncio — the entire codebase is synchronous generators. Threading via concurrent.futures keeps things simple. The SubAgentManager API is designed to be compatible with a future async migration.

Lifecycle:

spawn(prompt, config, system_prompt, depth)
  → Creates SubAgentTask
  → Submits _run to ThreadPoolExecutor
  → _run calls agent.run() with depth+1

wait(task_id, timeout)  → blocks until complete
cancel(task_id)         → sets _cancel_flag
get_result(task_id)     → returns result string
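
The cancellation plumbing (decision 3 above) reduces to a flag plus a bound method; a sketch with illustrative field names:

import threading

class SubAgentTask:
    def __init__(self):
        self._cancel_flag = threading.Event()

    def cancel_check(self):
        # passed into agent.run(); polled once per loop iteration
        return self._cancel_flag.is_set()

def cancel(task):
    task._cancel_flag.set()  # worker thread exits at its next check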

skills.py — Reusable Prompt Templates

Skills are markdown files with frontmatter. They are not code — just structured prompts that get injected into the agent loop.

Skill file format:

---
name: commit
description: Create a conventional commit
triggers: ["/commit"]
tools: [Bash, Read]
---

Your prompt instructions here...

Execution: execute_skill() wraps the skill prompt as a user message and calls agent.run(). The skill runs through the exact same agent loop as a normal query.

Search order: Project-level (./.nano_claude/skills/) overrides user-level (~/.nano_claude/skills/) when skill names collide.
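
Loading a skill is just splitting frontmatter from body. A minimal sketch that sidesteps a YAML dependency (the real loader may use one; list-valued fields stay raw strings here):

def load_skill(path):
    text = open(path).read()
    # frontmatter is delimited by the first two "---" markers
    _, frontmatter, body = text.split("---", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body.strip()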

providers.py — Multi-Provider Abstraction

Two streaming adapters cover all providers:

  • stream_anthropic() — Anthropic (native SDK)
  • stream_openai_compat() — OpenAI, Gemini, Kimi, Qwen, Zhipu, DeepSeek, Ollama, LM Studio, Custom

Neutral message format (provider-independent):

{"role": "user", "content": "..."}
{"role": "assistant", "content": "...", "tool_calls": [{"id": "...", "name": "...", "input": {...}}]}
{"role": "tool", "tool_call_id": "...", "name": "...", "content": "..."}

Conversion functions: messages_to_anthropic(), messages_to_openai(), tools_to_openai().
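
A sketch of the tool-call round trip inside messages_to_openai (the OpenAI chat format is standard; the mapping details here are illustrative):

import json

def messages_to_openai(messages):
    out = []
    for m in messages:
        if m["role"] == "assistant" and m.get("tool_calls"):
            out.append({
                "role": "assistant",
                "content": m.get("content") or None,
                "tool_calls": [{
                    "id": tc["id"],
                    "type": "function",
                    # OpenAI wants arguments as a JSON string, not a dict
                    "function": {"name": tc["name"],
                                 "arguments": json.dumps(tc["input"])},
                } for tc in m["tool_calls"]],
            })
        elif m["role"] == "tool":
            out.append({"role": "tool",
                        "tool_call_id": m["tool_call_id"],
                        "content": m["content"]})
        else:
            out.append({"role": m["role"], "content": m["content"]})
    return out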

Provider-specific handling:

  • Gemini 3 models require thought_signature in tool call responses — this is transparently captured and passed through via extra_content on tool_call dicts.

context.py — System Prompt Builder

Assembles the system prompt from four sections (see the sketch after this list):

  1. Base template (role, date, cwd, platform)
  2. Git info (branch, status, recent commits)
  3. CLAUDE.md content (project-level + global)
  4. Memory index (from memory.get_memory_context())
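
Assembly is plain string concatenation; a sketch (the section helper names are assumptions):

def build_system_prompt(cwd):
    parts = [
        base_template(cwd),       # role, date, cwd, platform
        git_info(cwd),            # branch, status, recent commits
        claude_md_content(cwd),   # project-level + global CLAUDE.md
        get_memory_context(),     # memory index
    ]
    # skip sections that came back empty (no git repo, no CLAUDE.md, ...)
    return "\n\n".join(p for p in parts if p)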

config.py — Configuration

Defaults stored in ~/.nano_claude/config.json. Key settings:

  • model (default: claude-opus-4-6) — active model
  • max_tokens (default: 8192) — max output tokens
  • permission_mode (default: auto) — permission mode
  • max_tool_output (default: 32000) — tool output truncation limit
  • max_agent_depth (default: 3) — max sub-agent nesting
  • max_concurrent_agents (default: 3) — thread pool size

Data Flow Example

A user asks "Read config.py and change max_tokens to 16384":

1. nano_claude.py captures input
2. agent.run() appends user message, calls maybe_compact()
3. providers.stream() sends to Gemini API with 13 tool schemas
4. Model responds: text + tool_call[Read(config.py)]
5. agent.py checks permission (Read = read_only → auto-approve)
6. tool_registry.execute_tool("Read", ...) → file content (truncated if >32K)
7. Tool result appended to messages, loop back to step 3
8. Model responds: text + tool_call[Edit(config.py, "8192", "16384")]
9. agent.py checks permission (Edit = not read_only → ask user)
10. User approves → tools.py._edit() runs, generates diff
11. nano_claude.py renders diff with ANSI colors (red/green)
12. Tool result appended, loop back to step 3
13. Model responds: "Done, max_tokens changed to 16384"
14. No tool_calls → loop ends, TurnDone yielded

Testing

# Run all 78 tests
python -m pytest tests/ -v

# Run specific module tests
python -m pytest tests/test_tool_registry.py -v
python -m pytest tests/test_compaction.py -v
python -m pytest tests/test_memory.py -v
python -m pytest tests/test_subagent.py -v
python -m pytest tests/test_skills.py -v
python -m pytest tests/test_diff_view.py -v

Tests use monkeypatch and tmp_path fixtures to avoid side effects. Sub-agent tests mock _agent_run to avoid real API calls.
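
A representative test shape under those conventions (function and attribute names here are illustrative, not the actual suite):

import memory  # module under test

def test_memory_roundtrip(tmp_path, monkeypatch):
    # point the store at a throwaway directory so nothing touches ~/.nano_claude
    monkeypatch.setattr(memory, "MEMORY_DIR", tmp_path)
    memory.save_memory("prefs", "4-space indentation", mem_type="feedback")
    assert "prefs" in memory.get_memory_context()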


Future: Package Refactoring

When tools.py or agent.py grow too large, the flat layout can be migrated to:

ncc/
├── __init__.py
├── repl.py              # from nano_claude.py
├── agent/
│   ├── loop.py          # from agent.py
│   ├── subagent.py      # from subagent.py
│   └── compaction.py    # from compaction.py
├── providers/
│   ├── base.py
│   ├── openai_compat.py
│   └── registry.py
├── tools/
│   ├── registry.py      # from tool_registry.py
│   ├── builtin.py       # core 8 tools from tools.py
│   ├── memory.py        # MemorySave/MemoryDelete from tools.py
│   └── subagent.py      # Agent/Check/List from tools.py
├── memory/
│   └── store.py         # from memory.py
├── skills/
│   └── loader.py        # from skills.py
└── config.py

The current code is structured to make this migration straightforward:

  • Modules communicate via function parameters, not globals
  • Each module has a small public API surface
  • Dependencies are unidirectional