Update README.MD and add nano-claude-code v3.0 + original-source-code/src

- README.MD: add original-source-code and nano-claude-code sections, update
  overview table (4 subprojects), add v3.0 news entry, expand comparison table
  with memory/multi-agent/skills dimensions
- nano-claude-code v3.0: multi-agent package (multi_agent/), memory package
  (memory/), skill package (skill/) with built-in /commit and /review skills,
  context compression (compaction.py), tool registry plugin system, diff view,
  17 slash commands, 18 built-in tools, 101 tests (~5000 lines total)
- original-source-code/src: add raw TypeScript source tree (1884 files)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# Architecture Guide
This document is for developers who want to understand, modify, or extend nano-claude-code.
For user-facing docs, see [README.md](../README.md).
---
## Overview
Nano-claude-code is a ~3.4K-line Python CLI that lets LLMs (GPT, Gemini, etc.) operate as
coding agents with tool use, memory, sub-agents, and skills. The architecture is a flat
module layout designed for readability and future migration to a package structure.
```
User Input
    │
nano_claude.py ── REPL, slash commands, rendering
    ├──► agent.py ── multi-turn loop, permission gates
    │       │
    │       ├──► providers.py ── API streaming (Anthropic / OpenAI-compat)
    │       ├──► tool_registry.py ──► tools.py ── 13 tools
    │       ├──► compaction.py ── context window management
    │       └──► subagent.py ── threaded sub-agent lifecycle
    ├──► context.py ── system prompt (git, CLAUDE.md, memory)
    │       └──► memory.py ── persistent file-based memory
    ├──► skills.py ── markdown skill loading + execution
    └──► config.py ── configuration persistence
```
**Key invariant:** Dependencies flow downward. No circular imports at the module level
(subagent.py uses lazy imports to call agent.py).
---
## Module Reference
### `tool_registry.py` — Tool Plugin System
The central registry that all tools register into. This is the foundation for extensibility.
**Data model:**
```python
@dataclass
class ToolDef:
    name: str              # unique identifier (e.g. "Read", "MemorySave")
    schema: dict           # JSON schema sent to the LLM API
    func: Callable         # (params: dict, config: dict) -> str
    read_only: bool        # True = auto-approve in 'auto' permission mode
    concurrent_safe: bool  # True = safe to run in parallel (for sub-agents)
```
**Public API:**
| Function | Description |
|---|---|
| `register_tool(tool_def)` | Add a tool to the registry (overwrites by name) |
| `get_tool(name)` | Look up by name, returns `None` if not found |
| `get_all_tools()` | List all registered tools |
| `get_tool_schemas()` | Return schemas for API calls |
| `execute_tool(name, params, config, max_output=32000)` | Execute with output truncation |
| `clear_registry()` | Reset — for testing only |
**Output truncation:** If a tool returns more than `max_output` chars, the result is
truncated to `first_half + [... N chars truncated ...] + last_quarter`. This prevents
a single tool call (e.g. reading a huge file) from blowing up the context window.
**Registering a custom tool:**
```python
from tool_registry import ToolDef, register_tool

def my_tool(params, config):
    return f"Hello, {params['name']}!"

register_tool(ToolDef(
    name="MyTool",
    schema={
        "name": "MyTool",
        "description": "A greeting tool",
        "input_schema": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
        },
    },
    func=my_tool,
    read_only=True,
    concurrent_safe=True,
))
```
### `tools.py` — Built-in Tool Implementations
Contains the 8 core tools (Read, Write, Edit, Bash, Glob, Grep, WebFetch, WebSearch)
plus memory tools (MemorySave, MemoryDelete) and sub-agent tools (Agent, CheckAgentResult,
ListAgentTasks). All register themselves via `tool_registry` at import time.
**Key internals:**
- `_is_safe_bash(cmd)` — whitelist of safe shell commands for auto-approval
- `generate_unified_diff(old, new, filename)` — diff generation for Edit/Write
- `maybe_truncate_diff(diff_text, max_lines=80)` — truncate large diffs for display
- `_get_agent_manager()` — lazy singleton for SubAgentManager
- Backward-compatible `execute_tool(name, inputs, permission_mode, ask_permission)` wrapper
### `agent.py` — Core Agent Loop
The heart of the system. `run()` is a generator that yields events as they happen.
```python
def run(user_message, state, config, system_prompt,
        depth=0, cancel_check=None) -> Generator:
```
**Loop logic:**
```
1. Append user message
2. Inject depth into config (for sub-agent depth tracking)
3. While True:
   a. Check cancel_check() — cooperative cancellation for sub-agents
   b. maybe_compact(state, config) — compress if near context limit
   c. Stream from provider → yield TextChunk / ThinkingChunk
   d. Record assistant message
   e. If no tool_calls → break
   f. For each tool_call:
      - Permission check (_check_permission)
      - If denied → yield PermissionRequest → user decides
      - Execute tool → yield ToolStart / ToolEnd
      - Append tool result
   g. Loop (model sees tool results and responds)
```
**Event types:**
| Event | Fields | When |
|---|---|---|
| `TextChunk` | `text` | Streaming text delta |
| `ThinkingChunk` | `text` | Extended thinking block |
| `ToolStart` | `name, inputs` | Before tool execution |
| `ToolEnd` | `name, result, permitted` | After tool execution |
| `PermissionRequest` | `description, granted` | Needs user approval |
| `TurnDone` | `input_tokens, output_tokens` | End of one API turn |
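A typical caller drains this generator and dispatches on event type. A sketch of what the REPL side might look like, using the event fields from the table above (the rendering details and the convention of answering a `PermissionRequest` by setting `.granted` on the yielded event are illustrative):

```python
for event in run(user_message, state, config, system_prompt):
    if isinstance(event, TextChunk):
        print(event.text, end="", flush=True)        # stream text as it arrives
    elif isinstance(event, ToolStart):
        print(f"\n[tool] {event.name} {event.inputs}")
    elif isinstance(event, PermissionRequest):
        # assumed handshake: the consumer fills in .granted before the loop resumes
        event.granted = input(f"Allow {event.description}? [y/N] ").lower() == "y"
    elif isinstance(event, TurnDone):
        print(f"\n({event.input_tokens} in / {event.output_tokens} out tokens)")
```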
### `compaction.py` — Context Window Management
Keeps conversations within model context limits using two layers.
**Layer 1: Snip** (`snip_old_tool_results`)
- Rule-based, no API cost
- Truncates tool-role messages older than `preserve_last_n_turns` (default 6)
- Keeps first half + last quarter of the content
**Layer 2: Auto-Compact** (`compact_messages`)
- Model-driven: calls the current model to summarize old messages
- Splits messages into [old | recent] at ~70/30 ratio
- Replaces old messages with a summary + acknowledgment
**Trigger:** `maybe_compact()` checks `estimate_tokens(messages) > context_limit * 0.7`.
Runs snip first (cheap), then auto-compact if still over.
**Token estimation:** `len(content) / 3.5` — simple heuristic. Works for most models.
`get_context_limit(model)` reads from the provider registry.
### `memory.py` — Persistent Memory
File-based memory system stored in `~/.nano_claude/memory/`.
**Storage format:**
```
~/.nano_claude/memory/
├── MEMORY.md # Index: one line per memory
├── user_preferences.md # Individual memory file
└── project_auth.md
```
Each memory file uses markdown with YAML frontmatter:
```markdown
---
name: user preferences
description: coding style preferences
type: feedback
created: 2026-04-02
---
User prefers 4-space indentation and type hints.
```
**How it integrates:**
- `get_memory_context()` returns the MEMORY.md index text
- `context.py` injects this into the system prompt
- The model reads the index, then uses `Read` tool to access full memory content
- The model uses `MemorySave` / `MemoryDelete` tools to manage memories
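The frontmatter format is simple enough to parse without a YAML dependency. A sketch of a parser for the format above (the helper name is hypothetical):

```python
def parse_memory_file(text: str) -> tuple[dict, str]:
    """Split '---'-delimited frontmatter from the body; returns (metadata, content)."""
    if not text.startswith("---"):
        return {}, text
    _, frontmatter, body = text.split("---", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body.strip()
```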
### `subagent.py` — Threaded Sub-Agents
Sub-agents run in background threads via `ThreadPoolExecutor`.
**Key design decisions:**
1. **Fresh context** — each sub-agent starts with empty message history + task prompt
2. **Depth limiting** — `max_depth=3`, checked at spawn time. Model gets an error message
   (not silent tool removal) so it can adapt.
3. **Cooperative cancellation** — a `cancel_check` callable is checked each loop iteration.
   Python threads can't be killed safely, so we set a flag.
4. **Threading, not asyncio** — the entire codebase is synchronous generators. Threading
via `concurrent.futures` keeps things simple. The SubAgentManager API is designed to
be compatible with a future async migration.
**Lifecycle:**
```
spawn(prompt, config, system_prompt, depth)
    → Creates SubAgentTask
    → Submits _run to ThreadPoolExecutor
    → _run calls agent.run() with depth+1

wait(task_id, timeout)  → blocks until complete
cancel(task_id)         → sets _cancel_flag
get_result(task_id)     → returns result string
```
### `skills.py` — Reusable Prompt Templates
Skills are markdown files with frontmatter. They are **not code** — just structured prompts
that get injected into the agent loop.
**Skill file format:**
```markdown
---
name: commit
description: Create a conventional commit
triggers: ["/commit"]
tools: [Bash, Read]
---
Your prompt instructions here...
```
**Execution:** `execute_skill()` wraps the skill prompt as a user message and calls
`agent.run()`. The skill runs through the exact same agent loop as a normal query.
**Search order:** Project-level (`./.nano_claude/skills/`) overrides user-level
(`~/.nano_claude/skills/`) when skill names collide.
### `providers.py` — Multi-Provider Abstraction
Two streaming adapters cover all providers:
| Adapter | Providers |
|---|---|
| `stream_anthropic()` | Anthropic (native SDK) |
| `stream_openai_compat()` | OpenAI, Gemini, Kimi, Qwen, Zhipu, DeepSeek, Ollama, LM Studio, Custom |
**Neutral message format** (provider-independent):
```python
{"role": "user", "content": "..."}
{"role": "assistant", "content": "...", "tool_calls": [{"id": "...", "name": "...", "input": {...}}]}
{"role": "tool", "tool_call_id": "...", "name": "...", "content": "..."}
```
Conversion functions: `messages_to_anthropic()`, `messages_to_openai()`, `tools_to_openai()`.
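For orientation, the neutral-to-OpenAI direction is essentially a per-role mapping. A simplified sketch of what `messages_to_openai()` does (the real converter also carries `extra_content` through for Gemini):

```python
import json

def messages_to_openai(messages: list[dict]) -> list[dict]:
    out = []
    for m in messages:
        if m["role"] == "assistant" and m.get("tool_calls"):
            out.append({
                "role": "assistant",
                "content": m.get("content") or None,
                "tool_calls": [
                    {"id": tc["id"], "type": "function",
                     "function": {"name": tc["name"],
                                  "arguments": json.dumps(tc["input"])}}
                    for tc in m["tool_calls"]
                ],
            })
        elif m["role"] == "tool":
            out.append({"role": "tool", "tool_call_id": m["tool_call_id"],
                        "content": m["content"]})
        else:
            out.append({"role": m["role"], "content": m["content"]})
    return out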
**Provider-specific handling:**
- Gemini 3 models require `thought_signature` in tool call responses — this is transparently
captured and passed through via `extra_content` on tool_call dicts.
### `context.py` — System Prompt Builder
Assembles the system prompt from:
1. Base template (role, date, cwd, platform)
2. Git info (branch, status, recent commits)
3. CLAUDE.md content (project-level + global)
4. Memory index (from `memory.get_memory_context()`)
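Conceptually the builder just concatenates these sections, dropping any that are empty. A sketch (the `_base_template`, `_git_info`, and `_claude_md` helper names are hypothetical; `memory.get_memory_context()` is the real hook):

```python
import memory

def build_system_prompt(config: dict) -> str:
    sections = [
        _base_template(config),       # role, date, cwd, platform
        _git_info(),                  # branch, status, recent commits
        _claude_md(),                 # project-level + global CLAUDE.md
        memory.get_memory_context(),  # MEMORY.md index
    ]
    return "\n\n".join(s for s in sections if s)
```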
### `config.py` — Configuration
Defaults stored in `~/.nano_claude/config.json`. Key settings:
| Key | Default | Description |
|---|---|---|
| `model` | `claude-opus-4-6` | Active model |
| `max_tokens` | `8192` | Max output tokens |
| `permission_mode` | `auto` | Permission mode |
| `max_tool_output` | `32000` | Tool output truncation limit |
| `max_agent_depth` | `3` | Max sub-agent nesting |
| `max_concurrent_agents` | `3` | Thread pool size |
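Loading is the usual defaults-then-overrides merge. A sketch assuming a `load_config()` entry point (the function name is an assumption; the defaults are the table above):

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".nano_claude" / "config.json"

DEFAULTS = {
    "model": "claude-opus-4-6",
    "max_tokens": 8192,
    "permission_mode": "auto",
    "max_tool_output": 32000,
    "max_agent_depth": 3,
    "max_concurrent_agents": 3,
}

def load_config() -> dict:
    """Persisted values override defaults; a missing file means all defaults."""
    config = dict(DEFAULTS)
    if CONFIG_PATH.exists():
        config.update(json.loads(CONFIG_PATH.read_text()))
    return config
```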
---
## Data Flow Example
A user asks "Read config.py and change max_tokens to 16384":
```
1. nano_claude.py captures input
2. agent.run() appends user message, calls maybe_compact()
3. providers.stream() sends to Gemini API with 13 tool schemas
4. Model responds: text + tool_call[Read(config.py)]
5. agent.py checks permission (Read = read_only → auto-approve)
6. tool_registry.execute_tool("Read", ...) → file content (truncated if >32K)
7. Tool result appended to messages, loop back to step 3
8. Model responds: text + tool_call[Edit(config.py, "8192", "16384")]
9. agent.py checks permission (Edit = not read_only → ask user)
10. User approves → tools.py._edit() runs, generates diff
11. nano_claude.py renders diff with ANSI colors (red/green)
12. Tool result appended, loop back to step 3
13. Model responds: "Done, max_tokens changed to 16384"
14. No tool_calls → loop ends, TurnDone yielded
```
---
## Testing
```bash
# Run all 78 tests
python -m pytest tests/ -v
# Run specific module tests
python -m pytest tests/test_tool_registry.py -v
python -m pytest tests/test_compaction.py -v
python -m pytest tests/test_memory.py -v
python -m pytest tests/test_subagent.py -v
python -m pytest tests/test_skills.py -v
python -m pytest tests/test_diff_view.py -v
```
Tests use `monkeypatch` and `tmp_path` fixtures to avoid side effects.
Sub-agent tests mock `_agent_run` to avoid real API calls.
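For example, a registry test in this style needs no API key or filesystem access, only the documented public API (the tool registered here is a throwaway):

```python
from tool_registry import ToolDef, register_tool, execute_tool, clear_registry

def test_execute_tool_truncates_long_output():
    clear_registry()  # reset -- for testing only
    register_tool(ToolDef(
        name="Huge",
        schema={"name": "Huge",
                "input_schema": {"type": "object", "properties": {}}},
        func=lambda params, config: "x" * 100_000,
        read_only=True,
        concurrent_safe=True,
    ))
    result = execute_tool("Huge", {}, {}, max_output=32_000)
    assert len(result) < 100_000
    assert "truncated" in result
```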
---
## Future: Package Refactoring
When `tools.py` or `agent.py` grow too large, the flat layout can be migrated to:
```
ncc/
├── __init__.py
├── repl.py # from nano_claude.py
├── agent/
│ ├── loop.py # from agent.py
│ ├── subagent.py # from subagent.py
│ └── compaction.py # from compaction.py
├── providers/
│ ├── base.py
│ ├── openai_compat.py
│ └── registry.py
├── tools/
│ ├── registry.py # from tool_registry.py
│ ├── builtin.py # core 8 tools from tools.py
│ ├── memory.py # MemorySave/MemoryDelete from tools.py
│ └── subagent.py # Agent/Check/List from tools.py
├── memory/
│ └── store.py # from memory.py
├── skills/
│ └── loader.py # from skills.py
└── config.py
```
The current code is structured to make this migration straightforward:
- Modules communicate via function parameters, not globals
- Each module has a small public API surface
- Dependencies are unidirectional

# Open-CC: Nano Claude Code Enhancement Design
**Date:** 2026-04-02
**Status:** Approved
**Target:** GPT-5.4, Gemini 3/3.1 Pro (Claude not in scope)
**Code budget:** ~10K lines total (currently ~2.2K)
**Constraint:** PR-friendly, mergeable back to nano-claude-code upstream
---
## 1. Overview
Evolve nano-claude-code from a minimal ~2.2K-line reference implementation into a capable AI coding CLI, approaching Claude Code's core functionality while staying lean. Five enhancement areas:
1. **Context Window Management** (`compaction.py`)
2. **Tool System Enhancement** (`tool_registry.py` + `tools.py` refactor)
3. **Sub-Agent** (`subagent.py`)
4. **Memory System** (`memory.py`)
5. **Skills System** (`skills.py`)
### Strategy
**Approach A: Layered Enhancement** -- add new modules alongside existing files, minimize changes to existing code. When agent.py grows too complex, refactor into Approach B (package structure under `ncc/`).
### Design Principles
- Modules communicate via function parameters / dataclasses, no globals
- Each new module exposes 2-3 public functions, internals self-contained
- New logic in agent.py grouped by clear `# --- section ---` comments
- All code in English (comments, docstrings, commit messages)
---
## 2. File Structure
```
nano-claude-code/
├── nano_claude.py # REPL -- add /memory, /skill slash commands
├── agent.py # Agent loop -- add compaction call + sub-agent dispatch
├── providers.py # No changes (already solid)
├── tools.py # Refactor: register built-in tools via registry
├── context.py # Extend: inject memory context
├── config.py # Add new config keys
├── compaction.py # NEW: Context window management
├── subagent.py # NEW: Sub-agent lifecycle
├── memory.py # NEW: File-based memory system
├── skills.py # NEW: Skill loading and execution
└── tool_registry.py # NEW: Tool plugin registry
```
### Module Dependency Graph (unidirectional)
```
nano_claude.py
├-> agent.py
│ ├-> providers.py
│ ├-> tool_registry.py -> tools.py (built-in implementations)
│ ├-> compaction.py -> providers.py (for summary model call)
│ └-> subagent.py (calls agent.py:run recursively)
├-> context.py -> memory.py
├-> skills.py -> tool_registry.py
└-> config.py
```
---
## 3. Context Window Management (`compaction.py`)
Two-layer compression, inspired by Claude Code's three-layer strategy (Layer 3 contextCollapse is experimental, deferred).
### 3.1 Layer 1: Auto-Compact (model-driven summary)
Triggered when estimated token count exceeds 70% of model's context limit.
```python
def compact_messages(messages: list[dict], config: dict) -> list[dict]:
    """
    Split messages into [old | recent].
    Summarize old via model call.
    Return [summary_msg, ack_msg, *recent].
    """
    split_point = find_split_point(messages, keep_ratio=0.3)
    old = messages[:split_point]
    recent = messages[split_point:]
    summary = call_model_for_summary(old, config)
    return [
        {"role": "user", "content": f"[Conversation summary]\n{summary}"},
        {"role": "assistant", "content": "Understood, I have the context."},
        *recent,
    ]
```
### 3.2 Layer 2: Tool-Result Snipping (rule-based)
Truncate old tool outputs without model call. Fast and cheap.
```python
def snip_old_tool_results(messages: list[dict], max_chars: int = 2000) -> list[dict]:
    """
    For tool results older than N turns, truncate to max_chars.
    Preserve first/last lines, add [snipped N chars] marker.
    """
```
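A possible body for this function, treating each message as one turn for simplicity and reusing the first-half/last-quarter split described in the Architecture Guide (a sketch, not the shipped implementation):

```python
def snip_old_tool_results(messages: list[dict], max_chars: int = 2000,
                          preserve_last_n_turns: int = 6) -> list[dict]:
    cutoff = len(messages) - preserve_last_n_turns
    out = []
    for i, msg in enumerate(messages):
        content = msg.get("content", "")
        if i < cutoff and msg.get("role") == "tool" and len(content) > max_chars:
            head = content[: max_chars // 2]
            tail = content[-max_chars // 4 :]
            snipped = len(content) - len(head) - len(tail)
            msg = {**msg, "content": f"{head}\n[snipped {snipped} chars]\n{tail}"}
        out.append(msg)
    return out
```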
### 3.3 Token Estimation
```python
def estimate_tokens(messages: list[dict]) -> int:
    """Use tiktoken for GPT models, chars/3.5 fallback."""

def get_context_limit(model: str) -> int:
    """Return context window size from provider registry."""
```
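A sketch of the estimator, with an optional model hint added for the tiktoken path (the signature above omits it; this is one way to wire it):

```python
def estimate_tokens(messages: list[dict], model: str = "") -> int:
    text = "".join(str(m.get("content", "")) for m in messages)
    if model.startswith("gpt"):
        try:
            import tiktoken
            return len(tiktoken.get_encoding("cl100k_base").encode(text))
        except ImportError:
            pass  # fall through to the heuristic
    return int(len(text) / 3.5)  # chars/3.5 is close enough across models
```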
### 3.4 Integration Point
```python
# In agent.py run() loop, before each API call:
def _maybe_compact(state: AgentState, config: dict) -> bool:
    token_count = estimate_tokens(state.messages)
    threshold = get_context_limit(config["model"]) * 0.7
    if token_count > threshold:
        state.messages = compact_messages(state.messages, config)
        return True
    return False
```
### 3.5 Public API
```python
maybe_compact(state: AgentState, config: dict) -> bool
estimate_tokens(messages: list[dict]) -> int
get_context_limit(model: str) -> int
```
---
## 4. Tool System Enhancement (`tool_registry.py` + `tools.py`)
### 4.1 Tool Registry
```python
@dataclass
class ToolDef:
    name: str
    schema: dict           # JSON schema for parameters
    func: Callable         # (params: dict, config: dict) -> str
    read_only: bool        # True = auto-approve in 'auto' mode
    concurrent_safe: bool  # True = safe for parallel sub-agent use

_TOOLS: dict[str, ToolDef] = {}

def register_tool(tool_def: ToolDef) -> None
def get_tool(name: str) -> ToolDef | None
def get_all_tools() -> list[ToolDef]
def get_tool_schemas() -> list[dict]
def execute_tool(name: str, params: dict, config: dict) -> str
```
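The registry operations themselves are one-liners over the module-level dict; the documented semantics (overwrite by name, `None` on miss) pin them down. A sketch:

```python
def register_tool(tool_def: ToolDef) -> None:
    _TOOLS[tool_def.name] = tool_def  # overwrites by name, so plugins can replace built-ins

def get_tool(name: str) -> ToolDef | None:
    return _TOOLS.get(name)

def get_all_tools() -> list[ToolDef]:
    return list(_TOOLS.values())

def get_tool_schemas() -> list[dict]:
    return [t.schema for t in _TOOLS.values()]
```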
### 4.2 Tool Output Truncation
Prevent oversized tool outputs (e.g., `cat` large file, `ls -R`) from blowing up context
before compaction even gets a chance to run. Applied at the `execute_tool` boundary:
```python
MAX_TOOL_OUTPUT = 32_000  # ~8K tokens, configurable per tool

def execute_tool(name, params, config):
    tool = get_tool(name)
    result = tool.func(params, config)
    # Immediate truncation at source
    if len(result) > MAX_TOOL_OUTPUT:
        head = result[:MAX_TOOL_OUTPUT // 2]
        tail = result[-MAX_TOOL_OUTPUT // 4:]
        snipped = len(result) - len(head) - len(tail)
        result = f"{head}\n\n[... {snipped} chars truncated ...]\n\n{tail}"
    return result
```
Additionally, the `Bash` tool caps `subprocess` stdout reads to prevent unbounded
output (e.g., `cat /dev/urandom`); a sketch of this cap follows the defense-layer list below.
This creates a two-layer defense:
- **Layer 0 (here):** hard truncation at tool execution time — prevents oversized messages
- **Layer 2 (compaction.py snip):** soft truncation of old tool results — reclaims context space
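One way to implement the `Bash` stdout cap mentioned above is a bounded pipe read followed by a kill. A sketch only: the real tool also handles timeouts and exit codes, and the 64 KB cap and helper name are assumptions:

```python
import subprocess

def _run_bash_capped(cmd: str, max_bytes: int = 64_000) -> str:
    proc = subprocess.Popen(cmd, shell=True,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    data = proc.stdout.read(max_bytes)  # bounded read: 'cat /dev/urandom' stops here
    proc.kill()                         # stop the child once the cap is hit
    proc.wait()
    return data.decode(errors="replace")
```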
### 4.3 Built-in Tools Refactor
Existing tools.py implementations unchanged. Wrap each with `register_tool()` at module load:
```python
register_tool(ToolDef(
    name="Read", schema=READ_SCHEMA, func=_read_file,
    read_only=True, concurrent_safe=True,
))
```
### 4.4 Permission Logic (unified)
```python
# agent.py
def _check_permission(tool_name, params, config):
    tool = get_tool(tool_name)
    if config["permission_mode"] == "accept-all":
        return True
    if tool.read_only:
        return True
    if tool_name == "Bash" and _is_safe_command(params["command"]):
        return True
    return None  # ask user
```
---
## 5. Sub-Agent (`subagent.py`)
### 5.1 Data Model
```python
@dataclass
class SubAgentTask:
    id: str
    prompt: str
    status: str           # "pending" | "running" | "completed" | "failed" | "cancelled"
    messages: list[dict]  # independent message history
    result: str | None
    model: str | None     # optional model override
    depth: int = 0        # recursion depth counter
    _cancel_flag: bool = False
    _future: Future | None = None

@dataclass
class SubAgentManager:
    tasks: dict[str, SubAgentTask] = field(default_factory=dict)
    max_concurrent: int = 3
    max_depth: int = 3
    _pool: ThreadPoolExecutor = field(default_factory=
        lambda: ThreadPoolExecutor(max_workers=3))

    def spawn(self, prompt, config, system_prompt, depth=0) -> SubAgentTask
    def get_result(self, task_id) -> str | None
    def list_tasks(self) -> list[SubAgentTask]
    def cancel(self, task_id) -> bool
    def wait(self, task_id, timeout=None) -> SubAgentTask
```
### 5.2 Execution Model — Threading from Day 1
Sub-agents run in background threads via `ThreadPoolExecutor`. This enables:
- Non-blocking spawn (main agent continues or waits by choice)
- Cancellation via cooperative flag
- Concurrent sub-agents (up to `max_concurrent`)
```python
def spawn(self, prompt, config, system_prompt, depth=0):
    if depth >= self.max_depth:
        return SubAgentTask(status="failed",
                            result="Error: max sub-agent depth reached.")
    task = SubAgentTask(id=uuid4().hex[:8], prompt=prompt,
                        status="running", depth=depth, ...)

    def _run():
        sub_state = AgentState()
        try:
            for event in agent.run(
                prompt, sub_state, config, system_prompt,
                depth=depth + 1,
                cancel_check=lambda: task._cancel_flag,
            ):
                if isinstance(event, TurnDone):
                    task.result = extract_final_text(sub_state.messages)
                    task.status = "completed"
        except Exception as e:
            task.result = f"Error: {e}"
            task.status = "failed"

    task._future = self._pool.submit(_run)
    self.tasks[task.id] = task
    return task
```
### 5.3 Cooperative Cancellation
Python threads cannot be killed safely. Instead, `agent.run()` checks a
`cancel_check` callable each loop iteration:
```python
# agent.py run() — new parameter
def run(user_message, state, config, system_prompt,
        depth=0, cancel_check=None):
    ...
    while True:
        if cancel_check and cancel_check():
            return  # clean exit
        for event in stream(...):
            yield event
        ...
```
### 5.4 Depth Limiting (No Tool Removal)
Sub-agents CAN call Agent tool (enabling A -> B -> C chains). Depth is
passed through, and the Agent tool returns an error at `max_depth`:
```python
def _agent_tool_func(params, config, depth=0):
    if depth >= manager.max_depth:
        return ("Error: max sub-agent depth reached. "
                "Complete this task directly without spawning sub-agents.")
    return manager.spawn(params["prompt"], config, system_prompt, depth)
```
The model sees the error and adapts — no silent capability removal.
### 5.5 Context Strategy
Sub-agent gets **fresh context** (no parent message history):
```python
sub_system_prompt = f"""You are a sub-agent. Your task:
{prompt}
Working directory: {cwd}
{memory_context}
"""
```
### 5.6 Tool Registration — 3 Tools
The sub-agent system registers three tools:
**Agent** — spawn a sub-agent:
```python
AGENT_SCHEMA = {
    "name": "Agent",
    "description": "Launch a sub-agent to handle a task independently.",
    "input_schema": {
        "type": "object",
        "properties": {
            "prompt": {"type": "string", "description": "Task description"},
            "model": {"type": "string", "description": "Optional model override"},
            "wait": {"type": "boolean", "default": True,
                     "description": "True = block until done (default). "
                                    "False = return task_id immediately."},
        },
        "required": ["prompt"],
    },
}
```
- `wait=True` (default): spawn + block + return result. Feels synchronous to model.
- `wait=False`: spawn + return task_id immediately. Model must use CheckAgentResult later.
**CheckAgentResult** — poll a background sub-agent:
```python
CHECK_AGENT_RESULT_SCHEMA = {
    "name": "CheckAgentResult",
    "description": "Check the result of a background sub-agent task.",
    "input_schema": {
        "type": "object",
        "properties": {
            "task_id": {"type": "string", "description": "Task ID from Agent tool"},
        },
        "required": ["task_id"],
    },
}
```
Returns: status + result (if completed), or status + "still running".
**ListAgentTasks** — overview of all sub-agents:
```python
LIST_AGENT_TASKS_SCHEMA = {
    "name": "ListAgentTasks",
    "description": "List all sub-agent tasks and their status.",
    "input_schema": {"type": "object", "properties": {}},
}
```
Returns a table of `[id, status, prompt_preview]` for all tasks.
---
## 6. Memory System (`memory.py`)
### 6.1 Storage
```
~/.nano_claude/memory/
├── MEMORY.md # Index file (max 200 lines)
├── user_role.md # Individual memory files
├── feedback_testing.md
└── ...
```
Memory file format:
```markdown
---
name: user role
description: user is a data scientist focused on logging
type: user
created: 2026-04-02
---
User is a data scientist, currently investigating observability/logging.
```
### 6.2 Public API
```python
@dataclass
class MemoryEntry:
    name: str
    description: str
    type: str  # "user" | "feedback" | "project" | "reference"
    content: str
    file_path: str
    created: str

def load_index() -> list[MemoryEntry]
def save_memory(entry: MemoryEntry) -> None
def delete_memory(name: str) -> None
def search_memory(query: str) -> list[MemoryEntry]
def get_memory_context() -> str  # for system prompt injection
```
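A sketch of `save_memory` under the storage layout in 6.1 (the slug scheme and the exact index line format are assumptions):

```python
from pathlib import Path

MEMORY_DIR = Path.home() / ".nano_claude" / "memory"

def save_memory(entry: MemoryEntry) -> None:
    MEMORY_DIR.mkdir(parents=True, exist_ok=True)
    slug = entry.name.replace(" ", "_")  # hypothetical slug scheme
    (MEMORY_DIR / f"{slug}.md").write_text(
        f"---\nname: {entry.name}\ndescription: {entry.description}\n"
        f"type: {entry.type}\ncreated: {entry.created}\n---\n{entry.content}\n"
    )
    # Rebuild MEMORY.md: one line per memory, so the index stays small
    index = [f"- {e.name} ({e.type}): {e.description}" for e in load_index()]
    (MEMORY_DIR / "MEMORY.md").write_text("\n".join(index) + "\n")
```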
### 6.3 Tool Registration
Two tools for model-driven memory management:
- **MemorySave**: `{name, type, description, content}` -> write file + update index
- **MemoryDelete**: `{name}` -> remove file + update index
### 6.4 Context Integration
`context.py:build_system_prompt()` appends `memory.get_memory_context()` (the MEMORY.md index). Model uses Read tool to access full memory file content when needed.
---
## 7. Skills System (`skills.py`)
### 7.1 Skill Definition
Markdown files with frontmatter:
```
~/.nano_claude/skills/commit.md
```
```markdown
---
name: commit
description: Create a git commit with conventional format
triggers: ["/commit", "commit changes"]
tools: [Bash, Read]
---
# Commit Skill
Analyze staged changes and create a well-formatted commit message.
...
```
### 7.2 Search Path
```python
SKILL_PATHS = [
    Path.cwd() / ".nano_claude" / "skills",   # project-level (priority)
    Path.home() / ".nano_claude" / "skills",  # user-level
]
```
### 7.3 Public API
```python
@dataclass
class SkillDef:
    name: str
    description: str
    triggers: list[str]
    tools: list[str]
    prompt: str
    file_path: str

def load_skills() -> list[SkillDef]
def find_skill(query: str) -> SkillDef | None
def execute_skill(skill, args, state, config) -> Generator
```
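Given the search path above, the override rule falls out of a first-wins scan. A sketch of `load_skills` (the `_parse_skill_file` frontmatter parser is hypothetical):

```python
def load_skills() -> list[SkillDef]:
    seen: dict[str, SkillDef] = {}
    for skills_dir in SKILL_PATHS:           # project-level first, then user-level
        if not skills_dir.is_dir():
            continue
        for path in sorted(skills_dir.glob("*.md")):
            skill = _parse_skill_file(path)  # hypothetical frontmatter parser
            if skill and skill.name not in seen:
                seen[skill.name] = skill     # first occurrence wins -> project overrides
    return list(seen.values())
```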
### 7.4 Execution Model
Skills are just prompts injected into the normal agent loop:
```python
def execute_skill(skill, args, state, config):
    prompt = f"[Skill: {skill.name}]\n\n{skill.prompt}"
    if args:
        prompt += f"\n\nUser context: {args}"
    system_prompt = build_system_prompt(config)
    for event in agent.run(prompt, state, config, system_prompt):
        yield event
```
### 7.5 REPL Integration
In `nano_claude.py`, unmatched `/` commands fall through to skill lookup:
```python
if user_input.startswith("/"):
    # Try built-in slash commands first
    # If no match -> find_skill(user_input)
    # If skill found -> execute_skill(...)
```
---
## 8. Diff View for File Modifications
Core UX improvement: show git-style red/green diff when Edit or Write modifies an existing file.
### 8.1 Diff Generation (in tools.py)
Edit and Write tool implementations capture before/after content and generate unified diff:
```python
import difflib

def generate_unified_diff(old, new, filename, context_lines=3):
    """
    Args:
        old: original file content, str
        new: modified file content, str
        filename: display name, str
        context_lines: lines of context around changes, int
    Returns:
        unified diff string
    """
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    diff = difflib.unified_diff(
        old_lines, new_lines,
        fromfile=f"a/{filename}", tofile=f"b/{filename}",
        n=context_lines,
    )
    return "".join(diff)
```
Tool return values change:
- **Edit**: `"Changes applied to {filename}:\n\n{diff}"`
- **Write** (existing file): `"File updated:\n\n{diff}"`
- **Write** (new file): `"New file created: {filename} ({n} lines)"` (no diff)
### 8.2 REPL Rendering (in nano_claude.py)
Detect diff blocks in tool output and render with ANSI colors:
```python
def render_diff(diff_text):
    for line in diff_text.splitlines():
        if line.startswith("+++") or line.startswith("---"):
            print(f"\033[1m{line}\033[0m")   # bold
        elif line.startswith("+"):
            print(f"\033[32m{line}\033[0m")  # green
        elif line.startswith("-"):
            print(f"\033[31m{line}\033[0m")  # red
        elif line.startswith("@@"):
            print(f"\033[36m{line}\033[0m")  # cyan
        else:
            print(line)
```
### 8.3 Diff Truncation
For large diffs (e.g., Write replaces entire file), cap the diff display:
```python
MAX_DIFF_LINES = 80

def maybe_truncate_diff(diff_text):
    lines = diff_text.splitlines()
    if len(lines) > MAX_DIFF_LINES:
        shown = lines[:MAX_DIFF_LINES]
        remaining = len(lines) - MAX_DIFF_LINES
        return "\n".join(shown) + f"\n\n[... {remaining} more lines ...]"
    return diff_text
```
Note: truncation applies to the **display** in REPL only. The full diff is still
returned to the model so it can verify the change.
---
## 9. Implementation Order
Each step is an independent PR:
| Phase | Module | Depends On | Estimated Lines |
|-------|--------|-----------|-----------------|
| 1 | `tool_registry.py` + `tools.py` refactor | None | ~600 |
| 2 | Diff view in `tools.py` + `nano_claude.py` | Phase 1 | ~100 |
| 3 | `compaction.py` + agent.py integration | Phase 1 | ~300 |
| 4 | `memory.py` + context.py integration | Phase 1 | ~200 |
| 5 | `subagent.py` + agent.py integration (threading) | Phase 1 | ~350 |
| 6 | `skills.py` + nano_claude.py integration | Phase 1, 4 | ~200 |
| 7 | Slash commands + config updates | All above | ~300 |
**Total new code: ~2050 lines. Grand total: ~4.2K lines.**
---
## 10. Key Decisions
| Decision | Choice | Rationale |
|----------|--------|-----------|
| Compression layers | 2 (autoCompact + snip) | Layer 3 is experimental in Claude Code |
| Tool output truncation | Hard cap at execute_tool boundary | Prevents oversized outputs before compaction runs |
| Sub-agent execution | Threading from day 1 | Sync blocks main agent, can't cancel, can't parallelize |
| Sub-agent depth | Depth counter (max 3), no tool removal | Model sees error and adapts; sub-sub-agents allowed |
| Sub-agent tools | Agent + CheckAgentResult + ListAgentTasks | Model needs feedback loop for async tasks |
| Diff view | difflib unified diff + ANSI colors | Core UX, zero dependencies |
| Memory search | Keyword match, no embeddings | Keep simple, model judges relevance |
| Skills format | Markdown + frontmatter | Human-readable, git-friendly, no Python needed |
| Tool registry | Global dict + register function | Simple, extensible, easy to migrate to package |
| Target models | GPT-5.4, Gemini 3/3.1 Pro | User's primary use case |
| No Claude support | Intentional | Official Claude Code exists |
---
## 11. Future Considerations (Not in Scope)
- MCP protocol support
- Remote skill marketplace
- Voice mode
- Bridge to desktop apps
- contextCollapse (Layer 3 compression)