# Open-CC: Nano Claude Code Enhancement Design

**Date:** 2026-04-02
**Status:** Approved
**Target:** GPT-5.4, Gemini 3/3.1 Pro (Claude not in scope)
**Code budget:** ~10K lines total (currently ~2.2K)
**Constraint:** PR-friendly, mergeable back to nano-claude-code upstream

---

## 1. Overview

Evolve nano-claude-code from a minimal ~2.2K-line reference implementation into a capable AI coding CLI, approaching Claude Code's core functionality while staying lean.

Five enhancement areas:

1. **Context Window Management** (`compaction.py`)
2. **Tool System Enhancement** (`tool_registry.py` + `tools.py` refactor)
3. **Sub-Agent** (`subagent.py`)
4. **Memory System** (`memory.py`)
5. **Skills System** (`skills.py`)

### Strategy

**Approach A: Layered Enhancement** -- add new modules alongside existing files, minimize changes to existing code. When agent.py grows too complex, refactor into Approach B (package structure under `ncc/`).

### Design Principles

- Modules communicate via function parameters / dataclasses, no globals
- Each new module exposes 2-3 public functions, internals self-contained
- New logic in agent.py grouped by clear `# --- section ---` comments
- All code in English (comments, docstrings, commit messages)

---

## 2. File Structure

```
nano-claude-code/
├── nano_claude.py       # REPL -- add /memory, /skill slash commands
├── agent.py             # Agent loop -- add compaction call + sub-agent dispatch
├── providers.py         # No changes (already solid)
├── tools.py             # Refactor: register built-in tools via registry
├── context.py           # Extend: inject memory context
├── config.py            # Add new config keys
│
├── compaction.py        # NEW: Context window management
├── subagent.py          # NEW: Sub-agent lifecycle
├── memory.py            # NEW: File-based memory system
├── skills.py            # NEW: Skill loading and execution
└── tool_registry.py     # NEW: Tool plugin registry
```

### Module Dependency Graph (unidirectional)

```
nano_claude.py
  ├-> agent.py
  │     ├-> providers.py
  │     ├-> tool_registry.py -> tools.py (built-in implementations)
  │     ├-> compaction.py -> providers.py (for summary model call)
  │     └-> subagent.py (calls agent.py:run recursively)
  ├-> context.py -> memory.py
  ├-> skills.py -> tool_registry.py
  └-> config.py
```

---

## 3. Context Window Management (`compaction.py`)

Two-layer compression, inspired by Claude Code's three-layer strategy (Layer 3 contextCollapse is experimental, deferred).

### 3.1 Layer 1: Auto-Compact (model-driven summary)

Triggered when estimated token count exceeds 70% of model's context limit.

```python
def compact_messages(messages: list[dict], config: dict) -> list[dict]:
    """
    Split messages into [old | recent].
    Summarize old via model call.
    Return [summary_msg, ack_msg, *recent].
    """
    split_point = find_split_point(messages, keep_ratio=0.3)
    old = messages[:split_point]
    recent = messages[split_point:]
    summary = call_model_for_summary(old, config)
    return [
        {"role": "user", "content": f"[Conversation summary]\n{summary}"},
        {"role": "assistant", "content": "Understood, I have the context."},
        *recent
    ]
```

### 3.2 Layer 2: Tool-Result Snipping (rule-based)

Truncate old tool outputs without model call. Fast and cheap.
```python
def snip_old_tool_results(messages: list[dict], max_chars: int = 2000) -> list[dict]:
    """
    For tool results older than N turns, truncate to max_chars.
    Preserve first/last lines, add [snipped N chars] marker.
    """
```

### 3.3 Token Estimation

```python
def estimate_tokens(messages: list[dict]) -> int:
    """Use tiktoken for GPT models, chars/3.5 fallback."""

def get_context_limit(model: str) -> int:
    """Return context window size from provider registry."""
```

### 3.4 Integration Point

```python
# Called from the agent.py run() loop, before each API call:
def maybe_compact(state: AgentState, config: dict) -> bool:
    token_count = estimate_tokens(state.messages)
    threshold = get_context_limit(config["model"]) * 0.7
    if token_count > threshold:
        state.messages = compact_messages(state.messages, config)
        return True
    return False
```

### 3.5 Public API

```python
maybe_compact(state: AgentState, config: dict) -> bool
estimate_tokens(messages: list[dict]) -> int
get_context_limit(model: str) -> int
```

---

## 4. Tool System Enhancement (`tool_registry.py` + `tools.py`)

### 4.1 Tool Registry

```python
@dataclass
class ToolDef:
    name: str
    schema: dict             # JSON schema for parameters
    func: Callable           # (params: dict, config: dict) -> str
    read_only: bool          # True = auto-approve in 'auto' mode
    concurrent_safe: bool    # True = safe for parallel sub-agent use

_TOOLS: dict[str, ToolDef] = {}

def register_tool(tool_def: ToolDef) -> None
def get_tool(name: str) -> ToolDef | None
def get_all_tools() -> list[ToolDef]
def get_tool_schemas() -> list[dict]
def execute_tool(name: str, params: dict, config: dict) -> str
```

### 4.2 Tool Output Truncation

Prevent oversized tool outputs (e.g., `cat` on a large file, `ls -R`) from blowing up context before compaction even gets a chance to run.
Applied at the `execute_tool` boundary:

```python
MAX_TOOL_OUTPUT = 32_000  # ~8K tokens, configurable per tool

def execute_tool(name, params, config):
    tool = get_tool(name)
    if tool is None:
        # get_tool() returns None for unregistered names
        return f"Error: unknown tool '{name}'"
    result = tool.func(params, config)
    # Immediate truncation at source: keep head and tail, drop the middle
    if len(result) > MAX_TOOL_OUTPUT:
        head = result[:MAX_TOOL_OUTPUT // 2]
        tail = result[-(MAX_TOOL_OUTPUT // 4):]
        snipped = len(result) - len(head) - len(tail)
        result = f"{head}\n\n[... {snipped} chars truncated ...]\n\n{tail}"
    return result
```

Additionally, the `Bash` tool caps `subprocess` stdout reads to prevent unbounded output (e.g., `cat /dev/urandom`).

This creates a two-layer defense:

- **Layer 0 (here):** hard truncation at tool execution time, which prevents oversized messages
- **Layer 2 (compaction.py snip):** soft truncation of old tool results, which reclaims context space

### 4.3 Built-in Tools Refactor

Existing tools.py implementations unchanged. Wrap each with `register_tool()` at module load:

```python
register_tool(ToolDef(
    name="Read",
    schema=READ_SCHEMA,
    func=_read_file,
    read_only=True,
    concurrent_safe=True
))
```

### 4.4 Permission Logic (unified)

```python
# agent.py
def _check_permission(tool_name, params, config):
    tool = get_tool(tool_name)
    if config["permission_mode"] == "accept-all":
        return True
    if tool is not None and tool.read_only:
        return True
    if tool_name == "Bash" and _is_safe_command(params["command"]):
        return True
    return None  # ask user
```

---

## 5. Sub-Agent (`subagent.py`)

### 5.1 Data Model

```python
@dataclass
class SubAgentTask:
    id: str
    prompt: str
    status: str            # "pending" | "running" | "completed" | "failed" | "cancelled"
    messages: list[dict]   # independent message history
    result: str | None
    model: str | None      # optional model override
    depth: int = 0         # recursion depth counter
    _cancel_flag: bool = False
    _future: Future | None = None

@dataclass
class SubAgentManager:
    tasks: dict[str, SubAgentTask] = field(default_factory=dict)
    max_concurrent: int = 3
    max_depth: int = 3
    _pool: ThreadPoolExecutor = field(default_factory=
        lambda: ThreadPoolExecutor(max_workers=3))

    def spawn(self, prompt, config, system_prompt, depth=0) -> SubAgentTask
    def get_result(self, task_id) -> str | None
    def list_tasks(self) -> list[SubAgentTask]
    def cancel(self, task_id) -> bool
    def wait(self, task_id, timeout=None) -> SubAgentTask
```

### 5.2 Execution Model: Threading from Day 1

Sub-agents run in background threads via `ThreadPoolExecutor`. This enables:

- Non-blocking spawn (main agent continues or waits by choice)
- Cancellation via cooperative flag
- Concurrent sub-agents (up to `max_concurrent`)

```python
def spawn(self, prompt, config, system_prompt, depth=0):
    if depth >= self.max_depth:
        return SubAgentTask(status="failed",
            result="Error: max sub-agent depth reached.")

    task = SubAgentTask(id=uuid4().hex[:8], prompt=prompt,
                        status="running", depth=depth, ...)

    def _run():
        sub_state = AgentState()
        try:
            for event in agent.run(
                prompt, sub_state, config, system_prompt,
                depth=depth + 1,
                cancel_check=lambda: task._cancel_flag
            ):
                if isinstance(event, TurnDone):
                    task.result = extract_final_text(sub_state.messages)
                    task.status = "completed"
        except Exception as e:
            task.result = f"Error: {e}"
            task.status = "failed"

    task._future = self._pool.submit(_run)
    self.tasks[task.id] = task
    return task
```

### 5.3 Cooperative Cancellation

Python threads cannot be killed safely.
Instead, `agent.run()` checks a `cancel_check` callable on each loop iteration:

```python
# agent.py run(): new parameter
def run(user_message, state, config, system_prompt,
        depth=0, cancel_check=None):
    ...
    while True:
        if cancel_check and cancel_check():
            return  # clean exit
        for event in stream(...):
            yield event
        ...
```

### 5.4 Depth Limiting (No Tool Removal)

Sub-agents CAN call the Agent tool (enabling A -> B -> C chains). Depth is passed through, and the Agent tool returns an error at `max_depth`:

```python
def _agent_tool_func(params, config, depth=0):
    if depth >= manager.max_depth:
        return ("Error: max sub-agent depth reached. "
                "Complete this task directly without spawning sub-agents.")
    return manager.spawn(params["prompt"], config, system_prompt, depth)
```

The model sees the error and adapts; no capability is silently removed.

### 5.5 Context Strategy

Sub-agent gets **fresh context** (no parent message history):

```python
sub_system_prompt = f"""You are a sub-agent.
Your task: {prompt}
Working directory: {cwd}
{memory_context}
"""
```

### 5.6 Tool Registration: 3 Tools

The sub-agent system registers three tools:

**Agent** -- spawn a sub-agent:

```python
AGENT_SCHEMA = {
    "name": "Agent",
    "description": "Launch a sub-agent to handle a task independently.",
    "input_schema": {
        "type": "object",
        "properties": {
            "prompt": {"type": "string", "description": "Task description"},
            "model": {"type": "string", "description": "Optional model override"},
            "wait": {"type": "boolean", "default": True,
                     "description": "True = block until done (default). "
                                    "False = return task_id immediately."}
        },
        "required": ["prompt"]
    }
}
```

- `wait=True` (default): spawn + block + return result. Feels synchronous to the model.
- `wait=False`: spawn + return task_id immediately. The model must use CheckAgentResult later.
**CheckAgentResult** -- poll a background sub-agent:

```python
CHECK_AGENT_RESULT_SCHEMA = {
    "name": "CheckAgentResult",
    "description": "Check the result of a background sub-agent task.",
    "input_schema": {
        "type": "object",
        "properties": {
            "task_id": {"type": "string", "description": "Task ID from Agent tool"}
        },
        "required": ["task_id"]
    }
}
```

Returns: status + result (if completed), or status + "still running".

**ListAgentTasks** -- overview of all sub-agents:

```python
LIST_AGENT_TASKS_SCHEMA = {
    "name": "ListAgentTasks",
    "description": "List all sub-agent tasks and their status.",
    "input_schema": {"type": "object", "properties": {}}
}
```

Returns a table of `[id, status, prompt_preview]` for all tasks.

---

## 6. Memory System (`memory.py`)

### 6.1 Storage

```
~/.nano_claude/memory/
├── MEMORY.md            # Index file (max 200 lines)
├── user_role.md         # Individual memory files
├── feedback_testing.md
└── ...
```

Memory file format:

```markdown
---
name: user role
description: user is a data scientist focused on logging
type: user
created: 2026-04-02
---

User is a data scientist, currently investigating observability/logging.
```

### 6.2 Public API

```python
@dataclass
class MemoryEntry:
    name: str
    description: str
    type: str          # "user" | "feedback" | "project" | "reference"
    content: str
    file_path: str
    created: str

def load_index() -> list[MemoryEntry]
def save_memory(entry: MemoryEntry) -> None
def delete_memory(name: str) -> None
def search_memory(query: str) -> list[MemoryEntry]
def get_memory_context() -> str   # for system prompt injection
```

### 6.3 Tool Registration

Two tools for model-driven memory management:

- **MemorySave**: `{name, type, description, content}` -> write file + update index
- **MemoryDelete**: `{name}` -> remove file + update index

### 6.4 Context Integration

`context.py:build_system_prompt()` appends `memory.get_memory_context()` (the MEMORY.md index). The model uses the Read tool to access full memory file content when needed.

---

## 7. Skills System (`skills.py`)

### 7.1 Skill Definition

Markdown files with frontmatter:

```
~/.nano_claude/skills/commit.md
```

```markdown
---
name: commit
description: Create a git commit with conventional format
triggers: ["/commit", "commit changes"]
tools: [Bash, Read]
---

# Commit Skill

Analyze staged changes and create a well-formatted commit message.
...
```

### 7.2 Search Path

```python
SKILL_PATHS = [
    Path.cwd() / ".nano_claude" / "skills",    # project-level (priority)
    Path.home() / ".nano_claude" / "skills",   # user-level
]
```

### 7.3 Public API

```python
@dataclass
class SkillDef:
    name: str
    description: str
    triggers: list[str]
    tools: list[str]
    prompt: str
    file_path: str

def load_skills() -> list[SkillDef]
def find_skill(query: str) -> SkillDef | None
def execute_skill(skill, args, state, config) -> Generator
```

### 7.4 Execution Model

Skills are just prompts injected into the normal agent loop:

```python
def execute_skill(skill, args, state, config):
    prompt = f"[Skill: {skill.name}]\n\n{skill.prompt}"
    if args:
        prompt += f"\n\nUser context: {args}"
    system_prompt = build_system_prompt(config)
    for event in agent.run(prompt, state, config, system_prompt):
        yield event
```

### 7.5 REPL Integration

In `nano_claude.py`, unmatched `/` commands fall through to skill lookup:

```python
if user_input.startswith("/"):
    # Try built-in slash commands first
    # If no match -> find_skill(user_input)
    # If skill found -> execute_skill(...)
    pass  # placeholder so the outline parses
```

---

## 8. Diff View for File Modifications

Core UX improvement: show a git-style red/green diff when Edit or Write modifies an existing file.
### 8.1 Diff Generation (in tools.py)

Edit and Write tool implementations capture before/after content and generate unified diff:

```python
import difflib

def generate_unified_diff(old, new, filename, context_lines=3):
    """
    Args:
        old: original file content, str
        new: modified file content, str
        filename: display name, str
        context_lines: lines of context around changes, int
    Returns:
        unified diff string
    """
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    diff = difflib.unified_diff(
        old_lines, new_lines,
        fromfile=f"a/{filename}", tofile=f"b/{filename}",
        n=context_lines
    )
    return "".join(diff)
```

Tool return values change:

- **Edit**: `"Changes applied to {filename}:\n\n{diff}"`
- **Write** (existing file): `"File updated:\n\n{diff}"`
- **Write** (new file): `"New file created: {filename} ({n} lines)"` (no diff)

### 8.2 REPL Rendering (in nano_claude.py)

Detect diff blocks in tool output and render with ANSI colors:

```python
def render_diff(diff_text):
    for line in diff_text.splitlines():
        if line.startswith("+++") or line.startswith("---"):
            print(f"\033[1m{line}\033[0m")      # bold
        elif line.startswith("+"):
            print(f"\033[32m{line}\033[0m")     # green
        elif line.startswith("-"):
            print(f"\033[31m{line}\033[0m")     # red
        elif line.startswith("@@"):
            print(f"\033[36m{line}\033[0m")     # cyan
        else:
            print(line)
```

### 8.3 Diff Truncation

For large diffs (e.g., Write replaces entire file), cap the diff display:

```python
MAX_DIFF_LINES = 80

def maybe_truncate_diff(diff_text):
    lines = diff_text.splitlines()
    if len(lines) > MAX_DIFF_LINES:
        shown = lines[:MAX_DIFF_LINES]
        remaining = len(lines) - MAX_DIFF_LINES
        return "\n".join(shown) + f"\n\n[... {remaining} more lines ...]"
    return diff_text
```

Note: truncation applies to the **display** in REPL only. The full diff is still returned to the model so it can verify the change.

---

## 9. Implementation Order

Each step is an independent PR:

| Phase | Module | Depends On | Estimated Lines |
|-------|--------|------------|-----------------|
| 1 | `tool_registry.py` + `tools.py` refactor | None | ~600 |
| 2 | Diff view in `tools.py` + `nano_claude.py` | Phase 1 | ~100 |
| 3 | `compaction.py` + agent.py integration | Phase 1 | ~300 |
| 4 | `memory.py` + context.py integration | Phase 1 | ~200 |
| 5 | `subagent.py` + agent.py integration (threading) | Phase 1 | ~350 |
| 6 | `skills.py` + nano_claude.py integration | Phase 1, 4 | ~200 |
| 7 | Slash commands + config updates | All above | ~300 |

**Total new code: ~2050 lines. Grand total: ~4.2K lines.**

---

## 10. Key Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Compression layers | 2 (autoCompact + snip) | Layer 3 is experimental in Claude Code |
| Tool output truncation | Hard cap at execute_tool boundary | Prevents oversized outputs before compaction runs |
| Sub-agent execution | Threading from day 1 | Sync blocks main agent, can't cancel, can't parallelize |
| Sub-agent depth | Depth counter (max 3), no tool removal | Model sees error and adapts; sub-sub-agents allowed |
| Sub-agent tools | Agent + CheckAgentResult + ListAgentTasks | Model needs feedback loop for async tasks |
| Diff view | difflib unified diff + ANSI colors | Core UX, zero dependencies |
| Memory search | Keyword match, no embeddings | Keep simple, model judges relevance |
| Skills format | Markdown + frontmatter | Human-readable, git-friendly, no Python needed |
| Tool registry | Global dict + register function | Simple, extensible, easy to migrate to package |
| Target models | GPT-5.4, Gemini 3/3.1 Pro | User's primary use case |
| No Claude support | Intentional | Official Claude Code exists |

---

## 11. Future Considerations (Not in Scope)

- MCP protocol support
- Remote skill marketplace
- Voice mode
- Bridge to desktop apps
- contextCollapse (Layer 3 compression)