Open-CC: Nano Claude Code Enhancement Design
Date: 2026-04-02
Status: Approved
Target: GPT-5.4, Gemini 3/3.1 Pro (Claude not in scope)
Code budget: ~10K lines total (currently ~2.2K)
Constraint: PR-friendly, mergeable back to nano-claude-code upstream
1. Overview
Evolve nano-claude-code from a minimal ~2.2K-line reference implementation into a capable AI coding CLI, approaching Claude Code's core functionality while staying lean. Five enhancement areas:
- Context Window Management (compaction.py)
- Tool System Enhancement (tool_registry.py + tools.py refactor)
- Sub-Agent (subagent.py)
- Memory System (memory.py)
- Skills System (skills.py)
Strategy
Approach A: Layered Enhancement -- add new modules alongside existing files, minimize changes to existing code. When agent.py grows too complex, refactor into Approach B (package structure under ncc/).
Design Principles
- Modules communicate via function parameters / dataclasses, no globals
- Each new module exposes 2-3 public functions, internals self-contained
- New logic in agent.py grouped by clear # --- section --- comments
- All code in English (comments, docstrings, commit messages)
2. File Structure
nano-claude-code/
├── nano_claude.py # REPL -- add /memory, /skill slash commands
├── agent.py # Agent loop -- add compaction call + sub-agent dispatch
├── providers.py # No changes (already solid)
├── tools.py # Refactor: register built-in tools via registry
├── context.py # Extend: inject memory context
├── config.py # Add new config keys
│
├── compaction.py # NEW: Context window management
├── subagent.py # NEW: Sub-agent lifecycle
├── memory.py # NEW: File-based memory system
├── skills.py # NEW: Skill loading and execution
└── tool_registry.py # NEW: Tool plugin registry
Module Dependency Graph (unidirectional)
nano_claude.py
├-> agent.py
│ ├-> providers.py
│ ├-> tool_registry.py -> tools.py (built-in implementations)
│ ├-> compaction.py -> providers.py (for summary model call)
│ └-> subagent.py (calls agent.py:run recursively)
├-> context.py -> memory.py
├-> skills.py -> tool_registry.py
└-> config.py
3. Context Window Management (compaction.py)
Two-layer compression, inspired by Claude Code's three-layer strategy (Layer 3 contextCollapse is experimental, deferred).
3.1 Layer 1: Auto-Compact (model-driven summary)
Triggered when estimated token count exceeds 70% of model's context limit.
def compact_messages(messages: list[dict], config: dict) -> list[dict]:
    """
    Split messages into [old | recent].
    Summarize old via model call.
    Return [summary_msg, ack_msg, *recent].
    """
    split_point = find_split_point(messages, keep_ratio=0.3)
    old = messages[:split_point]
    recent = messages[split_point:]
    summary = call_model_for_summary(old, config)
    return [
        {"role": "user", "content": f"[Conversation summary]\n{summary}"},
        {"role": "assistant", "content": "Understood, I have the context."},
        *recent,
    ]
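The design does not spell out `find_split_point`. One possible sketch, assuming each message is a dict with `role` and `content` keys and that tool results carry `role == "tool"` (both are assumptions, not confirmed by the source), walks backwards until the recent part holds roughly `keep_ratio` of the characters:

```python
def find_split_point(messages: list[dict], keep_ratio: float = 0.3) -> int:
    """Return index i such that messages[i:] holds at most keep_ratio of all characters."""
    sizes = [len(str(m.get("content", ""))) for m in messages]
    budget = sum(sizes) * keep_ratio
    kept = 0
    split = len(messages)
    # Walk backwards, keeping recent messages while they still fit in the budget.
    for i in range(len(messages) - 1, -1, -1):
        if kept + sizes[i] > budget:
            break
        kept += sizes[i]
        split = i
    # Assumption: the recent part should not start on an orphaned tool result,
    # so skip forward past any leading "tool" messages.
    while split < len(messages) and messages[split].get("role") == "tool":
        split += 1
    return split
```

Measuring characters instead of message counts keeps one very long tool output from crowding out the rest of the recent history.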
3.2 Layer 2: Tool-Result Snipping (rule-based)
Truncate old tool outputs without model call. Fast and cheap.
def snip_old_tool_results(messages: list[dict], max_chars: int = 2000) -> list[dict]:
    """
    For tool results older than N turns, truncate to max_chars.
    Preserve first/last lines, add [snipped N chars] marker.
    """
3.3 Token Estimation
def estimate_tokens(messages: list[dict]) -> int:
    """Use tiktoken for GPT models, chars/3.5 fallback."""

def get_context_limit(model: str) -> int:
    """Return context window size from provider registry."""
3.4 Integration Point
# In agent.py run() loop, before each API call:
def _maybe_compact(state: AgentState, config: dict) -> bool:
    token_count = estimate_tokens(state.messages)
    threshold = get_context_limit(config["model"]) * 0.7
    if token_count > threshold:
        state.messages = compact_messages(state.messages, config)
        return True
    return False
3.5 Public API
maybe_compact(state: AgentState, config: dict) -> bool
estimate_tokens(messages: list[dict]) -> int
get_context_limit(model: str) -> int
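The chars/3.5 fallback named in the `estimate_tokens` docstring can be sketched on its own. This is only the heuristic branch; the function name `estimate_tokens_fallback` and the handling of non-string content are assumptions made for illustration, and a real tokenizer would replace it for GPT models.

```python
import math

CHARS_PER_TOKEN = 3.5  # fallback ratio named in section 3.3


def estimate_tokens_fallback(messages: list[dict]) -> int:
    """Estimate tokens from character counts when no tokenizer is available."""
    total_chars = 0
    for msg in messages:
        content = msg.get("content", "")
        # Assumption: non-string content (e.g. a list of blocks) is measured via str().
        total_chars += len(content if isinstance(content, str) else str(content))
    # Round up so the estimate errs toward compacting slightly early.
    return math.ceil(total_chars / CHARS_PER_TOKEN)
```

For example, a single 35-character message is estimated at 10 tokens, and a 36-character one at 11.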
4. Tool System Enhancement (tool_registry.py + tools.py)
4.1 Tool Registry
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolDef:
    name: str
    schema: dict           # JSON schema for parameters
    func: Callable         # (params: dict, config: dict) -> str
    read_only: bool        # True = auto-approve in 'auto' mode
    concurrent_safe: bool  # True = safe for parallel sub-agent use

_TOOLS: dict[str, ToolDef] = {}

# Public functions (signatures only):
def register_tool(tool_def: ToolDef) -> None: ...
def get_tool(name: str) -> ToolDef | None: ...
def get_all_tools() -> list[ToolDef]: ...
def get_tool_schemas() -> list[dict]: ...
def execute_tool(name: str, params: dict, config: dict) -> str: ...
4.2 Tool Output Truncation
Prevent oversized tool outputs (e.g., cat large file, ls -R) from blowing up context
before compaction even gets a chance to run. Applied at the execute_tool boundary:
MAX_TOOL_OUTPUT = 32_000  # ~8K tokens, configurable per tool

def execute_tool(name, params, config):
    tool = get_tool(name)
    if tool is None:
        # get_tool() returns None for unregistered names; report it to the model
        return f"Error: unknown tool '{name}'."
    result = tool.func(params, config)
    # Immediate truncation at source: keep the first half and the last quarter of the cap
    if len(result) > MAX_TOOL_OUTPUT:
        head = result[:MAX_TOOL_OUTPUT // 2]
        tail = result[-(MAX_TOOL_OUTPUT // 4):]
        snipped = len(result) - len(head) - len(tail)
        result = f"{head}\n\n[... {snipped} chars truncated ...]\n\n{tail}"
    return result
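To make the arithmetic of the head/tail split concrete, here is the same truncation rule as a standalone helper. The name `truncate_output` and the returned character count are illustrative additions, and the sketch assumes `limit` is at least 4.

```python
def truncate_output(result: str, limit: int = 32_000) -> tuple[str, int]:
    """Keep the first limit//2 and last limit//4 characters; report how many were dropped."""
    if len(result) <= limit:
        return result, 0
    head = result[: limit // 2]
    tail = result[-(limit // 4):]
    snipped = len(result) - len(head) - len(tail)
    return f"{head}\n\n[... {snipped} chars truncated ...]\n\n{tail}", snipped
```

With the default cap, a 100,000-character output keeps 16,000 + 8,000 characters and drops 76,000, so the message that reaches the model stays below the 32,000-character limit even with the marker added.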
Additionally, Bash tool caps subprocess stdout reads to prevent unbounded
output (e.g., cat /dev/urandom).
This creates a two-layer defense:
- Layer 0 (here): hard truncation at tool execution time — prevents oversized messages
- Layer 2 (compaction.py snip): soft truncation of old tool results — reclaims context space
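Layer 2, the snip pass from section 3.2, might look roughly like the sketch below. It makes several assumptions the design leaves open: tool results are messages with `role == "tool"` and string content, "older than N turns" is expressed as a hypothetical `keep_recent` message count, and the head and tail are kept by character count, not by line.

```python
def snip_old_tool_results(messages: list[dict], max_chars: int = 2000,
                          keep_recent: int = 6) -> list[dict]:
    """Return a copy of messages with old, oversized tool results shortened."""
    cutoff = max(len(messages) - keep_recent, 0)  # messages before this index count as old
    half = max(max_chars // 2, 1)
    out = []
    for i, msg in enumerate(messages):
        content = msg.get("content")
        if (i < cutoff and msg.get("role") == "tool"
                and isinstance(content, str) and len(content) > max_chars):
            snipped = len(content) - 2 * half
            shortened = f"{content[:half]}\n[snipped {snipped} chars]\n{content[-half:]}"
            msg = {**msg, "content": shortened}  # copy; the input list is left untouched
        out.append(msg)
    return out
```

Because it never calls the model, this pass is cheap enough to run on every turn.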
4.3 Built-in Tools Refactor
Existing tools.py implementations unchanged. Wrap each with register_tool() at module load:
register_tool(ToolDef(
    name="Read", schema=READ_SCHEMA, func=_read_file,
    read_only=True, concurrent_safe=True,
))
4.4 Permission Logic (unified)
# agent.py
def _check_permission(tool_name, params, config):
    """Return True to auto-approve, None to ask the user."""
    tool = get_tool(tool_name)
    if config["permission_mode"] == "accept-all":
        return True
    if tool is not None and tool.read_only:  # get_tool() may return None
        return True
    if tool_name == "Bash" and _is_safe_command(params["command"]):
        return True
    return None  # ask user
5. Sub-Agent (subagent.py)
5.1 Data Model
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass, field

@dataclass
class SubAgentTask:
    id: str
    prompt: str
    status: str  # "pending" | "running" | "completed" | "failed" | "cancelled"
    messages: list[dict] = field(default_factory=list)  # independent message history
    result: str | None = None
    model: str | None = None  # optional model override
    depth: int = 0            # recursion depth counter
    _cancel_flag: bool = False
    _future: Future | None = None

@dataclass
class SubAgentManager:
    tasks: dict[str, SubAgentTask] = field(default_factory=dict)
    max_concurrent: int = 3
    max_depth: int = 3
    _pool: ThreadPoolExecutor = field(
        default_factory=lambda: ThreadPoolExecutor(max_workers=3))

    # Method signatures only:
    def spawn(self, prompt, config, system_prompt, depth=0) -> SubAgentTask: ...
    def get_result(self, task_id) -> str | None: ...
    def list_tasks(self) -> list[SubAgentTask]: ...
    def cancel(self, task_id) -> bool: ...
    def wait(self, task_id, timeout=None) -> SubAgentTask: ...
5.2 Execution Model — Threading from Day 1
Sub-agents run in background threads via ThreadPoolExecutor. This enables:
- Non-blocking spawn (main agent continues or waits by choice)
- Cancellation via cooperative flag
- Concurrent sub-agents (up to max_concurrent)
def spawn(self, prompt, config, system_prompt, depth=0):
    task = SubAgentTask(id=uuid4().hex[:8], prompt=prompt,
                        status="running", depth=depth, ...)
    if depth >= self.max_depth:
        # Refuse to spawn: hand back a failed task so the caller sees the error
        task.status = "failed"
        task.result = "Error: max sub-agent depth reached."
        return task

    def _run():
        sub_state = AgentState()
        try:
            for event in agent.run(
                prompt, sub_state, config, system_prompt,
                depth=depth + 1,
                cancel_check=lambda: task._cancel_flag,
            ):
                if isinstance(event, TurnDone):
                    task.result = extract_final_text(sub_state.messages)
            # run() also returns early when cancel_check fires
            task.status = "cancelled" if task._cancel_flag else "completed"
        except Exception as e:
            task.result = f"Error: {e}"
            task.status = "failed"

    task._future = self._pool.submit(_run)
    self.tasks[task.id] = task
    return task
5.3 Cooperative Cancellation
Python threads cannot be killed safely. Instead, agent.run() checks a
cancel_check callable each loop iteration:
# agent.py run(): new parameter
def run(user_message, state, config, system_prompt,
        depth=0, cancel_check=None):
    ...
    while True:
        if cancel_check and cancel_check():
            return  # clean exit
        for event in stream(...):
            yield event
        ...
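The cooperative pattern above can be tried in isolation. The sketch below is a stand-in, not the real agent loop: `run` here is a toy generator, and a `threading.Event` plays the role of `task._cancel_flag`. It shows that a worker thread stops at its next loop iteration once the flag is set.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run(steps, cancel_check=None):
    """Toy stand-in for agent.run(): yields one event per loop iteration."""
    for i in range(steps):
        if cancel_check and cancel_check():
            return  # clean exit: the generator simply stops
        yield f"event-{i}"


def demo():
    cancel = threading.Event()   # plays the role of task._cancel_flag
    started = threading.Event()

    def worker():
        events = []
        for event in run(100, cancel_check=cancel.is_set):
            events.append(event)
            started.set()
            cancel.wait(timeout=5)  # simulated slow step; wakes early on cancel
        return events

    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(worker)
        started.wait(timeout=5)  # wait until the worker has produced one event
        cancel.set()             # request cancellation
        return future.result(timeout=10)
```

Calling `demo()` returns only the first event, since the flag is checked before the second iteration begins. The limitation is that a long blocking call inside one iteration (for example a slow model response) is not interrupted; cancellation takes effect at the next check.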
5.4 Depth Limiting (No Tool Removal)
Sub-agents CAN call Agent tool (enabling A -> B -> C chains). Depth is
passed through, and the Agent tool returns an error at max_depth:
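def _agent_tool_func(params, config, depth=0):
    if depth >= manager.max_depth:
        return ("Error: max sub-agent depth reached. "
                "Complete this task directly without spawning sub-agents.")
    task = manager.spawn(params["prompt"], config, system_prompt, depth)
    if params.get("wait", True):
        # Tool functions return str, so block and return the sub-agent's result
        task = manager.wait(task.id)
        return task.result or f"Sub-agent {task.id} ended with status {task.status}."
    return f"Started sub-agent {task.id}."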
The model sees the error and adapts — no silent capability removal.
5.5 Context Strategy
Sub-agent gets fresh context (no parent message history):
sub_system_prompt = f"""You are a sub-agent. Your task:
{prompt}
Working directory: {cwd}
{memory_context}
"""
5.6 Tool Registration — 3 Tools
The sub-agent system registers three tools:
Agent — spawn a sub-agent:
AGENT_SCHEMA = {
    "name": "Agent",
    "description": "Launch a sub-agent to handle a task independently.",
    "input_schema": {
        "type": "object",
        "properties": {
            "prompt": {"type": "string", "description": "Task description"},
            "model": {"type": "string", "description": "Optional model override"},
            "wait": {"type": "boolean", "default": True,
                     "description": "True = block until done (default). "
                                    "False = return task_id immediately."}
        },
        "required": ["prompt"]
    }
}
- wait=True (default): spawn + block + return result. Feels synchronous to model.
- wait=False: spawn + return task_id immediately. Model must use CheckAgentResult later.
CheckAgentResult — poll a background sub-agent:
CHECK_AGENT_RESULT_SCHEMA = {
    "name": "CheckAgentResult",
    "description": "Check the result of a background sub-agent task.",
    "input_schema": {
        "type": "object",
        "properties": {
            "task_id": {"type": "string", "description": "Task ID from Agent tool"}
        },
        "required": ["task_id"]
    }
}
Returns: status + result (if completed), or status + "still running".
ListAgentTasks — overview of all sub-agents:
LIST_AGENT_TASKS_SCHEMA = {
    "name": "ListAgentTasks",
    "description": "List all sub-agent tasks and their status.",
    "input_schema": {"type": "object", "properties": {}}
}
Returns a table of [id, status, prompt_preview] for all tasks.
6. Memory System (memory.py)
6.1 Storage
~/.nano_claude/memory/
├── MEMORY.md # Index file (max 200 lines)
├── user_role.md # Individual memory files
├── feedback_testing.md
└── ...
Memory file format:
---
name: user role
description: user is a data scientist focused on logging
type: user
created: 2026-04-02
---
User is a data scientist, currently investigating observability/logging.
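A file in this format can be read without a YAML dependency as long as the frontmatter stays flat `key: value` pairs. The sketch below assumes exactly that; the helper name `parse_memory_file` is illustrative and not part of the stated API.

```python
def parse_memory_file(text: str) -> tuple[dict[str, str], str]:
    """Split a memory file into (frontmatter dict, body text)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter: treat everything as body
    # Find the closing '---' delimiter after the opening one.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, text
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body
```

Using `partition(":")` splits on the first colon only, so values that themselves contain colons survive intact.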
6.2 Public API
from dataclasses import dataclass

@dataclass
class MemoryEntry:
    name: str
    description: str
    type: str  # "user" | "feedback" | "project" | "reference"
    content: str
    file_path: str
    created: str

# Public functions (signatures only):
def load_index() -> list[MemoryEntry]: ...
def save_memory(entry: MemoryEntry) -> None: ...
def delete_memory(name: str) -> None: ...
def search_memory(query: str) -> list[MemoryEntry]: ...
def get_memory_context() -> str: ...  # for system prompt injection
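Section 10 chooses keyword matching over embeddings for memory search. One way that could work is sketched below, under assumptions: entries are passed in explicitly instead of being loaded from disk (so the signature differs from the API above), the `MemoryEntry` here is trimmed to the fields the search uses, and ranking counts how many query terms appear in each entry.

```python
from dataclasses import dataclass


@dataclass
class MemoryEntry:  # trimmed copy for illustration
    name: str
    description: str
    content: str


def search_memory(query: str, entries: list[MemoryEntry]) -> list[MemoryEntry]:
    """Rank entries by how many query terms appear in name, description or content."""
    terms = query.lower().split()
    scored = []
    for entry in entries:
        haystack = f"{entry.name} {entry.description} {entry.content}".lower()
        score = sum(term in haystack for term in terms)
        if score:
            scored.append((score, entry))
    # Stable sort: entries with equal scores keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored]
```

This is plain substring matching, so "log" also matches "logging"; the design leaves the final relevance judgment to the model.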
6.3 Tool Registration
Two tools for model-driven memory management:
- MemorySave: {name, type, description, content} -> write file + update index
- MemoryDelete: {name} -> remove file + update index
6.4 Context Integration
context.py:build_system_prompt() appends memory.get_memory_context() (the MEMORY.md index). Model uses Read tool to access full memory file content when needed.
7. Skills System (skills.py)
7.1 Skill Definition
Markdown files with frontmatter:
~/.nano_claude/skills/commit.md
---
name: commit
description: Create a git commit with conventional format
triggers: ["/commit", "commit changes"]
tools: [Bash, Read]
---
# Commit Skill
Analyze staged changes and create a well-formatted commit message.
...
7.2 Search Path
SKILL_PATHS = [
    Path.cwd() / ".nano_claude" / "skills",   # project-level (priority)
    Path.home() / ".nano_claude" / "skills",  # user-level
]
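The priority order can be implemented by scanning the paths in sequence and keeping the first file seen for each skill. The sketch below assumes the skill is identified by its file stem (the design may key on the frontmatter `name` instead), and `discover_skill_files` is a hypothetical helper name.

```python
from pathlib import Path


def discover_skill_files(paths: list[Path]) -> dict[str, Path]:
    """Map skill name -> file, where earlier directories win on name clashes."""
    found: dict[str, Path] = {}
    for directory in paths:
        if not directory.is_dir():
            continue  # a missing project-level or user-level directory is fine
        for md_file in sorted(directory.glob("*.md")):
            # setdefault keeps the first (higher-priority) file for each name
            found.setdefault(md_file.stem, md_file)
    return found
```

With this ordering, a project can override a user-level `commit.md` simply by shipping its own file with the same name.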
7.3 Public API
from dataclasses import dataclass
from typing import Generator

@dataclass
class SkillDef:
    name: str
    description: str
    triggers: list[str]
    tools: list[str]
    prompt: str
    file_path: str

# Public functions (signatures only):
def load_skills() -> list[SkillDef]: ...
def find_skill(query: str) -> SkillDef | None: ...
def execute_skill(skill, args, state, config) -> Generator: ...
7.4 Execution Model
Skills are just prompts injected into the normal agent loop:
def execute_skill(skill, args, state, config):
    prompt = f"[Skill: {skill.name}]\n\n{skill.prompt}"
    if args:
        prompt += f"\n\nUser context: {args}"
    system_prompt = build_system_prompt(config)
    for event in agent.run(prompt, state, config, system_prompt):
        yield event
7.5 REPL Integration
In nano_claude.py, unmatched / commands fall through to skill lookup:
if user_input.startswith("/"):
    # Try built-in slash commands first
    # If no match -> find_skill(user_input)
    # If skill found -> execute_skill(...)
8. Diff View for File Modifications
Core UX improvement: show git-style red/green diff when Edit or Write modifies an existing file.
8.1 Diff Generation (in tools.py)
Edit and Write tool implementations capture before/after content and generate unified diff:
import difflib

def generate_unified_diff(old, new, filename, context_lines=3):
    """
    Args:
        old: original file content, str
        new: modified file content, str
        filename: display name, str
        context_lines: lines of context around changes, int
    Returns:
        unified diff string
    """
    # Split without line endings and use lineterm="" so that a file lacking a
    # trailing newline cannot glue two diff lines together when joined.
    old_lines = old.splitlines()
    new_lines = new.splitlines()
    diff = difflib.unified_diff(
        old_lines, new_lines,
        fromfile=f"a/{filename}", tofile=f"b/{filename}",
        n=context_lines, lineterm=""
    )
    return "\n".join(diff)
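As a worked example of what the model and the REPL would receive, the snippet below diffs a three-line file with one changed line. It uses a compact variant of the function (`unified_diff_text` is an illustrative name) that splits without line endings and passes `lineterm=""`, which is an assumption chosen here so inputs lacking a trailing newline still produce one diff line per output line.

```python
import difflib


def unified_diff_text(old: str, new: str, filename: str, context_lines: int = 3) -> str:
    """Build a git-style unified diff string for two versions of one file."""
    diff = difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile=f"a/{filename}", tofile=f"b/{filename}",
        n=context_lines, lineterm="",
    )
    return "\n".join(diff)


example = unified_diff_text("a\nb\nc", "a\nB\nc", "f.txt")
# example is:
# --- a/f.txt
# +++ b/f.txt
# @@ -1,3 +1,3 @@
#  a
# -b
# +B
#  c
```

Identical inputs yield an empty string, which a caller can use to skip the diff section entirely.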
Tool return values change:
- Edit: "Changes applied to {filename}:\n\n{diff}"
- Write (existing file): "File updated:\n\n{diff}"
- Write (new file): "New file created: {filename} ({n} lines)" (no diff)
8.2 REPL Rendering (in nano_claude.py)
Detect diff blocks in tool output and render with ANSI colors:
def render_diff(diff_text):
    for line in diff_text.splitlines():
        if line.startswith("+++") or line.startswith("---"):
            print(f"\033[1m{line}\033[0m")   # bold
        elif line.startswith("+"):
            print(f"\033[32m{line}\033[0m")  # green
        elif line.startswith("-"):
            print(f"\033[31m{line}\033[0m")  # red
        elif line.startswith("@@"):
            print(f"\033[36m{line}\033[0m")  # cyan
        else:
            print(line)
8.3 Diff Truncation
For large diffs (e.g., Write replaces entire file), cap the diff display:
MAX_DIFF_LINES = 80

def maybe_truncate_diff(diff_text):
    lines = diff_text.splitlines()
    if len(lines) > MAX_DIFF_LINES:
        shown = lines[:MAX_DIFF_LINES]
        remaining = len(lines) - MAX_DIFF_LINES
        return "\n".join(shown) + f"\n\n[... {remaining} more lines ...]"
    return diff_text
Note: truncation applies to the display in REPL only. The full diff is still returned to the model so it can verify the change.
9. Implementation Order
Each step is an independent PR:
| Phase | Module | Depends On | Estimated Lines |
|---|---|---|---|
| 1 | tool_registry.py + tools.py refactor | None | ~600 |
| 2 | Diff view in tools.py + nano_claude.py | Phase 1 | ~100 |
| 3 | compaction.py + agent.py integration | Phase 1 | ~300 |
| 4 | memory.py + context.py integration | Phase 1 | ~200 |
| 5 | subagent.py + agent.py integration (threading) | Phase 1 | ~350 |
| 6 | skills.py + nano_claude.py integration | Phase 1, 4 | ~200 |
| 7 | Slash commands + config updates | All above | ~300 |
Total new code: ~2050 lines. Grand total: ~4.2K lines.
10. Key Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Compression layers | 2 (autoCompact + snip) | Layer 3 is experimental in Claude Code |
| Tool output truncation | Hard cap at execute_tool boundary | Prevents oversized outputs before compaction runs |
| Sub-agent execution | Threading from day 1 | Sync blocks main agent, can't cancel, can't parallelize |
| Sub-agent depth | Depth counter (max 3), no tool removal | Model sees error and adapts; sub-sub-agents allowed |
| Sub-agent tools | Agent + CheckAgentResult + ListAgentTasks | Model needs feedback loop for async tasks |
| Diff view | difflib unified diff + ANSI colors | Core UX, zero dependencies |
| Memory search | Keyword match, no embeddings | Keep simple, model judges relevance |
| Skills format | Markdown + frontmatter | Human-readable, git-friendly, no Python needed |
| Tool registry | Global dict + register function | Simple, extensible, easy to migrate to package |
| Target models | GPT-5.4, Gemini 3/3.1 Pro | User's primary use case |
| No Claude support | Intentional | Official Claude Code exists |
11. Future Considerations (Not in Scope)
- MCP protocol support
- Remote skill marketplace
- Voice mode
- Bridge to desktop apps
- contextCollapse (Layer 3 compression)