How Claude Code Works Under the Hood

You type a question. Claude thinks. It reads your files. It edits your code. It runs your tests. It responds with the answer. But between your keystroke and the answer appearing on screen, a remarkable amount of machinery runs. Understanding that machinery is the difference between using Claude Code effectively and just hoping for the best.

This is a technical deep dive into how Claude Code actually works — based on the open-source codebase. We will trace the path of a single message from your terminal through every layer of the system and back.

The Big Picture

Claude Code is not a chatbot with a code editor bolted on. It is a terminal-native agentic system — a program that uses an LLM as its reasoning engine and tools as its actuators. Think of it as a robot that can read, write, and execute code, with an LLM as its brain.

The architecture is surprisingly simple at the highest level:

User Input

CLI Parser

QueryEngine

LLM API

Tool Execution

Terminal UI

Pipeline Status

Click "Send Message" to trace a request through the Claude Code pipeline

The Key Components

CLI Parser (Commander.js + React/Ink): Parses your input, renders the terminal UI
QueryEngine: One instance per conversation. Manages session state, message history, and coordinates everything
Agentic Loop: An infinite while(true) that streams from the API, executes tools, and continues until Claude is done
Tool System: ~40 built-in tools (file read/write, bash, grep, web fetch) plus MCP tools from external servers
Permission System: Allow/deny/ask rules that gate every tool execution
Context Manager: Monitors token usage, compacts conversations when they get too long

You now understand the high-level architecture. Let us look at the heart of it all.

The Agentic Loop: The Infinite While(True)

The most important piece of code in Claude Code is the agentic loop. It is literally an infinite while(true) generator function in src/query.ts. Here is what each iteration does:

Turns0

Tool calls0

1

Send messages to API

→

2

Stream response

→

3

Collect tool_use blocks

→

4

Any tool calls?

→

5

Execute tools

→

6

Append results

→

7

Continue loop

Press Run Loop to start the agentic loop simulation...

Why an Infinite Loop?

The LLM does not plan everything upfront. It reasons step by step. It might read a file, realize it needs to check another file, run a test, see a failure, read the error message, and only then write the fix. Each of these steps is a separate API call, and each call might produce more tool requests.

The loop handles this naturally:

Send conversation to API
API responds with text + tool calls
Execute all tool calls
Append tool results to conversation
Go back to step 1

The loop breaks only when the API responds with no tool calls — meaning Claude has decided it is done.

The State Between Iterations

Each iteration carries a State object with the full message history, turn count, auto-compact tracking, and a transition field that records why the previous iteration continued (tool_use, tool_result, max_tokens, etc.). This metadata is critical for debugging and for the context management system to understand how the conversation is evolving.

You now understand the agentic loop. Let us look at what tells Claude how to behave.

The System Prompt: Claude’s Instructions

Every API call includes a system prompt that tells Claude how to act. Claude Code’s system prompt is built in two parts — static (cacheable, never changes) and dynamic (changes every call based on context).

The Static Prompt (Cached)

The static portion is split at a boundary marker. Everything before it uses Anthropic’s prompt caching with scope: 'global', meaning it is cached once and reused across all conversations. This includes:

Role definition: “You are an interactive agent that helps users with software engineering tasks”
Tool usage guidelines: Prefer dedicated tools over Bash, make parallel tool calls when possible, reference files with file_path:line_number
Coding philosophy: No gold-plating, no premature abstractions, security-first, prefer editing existing files over creating new ones
Tone: Concise, no filler, no emoji, file references in parentheses
Output efficiency: Be brief, avoid preamble and postamble, let code speak for itself

The Dynamic Prompt (Session-Specific)

After the boundary, the prompt includes session-specific information:

Git context: Current branch, status, recent commits, user name
User context: Contents of CLAUDE.md files (project-specific instructions found in the repo)
Environment: Current working directory, OS, shell type, available tools
MCP server instructions: Instructions from connected MCP servers
Memory: Persistent notes from previous conversations
Scratchpad: Temporary working notes from the current session

Why Two Parts?

Caching the static portion saves tokens and latency on every call. The system prompt can be 5000+ tokens. By caching it, Claude Code only pays for those tokens once per conversation, not once per API call. The dynamic portion is small (typically 500-1500 tokens) and changes rarely.

You now understand the system prompt. Let us look at how Claude interacts with the world.

Tools: Claude’s Hands and Eyes

Claude Code gives Claude ~40 built-in tools organized into categories. Each tool is a self-contained TypeScript module with a consistent interface.

Claude Code Tool Registry

14 tools7 read-only7 write

FileReadFile I/O

read-onlyconcurrency-safe

Read file contents at a given path with optional offset/limit

FileWriteFile I/O

writeserial

Write content to a file, creating it if it does not exist

FileEditFile I/O

writeserial

Apply a targeted search-and-replace edit to a specific region of a file

GlobFile I/O

read-onlyconcurrency-safe

Find files matching a glob pattern like "**/*.ts" or "src/**/*.test.*"

GrepFile I/O

read-onlyconcurrency-safe

Search file contents using regex patterns across the project

BashExecution

writeserial

Run a shell command in the project directory and capture output

PowerShellExecution

writeserial

Execute PowerShell commands on Windows environments

WebFetchWeb

read-onlyconcurrency-safe

Fetch and parse content from a URL into text or markdown

WebSearchWeb

read-onlyconcurrency-safe

Perform a web search and return summarized results

AgentAgent

writeconcurrency-safe

Spawn a sub-agent to handle an independent subtask in parallel

SendMessageAgent

writeserial

Send a message to another agent instance or return a final result

AskUserQuestionUtility

read-onlyserial

Pause execution and ask the user a clarifying question

ConfigUtility

read-onlyconcurrency-safe

Read or update workspace configuration values

TodoWriteUtility

writeserial

Maintain a task checklist to track progress on multi-step work

The Tool Interface

Every tool implements the same interface:

name: Unique identifier (e.g., "Bash", "FileReadTool")
inputSchema: Zod schema that validates tool inputs and generates the JSON Schema sent to the API
call(): The actual execution function — runs the tool and returns a result
description(): Human-readable description shown to Claude (this is what Claude uses to decide when to use the tool)
isReadOnly(): Returns true if the tool does not modify anything (e.g., FileRead vs FileWrite)
isConcurrencySafe(): Returns true if multiple instances can run in parallel (e.g., two FileRead calls)
checkPermissions(): Returns allow/deny/ask based on the user’s permission rules

Tool Categories

File I/O: Read, Write, Edit, Glob, Grep — Claude’s primary way to interact with your codebase
Execution: Bash, PowerShell — for running commands, tests, builds
Web: WebFetch, WebSearch — for looking up documentation and APIs
Agent: Agent, SendMessage — for spawning sub-agents and multi-agent coordination
Utility: AskUserQuestion, Config, TodoWrite — for interacting with the user

How Claude Decides Which Tool to Use

Claude receives all tool schemas in the API request (unless deferred tool loading is enabled). The model autonomously decides which tools to call based on the tool descriptions and the conversation context. There is no hardcoded routing — the LLM itself acts as the dispatcher.

For large numbers of MCP tools, deferred tool loading is used: tools are not included in the initial request. Instead, Claude can use ToolSearchTool to discover tools on demand. This prevents hitting tool schema limits.

You now understand the tool system. Let us look at how tool calls are actually executed.

The Tool Use Protocol

When Claude decides to use tools, it outputs tool_use content blocks in its response. Claude Code then executes these tools and feeds the results back.

Execution Order Matters

Not all tools can run at the same time. Claude Code partitions tool calls into batches:

Consecutive read-only tools are batched together and executed in parallel (up to 10 concurrent, configurable via CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY)
Write tools each get their own batch and execute serially

This means if Claude asks to read 5 files and edit 1 file, the 5 reads happen simultaneously, then the edit runs alone. This is the toolOrchestration.ts module’s job.

The Execution Pipeline

Each tool goes through runToolUse():

Find the tool by name in the registry
Validate input against the tool’s Zod schema
Check permissions against the user’s rules
Execute the tool’s call() function
Truncate results if they exceed the per-message budget (large results are saved to disk with a preview)
Return the result as a tool_result content block

Tool Results Feed Back Into the Loop

Tool results are appended to the conversation as user messages with tool_result content blocks. The next API call includes the full conversation history, so Claude sees what every tool returned and decides what to do next.

You now understand the tool use protocol. Let us look at how Claude Code keeps you safe.

The Permission System

Every tool execution goes through the permission system. This is what prevents Claude from accidentally deleting your files, sending your data to the internet, or running destructive commands.

Simulate

Tool Call Received

Check Deny Rules

Check Allow Rules

Check Ask Rules

Run ML Classifier

Default Permission Mode

Select a scenario above to visualize the permission flow

Permission Mode

Read-only ops auto-approved, writes may prompt

Permission Modes

Default: Prompts the user for potentially destructive operations (file writes, bash commands, web fetches)
Plan mode: Shows the full plan before executing, asks for confirmation once
Auto-approve: ML-based classifier decides whether to allow or ask
Bypass: Auto-approves everything (dangerous, but useful for CI/CD)

Permission Rules

Users can define allow/deny/ask rules with wildcard patterns:

Bash(git *) — allow all git commands without asking
FileEdit(/src/*) — allow edits in the src/ directory
mcp__server — allow all tools from a specific MCP server

The permission check runs in this order:

Check deny rules — if matched, immediately reject
Check allow rules — if matched, immediately approve
Check ask rules — if matched, prompt the user
Run pre-tool-use hooks
If auto-mode, run the ML classifier
If no rule matches, defer to the current permission mode

Why This Design?

The permission system treats tool execution as untrusted by default. Even though Claude generated the tool call, the system does not assume the call is safe. This is a defense-in-depth approach: the LLM might hallucinate a destructive command, or a prompt injection in a file might trick Claude into running something malicious.

You now understand the permission system. Let us look at how Claude Code manages its context window.

Context Management: Fitting Everything In

LLMs have a limited context window (typically 200K tokens for Claude). Every message, tool call, and tool result consumes tokens. A long conversation with many file reads and test outputs can easily exhaust this limit. Claude Code uses several strategies to manage this.

Claude Code Context Window (200K tokens)

auto-compact (187K)

Send messages:5

System Prompt

User

Assistant

Tool Call

Tool Result

Used: 5,000
Available: 195,000
Messages: 0
Fill: 2.5%

Auto-Compact

When the conversation approaches the model’s limit (threshold: context_window - 13,000 tokens), Claude Code automatically compacts the conversation:

Sends the full conversation to the API with a compaction prompt
The API returns a summary of the conversation
Old messages are replaced with the summary
Critical context is restored: recent files, plan files, skills, MCP instructions

There is a circuit breaker that stops after 3 consecutive failures (the API might reject the compaction if the conversation is already too large).

Microcompact

Instead of full compaction, Claude Code can ask the API to selectively forget old content blocks using Anthropic’s cache editing API. This preserves the summary while freeing up space for the most recent context.

Tool Result Budget

Each message has a budget on aggregate tool result size. Large file contents are replaced with previews: “Showing first 100 lines of 500” plus a file path. The full content is saved to disk if Claude needs it later.

Why This Matters

Without context management, a long session would simply fail with a “prompt too long” error. With it, Claude Code can run for hours on a single conversation, reading hundreds of files and executing dozens of commands, all while staying within the context window.

You now understand context management. Let us look at how Claude Code handles complex tasks.

Sub-Agents: Parallel Work

Claude Code can spawn sub-agents to work on tasks in parallel. This is how it handles complex, multi-step requests efficiently.

C

Parent Agent

claude-code main session

EXPLORE

Explore Codebase

GENERAL

Create Demo Component

GENERAL

Run Tests

How Sub-Agents Work

When Claude decides a task would benefit from parallel execution, it uses the Agent tool with:

A description (3-5 words)
A prompt (the task for the sub-agent)
A subagent_type (specialized agent type)

The sub-agent gets its own QueryEngine instance, its own conversation history, and its own set of tools. It runs independently and returns the result to the parent agent.

Agent Types

General: Full-featured agent with access to all tools
Explore: Fast agent specialized for searching codebases (read-only, fast)
Plan: Planning agent that creates structured approaches before execution

Background Tasks

Sub-agents can run in the background (run_in_background: true). The parent agent continues working while the sub-agent runs. When the sub-agent completes, a notification appears. This is how Claude Code handles requests like “create a blog post” — it can spawn sub-agents to create each demo component in parallel.

Why Sub-Agents?

Without sub-agents, Claude would have to do everything sequentially in a single conversation. With sub-agents, it can explore multiple files simultaneously, run independent analyses in parallel, and delegate specialized work to focused agents. The result is significantly faster for complex tasks.

You now understand sub-agents. Let us look at how responses actually arrive at your terminal.

How Streaming Works

When Claude Code calls the Anthropic API, it does not wait for the entire response. It streams. This means you see Claude thinking, typing, and calling tools in real time. The streaming system is one of the most carefully engineered parts of Claude Code.

SSE Event Stream0/56 events

Waiting for stream...

Accumulated Blocks

Blocks will appear here

Stream Protocol

message_startTTFB, model info

content_block_startBlock type + index

content_block_deltaIncremental content

content_block_stopBlock finalized

message_deltaToken counts

message_stopStream ended

The SSE Protocol

Claude Code calls the API with stream: true. The API responds with a series of Server-Sent Events (SSE), each carrying a small piece of the response. There are six event types:

message_start — Confirms the connection, includes the model info and TTFB (Time To First Byte)
content_block_start — Begins a new content block (text, tool_use, thinking)
content_block_delta — Carries incremental content (a few characters of text, a fragment of JSON for tool input)
content_block_stop — Marks a content block as complete
message_delta — Includes token usage counts and the stop reason
message_stop — Signals the stream is complete

Raw Stream vs BetaMessageStream

Claude Code intentionally uses the raw stream from the Anthropic SDK rather than the higher-level BetaMessageStream. Why? The SDK’s BetaMessageStream calls partialParse() on every input_json_delta event to try to parse incomplete JSON. This is O(n^2) — each parse re-parses the entire accumulated string. For large tool inputs, this causes noticeable lag.

Instead, Claude Code accumulates tool input as a raw string by appending delta.partial_json to contentBlock.input += delta.partial_json. It only parses the JSON once, when the block is complete at content_block_stop.

Fine-Grained Tool Streaming (FGTS)

By default, the API buffers the entire tool input before sending any input_json_delta events. For large tool inputs (think: a full file edit with hundreds of lines), this means you stare at a blank screen for 30+ seconds while the API generates the input silently.

Claude Code solves this with eager_input_streaming: true, which tells the API to stream tool input character by character as it is generated. This is gated behind the tengu_fgts feature flag and only works with the first-party api.anthropic.com endpoint (proxies like Bedrock and Vertex do not support it).

Streaming Tool Execution

Claude Code goes one step further: it starts executing tools as their content_block_stop events arrive, before the full response stream is complete. If Claude decides to read three files and then edit one, Claude Code starts reading those three files immediately after each content_block_stop, even while the API is still streaming the edit block. Results from completed tools are yielded immediately, interleaving tool execution with the still-streaming response.

The Stream Watchdog

Streams can stall — network issues, API overload, proxy timeouts. Claude Code has a configurable idle watchdog (default 90 seconds). If no stream events arrive within the timeout, it aborts the stream and falls back to a non-streaming request. There is also stall detection at 30 seconds that logs diagnostics without aborting.

You now understand streaming. Let us look at how Claude Code connects to external tools.

MCP: Extending Claude Code

Claude Code’s built-in tools are powerful, but they cannot do everything. You might want Claude to create GitHub issues, query a database, search Slack, or read Figma files. MCP (Model Context Protocol) lets you connect external tool servers that extend Claude’s capabilities.

MCP Servers

GH

github

STDIO

SL

slack

SSE

PG

postgres-prod

HTTP

FG

figma

7 tools from 3 servers

Transports

stdioChild process (stdin/stdout)

sseServer-Sent Events (HTTP)

httpStreamable HTTP

wsWebSocket

Select an MCP server to explore its tools

What Is MCP?

MCP is an open protocol that lets Claude Code act as a client connecting to external tool servers. Each server exposes tools (callable functions), resources (readable data), and prompts (slash commands). Claude Code discovers these tools and includes them in its tool registry, so Claude can use them just like built-in tools.

The tool names are namespaced: mcp__<server_name>__<tool_name>. For example, mcp__github__create_issue or mcp__slack__send_message.

Transport Mechanisms

MCP servers communicate over four transport types:

stdio (default) — Claude Code spawns the server as a child process. Communication happens over stdin/stdout using JSON-RPC. This is the simplest and most common transport.
SSE — Server-Sent Events over HTTP. Used for remote servers. Claude Code connects to a URL and receives a long-lived event stream.
HTTP — Streamable HTTP, the newer MCP transport. Supports both JSON and SSE responses with session-based connections.
WebSocket — Raw WebSocket connections with the mcp subprotocol. Used for real-time bidirectional communication.

There are also internal transports for IDE extensions (sse-ide, ws-ide), the Claude Agent SDK (sdk), and proxied connections through claude.ai (claudeai-proxy).

The Connection Lifecycle

When Claude Code starts, it loads MCP server configurations from multiple sources in priority order:

Enterprise — Managed settings (managed-mcp.json)
Claude.ai — Fetched from the claude.ai API
Plugin — From installed plugins
User — Global config (~/.claude/claude_desktop_config.json)
Project — .mcp.json files, walking from repo root to CWD
Dynamic — Runtime-only (SDK servers, CLI flags)

Servers are connected in parallel (up to 3 local + 20 remote concurrently). After connecting, Claude Code fetches tools, commands, skills, and resources from each server. If a remote server returns HTTP 401, Claude Code creates a pseudo-tool (McpAuthTool) that triggers an OAuth flow when invoked.

How MCP Tools Are Called

When Claude decides to use an MCP tool, the call flows through:

Find the tool in the registry by its mcp__<server>__<tool> name
Verify the MCP connection is still alive (reconnect if needed)
Call client.callTool() on the connected MCP client
Handle URL elicitation (if the server needs user authentication for a specific action)
Process the result — images are resized, binary blobs are saved to disk, large outputs are truncated or persisted

MCP Instructions in the System Prompt

MCP servers can provide instructions that tell Claude how to use their tools. These are injected into the system prompt as a markdown section:

# MCP Server Instructions

## github
When creating issues, always include reproduction steps...

## slack
Messages should be concise and in the channel's language...

These instructions are capped at 2048 characters per server to prevent verbose servers from consuming too much context. Claude Code uses a delta-based approach — it only announces newly connected or disconnected servers, avoiding cache-busting recomputation on every turn.

You now understand MCP. Let us look at plan mode.

Plan Mode: Think Before You Act

Sometimes you do not want Claude to start editing files immediately. You want it to explore the codebase, design an approach, and get your approval before making changes. That is what plan mode does.

How Plan Mode Works

Plan mode is entered via the EnterPlanMode tool (which requires user approval). Once active:

Permission mode changes to 'plan' — Write operations are blocked except to the plan file itself
Claude enters a 5-phase workflow:
- Phase 1 (Explore) — Spawns up to 3 Explore sub-agents in parallel to read the codebase
- Phase 2 (Design) — Spawns Plan sub-agents to design a solution considering trade-offs
- Phase 3 (Review) — Reviews the design for completeness
- Phase 4 (Write) — Writes the final plan to a file in ~/.claude/plans/
- Phase 5 (Exit) — Calls ExitPlanMode to request user approval
The user sees the full plan and can approve, request changes, or reject it

The Plan File

Plans are stored as markdown files in ~/.claude/plans/ with generated slugs (e.g., eloquent-breeze.md). The plan file is the only writable file during plan mode — all other file operations are blocked by the permission system.

When exiting plan mode, the tool reads the plan file from disk and presents it for approval. If approved, Claude restores the previous permission mode and begins implementation.

Plan vs Explore vs General Agents

Each agent type has different capabilities:

Explore — Fast, read-only, uses a cheaper model (Haiku). Good for searching codebases.
Plan — Read-only, can run bash for inspection (ls, git log, grep). Designs solutions.
General — Full access to all tools. The default agent type for most tasks.

Plan and Explore agents intentionally omit CLAUDE.md from their context to save tokens. The main agent already has full context and interprets their output.

You now understand plan mode. Let us look at how Claude Code remembers things across conversations.

CLAUDE.md and Project Memory

Claude Code can follow project-specific instructions that persist across conversations. This is how you teach Claude your coding conventions, preferred tools, and project-specific rules.

Memory Files5/5

Enterprise Policy

/etc/claude-code/CLAUDE.md

User Global

~/.claude/CLAUDE.md

Project Root

~/project/CLAUDE.md

Conditional Rule

~/project/.claude/rules/typescript.md

Local Only

~/project/CLAUDE.local.md

Loading Order

1.Enterprise(lowest)

2.User Global(lowest)

3.Project (team)(lowest)

4.Local (private)(lowest)

local = highest priority

System Prompt Preview

Codebase and user instructions are shown below. Be sure to adhere to these instructions. IMPORTANT: These instructions OVERRIDE any default behavior.
<Enterprise -- /etc/claude-code/CLAUDE.md>
# Enterprise Rules
- Never commit secrets to the repository
- All code must pass CI before merge
- Use approved dependency versions only
<User Global -- ~/.claude/CLAUDE.md>
# My Preferences
- I prefer TypeScript over JavaScript
- Use functional programming style
- Always write tests for new code
<Project (team) -- ~/project/CLAUDE.md>
# Project Conventions
- Use Bun (not npm)
- Components go in src/components/
- Run `bun run build` to verify changes
<Project (team) -- ~/project/.claude/rules/typescript.md>
---
paths:
  - "**/*.ts"
  - "**/*.tsx"
---
# TypeScript Rules
- Use `interface` over `type` for objects
- No `any` types allowed
<Local (private) -- ~/project/CLAUDE.local.md>
# Local Settings
- My dev server runs on port 3001
- Test database is at localhost:5433
Total: 597 chars from 5 files

What Is CLAUDE.md?

CLAUDE.md is a markdown file that contains instructions for Claude Code. You can place it in several locations:

Project root — CLAUDE.md or .claude/CLAUDE.md (checked into the repo, shared with team)
Subdirectories — Loaded from root downward, so closer files override parent ones
User home — ~/.claude/CLAUDE.md (private global instructions)
Local only — CLAUDE.local.md (gitignored, private per-machine settings)
Managed — /etc/claude-code/CLAUDE.md (enterprise policy)
Rules directory — .claude/rules/*.md (conditional rules with glob-based file matching)

How Files Are Discovered

Claude Code walks from the git root down to the current working directory, loading files in priority order (root first, CWD last). It stops at the git root boundary to prevent instructions from parent repos leaking in. It handles nested git repos (submodules) and worktrees correctly.

Files support @include directives for pulling in other files, and rules in .claude/rules/*.md can have frontmatter with paths: glob patterns for conditional loading — the rule only activates when you edit a file matching the glob.

How Instructions Flow Into the System Prompt

All discovered CLAUDE.md files are concatenated and prefixed with: “Codebase and user instructions are shown below. Be sure to adhere to these instructions. IMPORTANT: These instructions OVERRIDE any default behavior.”

Each file is labeled with its scope: “(project instructions, checked into the codebase)” or “(user’s private global instructions for all projects)”. This helps Claude understand which instructions are team-shared and which are personal.

Auto Memory

Beyond CLAUDE.md, Claude Code has an auto-memory system that persists observations across conversations. It stores memories in <project>/.claude/agent-memory/ with an index file (MEMORY.md). Memory types include user preferences, feedback about approach, ongoing work context, and references to external systems.

This is gated behind feature flags and can be disabled via CLAUDE_CODE_DISABLE_AUTO_MEMORY.

The Scratchpad

Claude Code also has a per-session scratchpad directory at /tmp/claude-<uid>/<cwd>/<sessionId>/scratchpad/. Writes here bypass permission checks, making it useful for Claude to jot down temporary notes without prompting you. It is session-scoped and cleaned up automatically.

You now understand project memory. Let us look at how Claude Code navigates your codebase.

How Claude Code Reads Your Codebase

Claude Code has four primary tools for discovering and reading files. Understanding these helps you know how Claude explores your project.

File Discovery: Glob and Grep

Glob finds files by name pattern (e.g., **/*.test.ts). It delegates to ripgrep (rg --files --glob <pattern>), not Node’s filesystem API. Results are capped at 100 files and sorted by modification time (most recently changed files first — a useful heuristic for finding relevant code).

Grep searches file contents using regex. Also powered by ripgrep. It supports content search, file matching, and match counting. VCS directories (.git, .svn) are auto-excluded, and lines are capped at 500 characters to prevent noise from minified files.

Both tools use a vendored ripgrep binary that is statically compiled and bundled with Claude Code. On macOS, the binary is auto-codesigned on first use to avoid quarantine issues.

File Reading: FileReadTool

FileReadTool reads file contents with line-level offset/limit (default 2000 lines, 256KB max). It handles text, images (with resize + downsampling), PDFs (up to 20 pages), and Jupyter notebooks. It also deduplicates reads — if the same file+range is read again and the modification time matches, it returns a stub (“File unchanged since last read”) to save tokens.

LSP Integration

Claude Code supports Language Server Protocol integration, but it comes from plugins only (not user settings). When an LSP server is connected, Claude gets access to operations like:

goToDefinition — Jump to where a symbol is defined
findReferences — Find all usages of a symbol
hover — Get documentation for a symbol at a position
documentSymbol — List all symbols in a file
workspaceSymbol — Search symbols across the project

The LSP tool is deferred — it is not loaded upfront in the system prompt. Claude must discover it via ToolSearchTool first. This saves context window space when LSP is not needed.

Tree-Sitter

Tree-sitter is used in Claude Code, but not for reading source code. It is used exclusively for bash command security analysis — parsing bash commands into ASTs to detect dangerous patterns like command substitution, process substitution, and redirect targets. There is a pure-TypeScript fallback parser for when the native module is unavailable.

You now understand how Claude reads code. Let us look at the build system.

The Build System

Claude Code is built with esbuild (not Bun’s bundler) into a single-file CLI output. The build system uses several clever techniques to keep the bundle small and fast.

Single-File Output

The entry point (src/entrypoints/cli.tsx) compiles to dist/cli.mjs — one file, no code splitting. This is intentional for a CLI tool: single-file output means no module resolution at runtime, faster startup, and simpler deployment.

Dead Code Elimination via Feature Flags

Claude Code uses import { feature } from 'bun:bundle' pervasively — 204+ imports across the codebase. In production builds, Bun’s bundler treats these as compile-time constants and eliminates entire branches of dead code. For example, internal-only features (ant-only analytics, XAA auth, team memory) are completely stripped from the external build.

In the esbuild-based external build, bun:bundle is aliased to a shim that reads from environment variables (all defaulting to false).

Lazy Loading

Several lazy loading strategies keep startup fast:

lazySchema — Memoizes Zod schema construction for 482+ tool schemas, deferring from module init to first access
shouldDefer tools — Tools like LSP, WebSearch, and TodoWrite are not loaded upfront. Claude discovers them via ToolSearchTool on demand
Conditional require() — Feature-gated tools use conditional requires that Bun’s dead-code eliminator strips entirely
Dynamic import() — Cloud SDKs (AWS Bedrock, Azure, GCP), OpenTelemetry exporters, native modules (sharp, node-pty), and UI components are loaded lazily at runtime

The React Compiler

Claude Code uses the React Compiler for automatic memoization of React components. This optimizes re-renders in the terminal UI without manual useMemo/useCallback wrappers. Developers work around its limitations (cannot auto-memoize imported functions, bails out on certain patterns) with plain functions, refs, and explicit memoization where needed.

You now understand the build system. Let us look at how you can hook into Claude Code’s lifecycle.

Hooks: Extending the Agent Loop

Claude Code exposes 26 lifecycle events that let you run shell commands or TypeScript functions at critical points in the agent loop. This is the primary extensibility mechanism beyond tools and plugins.

0/12 hooks fired

Click "Run Session" to watch Claude Code hooks fire in real time

Agent Lifecycle

Per-Turn

Tool Execution

Sub-agents

Context

Permissions

What Are Hooks?

Hooks fire at specific moments during a Claude Code session. You define them in your settings and they can inspect, approve, deny, or modify the action being taken. Think of them as middleware for the agentic loop.

The Key Hook Events

PreToolUse — Fires before a tool runs. Can return approve, deny, or modified (with changed input). This is the most powerful hook — it lets you enforce policies programmatically.
PostToolUse — Fires after a tool completes. Can modify the tool result before Claude sees it.
PostToolUseFailure — Fires when a tool errors. Useful for logging or recovery logic.
UserPromptSubmit — Fires when the user sends a message. Can inject additional context into the prompt.
Stop — Fires when Claude finishes responding. Useful for post-processing or analytics.
SubagentStart / SubagentStop — Fire when sub-agents spawn and complete.
PreCompact / PostCompact — Fire around conversation compaction.
PermissionRequest / PermissionDenied — Fire when the permission system prompts the user.
InstructionsLoaded — Fires when CLAUDE.md files are (re)loaded.
SessionStart / SessionEnd — Fire at session boundaries.

How PreToolUse Hooks Work

PreToolUse is the most interesting hook. It receives the tool name and input, and can return:

permissionDecision: "approve" — Auto-approve the tool without asking the user
permissionDecision: "deny" — Block the tool with a reason
updatedInput — Modify the tool’s parameters before execution
additionalContext — Inject extra information Claude will see alongside the tool result

This means you can write a hook that, for example, automatically approves Bash(npm test) but blocks Bash(rm -rf *) — all without user interaction.

Hook Types

Hooks come in three forms:

Shell commands — Defined in settings.json with matcher patterns. The hook receives event data via stdin and returns decisions via stdout as JSON.
Function hooks — In-memory TypeScript functions registered via addFunctionHook(). Used by plugins and the SDK.
Async hooks — Background processes with timeout management and progress reporting.

You now understand hooks. Let us look at one of Claude Code’s most fascinating optimizations.

Speculation: Predictive Execution

Claude Code can pre-execute the next agentic loop before you type anything. This is like branch prediction for an AI agent — it delivers results instantly when you accept a prompt suggestion.

SPECULATION ENGINE

Claude Code pre-executes the next agentic loop before you type anything, using a copy-on-write overlay filesystem so speculative edits never touch your real files.

How Speculation Works

After Claude finishes responding, it generates a prompt suggestion (the gray text you see below the input). Behind the scenes, Claude Code starts a speculative execution of that suggestion using a forked agent.

The key innovation is a copy-on-write (COW) overlay filesystem. When the speculative agent needs to write a file:

The original file is copied to a temporary overlay directory
All writes are redirected to the overlay
Reads check the overlay first — if the file was written there, the overlay version is returned
The real working directory is never touched

This creates a sandbox where speculation can make real edits without affecting your project. If you accept the suggestion, overlay files are copied to the real directory. If you dismiss it, the overlay is discarded.

The Forked Agent Pattern

Speculation uses the runForkedAgent() utility, which creates an isolated sub-agent that shares the parent’s prompt cache. The Anthropic API caches responses based on a composite key of system prompt, tools, model, and message prefix. The forked agent deliberately uses the same parameters so the API returns cached responses — making speculation cheap instead of doubling your API costs.

Claude Code: Forked Agent Pattern

Background tasks share the parent request prompt cache, cutting per-task cost by ~75%

Prompt Cache Pipeline

System Prompt

4,280 tok

Tools

12 tools

Model

claude-sonnet-4

Msg Prefix

1,024 tok

Cache Key:--------

Waiting

The forked agent has its own abort controller, agent ID, and mutable state (file state cache, permission tracking). It accumulates usage separately and does not pollute the main transcript (skipTranscript flag).

Speculation Boundaries

Speculation is not unlimited. It stops at boundaries:

Non-read-only bash commands (it will not run rm or npm install)
Unknown tools
After 20 turns or 100 messages
When the user manually types something (aborts speculation)

If you accept the suggestion before speculation completes, partial results are injected. If speculation finishes first, the result is cached and ready for instant delivery.

Time Saved

Speculation typically saves 3-5 seconds per accepted suggestion. For common follow-up actions like “run the tests” or “fix that type error”, this makes Claude Code feel dramatically faster.

You now understand speculation. Let us look at how Claude Code renders its terminal UI.

React + Ink: The Terminal UI

Claude Code’s entire CLI is a React application rendered to the terminal. Not a traditional line-by-line CLI — a full React component tree with flexbox layout, state management, and re-rendering. This is made possible by a custom fork of Ink, a React renderer for the terminal.

JSX Source

1<Text color="green">Hello</Text>
2<Text> World</Text>

Terminal Output

terminal

Hello World

Render Pipeline

React Tree

Yoga Layout

Frame Buffer

ANSI Diff

Terminal

The Render Pipeline

The render pipeline goes through several stages:

React components render to a virtual tree of DOMElement and TextNode nodes via a custom reconciler (react-reconciler v0.31)
Yoga layout computes x/y/width/height for each node using flexbox rules (same engine as React Native)
Renderer walks the laid-out tree, painting styled text into a 2D Frame (screen buffer)
LogUpdate diffs the new frame against the previous one and writes only the minimal ANSI cursor-move + text sequences to stdout

This means Claude Code only updates the characters that changed between frames — not the entire screen. For a streaming response, only the new text characters are written.

Memory Efficiency

The CharPool class interns all characters as integer IDs using Int32Array for ASCII characters. This avoids creating millions of string objects during rendering. The HyperlinkPool does the same for OSC 8 hyperlink URIs. Both are critical for performance during long streaming responses.

Custom JSX Elements

Claude Code registers custom JSX intrinsic elements through the reconciler: ink-box, ink-text, ink-progress, ink-scroll-box, and more. These map to terminal primitives like boxes, styled text, progress bars, and scrollable regions.

Theming

A ThemeProvider wraps every render call so all components have access to the current color theme. There is also a live OSC 11 terminal theme watcher for auto mode — if the user changes their terminal color scheme, Claude Code adapts in real time.

You now understand the terminal UI. Let us look at the plugin and skill system.

Plugins and Skills

Claude Code has a full plugin architecture where plugins can contribute skills, hooks, MCP servers, LSP servers, and commands. Skills are reusable named workflows that bundle prompts and tool configurations for specific tasks.

Available capabilities

6/ 6

Plugins

16/ 16

Skills

7/ 7

Hooks

5/ 5

MCP servers

8/ 8

Commands

7/ 7

Tool types

Plugins

Core

Foundational skills for everyday development workflows.

5/5 skills2 hooks2 commands

batchdebugloopverifysimplify

Git

Version control operations and change management.

3/3 skills1 hooks2 commands

commitreviewresolve

Testing

Test generation, execution, and coverage analysis.

2/2 skills1 hooks1 MCP servers1 commands

testverify

Quality

Code quality improvements and performance optimization.

4/4 skills1 hooks2 MCP servers1 commands

refactorsimplifyoptimizemigrate

Learning

Understanding codebases and generating documentation.

3/3 skills1 MCP servers1 commands

explaindocumentsearch

Shipping

Build verification and deployment preparation.

3/3 skills2 hooks1 MCP servers1 commands

deployverifydocument

Bundled skills (16 available)

{ }

Select a skill to inspect its properties

What Is a Skill?

A skill is a pre-packaged workflow that tells Claude how to perform a specific type of task. Claude Code ships with 16 bundled skills: batch (process multiple files), debug (diagnose errors), loop (autonomous experiment loops), verify (check work), simplify (reduce complexity), and more.

Each skill defines:

Prompt template — Instructions for how to approach the task
Allowed tools — Which tools the skill can use
Agent context — Whether it runs inline or in a forked agent
Model override — Can use a different model than the main agent
whenToUse description — Tells Claude when to use the skill

What Is a Plugin?

A plugin is a package that bundles multiple extensions:

Skills — Named workflows
Hooks — Lifecycle event handlers
MCP servers — External tool connections
LSP servers — Language intelligence
Commands — Custom slash commands
Settings — Default configuration values

Plugins are enabled and disabled via the /plugin command. Preferences persist across sessions. Built-in plugins use the {name}@builtin identifier format.

How Skills Are Loaded

Bundled skills are registered at startup with lazy file extraction — reference files are extracted to disk on first invocation using secure file writing (atomic writes with O_NOFOLLOW|O_EXCL, 0o600 permissions, per-process nonce directories). Skills appear in the system prompt so Claude knows when to use them. MCP skills can also be dynamically created from MCP server resources.

You now understand plugins and skills. Let us look at how Claude Code isolates work with git worktrees.

Git Worktree Isolation

Claude Code can create isolated git worktrees for safe parallel work. This lets Claude experiment with changes in a separate working tree without affecting your main branch.

Main Working Treemain

app.ts

utils.ts

index.ts

config.ts

How Worktrees Work

When you ask Claude to use a worktree (or Claude decides it is needed), it:

Creates a new worktree inside .claude/worktrees/ with a new branch based on HEAD
Switches the session’s working directory to the new worktree
Claude works in the worktree — file reads, edits, and bash commands all operate in isolation
When done, you can keep the branch (merge it) or remove it

The main working tree is completely unaffected. This is safer than creating a branch and switching — you never have to switch back, and there is no risk of forgetting which tree you are in.

Hook Integration

WorktreeCreate and WorktreeRemove hook events fire at the appropriate times, allowing custom isolation logic. This enables VCS-agnostic isolation for non-git projects via hooks.

You now understand worktrees. Let us look at how Claude Code gets real-time error feedback from your code.

LSP Diagnostic Feedback

When Claude Code is connected to a Language Server (via a plugin), it receives real-time diagnostics — errors, warnings, and hints from your language’s compiler or linter. These are automatically fed back into the conversation so Claude can see and fix issues.

calculator.tsTypeScript

0 errors

1function add(a: number, b: number): number {
2  return a + b;
3}
4 
5const result: number = add(1, 2);
6console.log(result);

Diagnostic Pipeline

{ }

Claude Edits

FileEdit replaces 1 with "1"

TS

LSP Server

publishDiagnostics notification

DB

Registry

Dedup: LRU cache (500 files)

+

Attachment

Injected into next API call

*

Claude Fixes

Sees error, self-corrects

How Diagnostics Flow

Claude edits a file (e.g., removes a required import)
The LSP server detects a type error and publishes a diagnostic
passiveFeedback.ts converts the LSP diagnostic to Claude’s internal format
The diagnostic is registered in LSPDiagnosticRegistry with volume limiting (max 10 per file, 30 total)
On the next API call, the diagnostic is delivered as an attachment
Claude sees the error and fixes it — without you saying anything

Deduplication

Cross-turn deduplication uses an LRU cache (max 500 files) mapping file URIs to diagnostic signatures (hash of message + severity + range). This prevents Claude from seeing the same error repeatedly across multiple turns.

You now understand LSP feedback. Let us look at the Agent SDK.

The Agent SDK

Claude Code exposes a full programmatic API — the Agent SDK — for embedding it in other applications. This is how IDE integrations, desktop apps, and CI/CD pipelines use Claude Code under the hood.

agent-sdk-demo.ts

1import { AgentSDK } from '@anthropic-ai/agent-sdk'

3const agent = new AgentSDK({

4 model: 'claude-sonnet-4-20250514',

5 apiKey: process.env.ANTHROPIC_API_KEY,

6 hooks: {

7 onInit: () => console.log('SDK ready'),

8 onToolUse: (tool) => approveTool(tool),

9 onStream: (chunk) => process.stdout.write(chunk),

10 },

11 permissions: {

12 allow: ['Read', 'Glob', 'Grep'],

13 deny: ['Write', 'Bash'],

14 },

15})

17const result = await agent.run(

18 'Refactor the auth module to use middleware'

19)

21console.log(result.summary)

22console.log(`Cost: $${result.usage.costUSD}`)

EVENT LOG

Click "Run SDK" to start the event stream

SDK Capabilities

The SDK provides typed interfaces for:

Messages — SDKUserMessage, SDKAssistantMessage, SDKResultMessage for full conversation control
Streaming events — Real-time stream_event payloads as they arrive from the API
System events — init, status, compact_boundary, post_turn_summary, api_retry
Hook events — All 26 lifecycle events available to SDK consumers
Permission decisions — Programmatic approve/deny/reject with classification
Custom agents — Define sub-agents with tools, model, MCP servers, memory scope, and max turns
Usage metrics — total_cost_usd, per-model token breakdowns, permission denial counts

Why This Matters

The SDK means Claude Code is not just a CLI tool — it is an embeddable AI coding agent. VS Code extensions, JetBrains plugins, CI/CD pipelines, and custom applications can all use the same agentic loop, permission system, and tool infrastructure that powers the CLI.

You now understand the Agent SDK. Let us wrap up with some practical tips.

Tips for Power Users

Permission Rules

Define rules in your settings to reduce confirmation prompts:

Bash(git *)       — Allow all git commands
Bash(npm test)    — Allow running tests
FileEdit(/src/*)  — Allow edits in src/
mcp__github       — Allow all GitHub MCP tools

Rules are checked in order: deny first, then allow, then ask, then fallback to the current mode.

Slash Commands

Type / in Claude Code to see available commands:

/help — Show help
/mcp — Manage MCP servers
/compact — Manually trigger conversation compaction
/clear — Clear conversation history
/config — View or change settings

Keyboard Shortcuts

Escape — Cancel the current operation
Ctrl+C — Interrupt a running command
Up/Down arrows — Navigate command history

IDE Integration

Claude Code works in VS Code and other editors via extensions. The IDE extensions use SSE and WebSocket transports to connect to MCP servers running in the editor process, giving Claude access to IDE-specific capabilities like the active file selection and editor state.

You now understand the full machinery inside Claude Code. Let us bring it all together.

What You Learned

You now understand the full machinery inside Claude Code:

Architecture: CLI parser, QueryEngine, agentic loop, tool system, permission model, context manager
The agentic loop: An infinite while(true) that streams from the API, executes tools, and continues until Claude is done
System prompt: Two-part design — static (cached globally) and dynamic (per-session context)
Tool system: ~40 built-in tools with a consistent interface, plus MCP tools from external servers
Tool execution: Partitioned into concurrent read-only batches and serial write batches
Permission model: Defense-in-depth with allow/deny/ask rules, ML-based classification
Context management: Auto-compact, microcompact, tool result budgets to stay within the context window
Sub-agents: Parallel task execution with specialized agent types
Streaming: Raw SSE stream with incremental content block accumulation, FGTS for tool input, streaming tool execution
MCP: External tool servers via stdio/SSE/HTTP/WebSocket with namespaced tools and OAuth auth
Plan mode: 5-phase explore-design-review-write-exit workflow with read-only enforcement
Project memory: CLAUDE.md files, auto-memory, conditional rules, and session scratchpad
Codebase reading: ripgrep-powered glob/grep, FileReadTool with dedup, LSP integration from plugins
Build system: Single-file esbuild output, feature flag dead code elimination, lazy loading, React Compiler
Hooks: 26 lifecycle events (PreToolUse, PostToolUse, Stop, etc.) for programmatic extensibility
Speculation: Copy-on-write overlay filesystem for predictive execution of prompt suggestions
Terminal UI: React + Ink custom fork with Yoga flexbox layout, character-level frame diffing
Plugins and skills: Bundled skills (batch, debug, verify), plugin architecture with hooks + MCP + LSP
Git worktrees: Isolated working trees for safe parallel experimentation
LSP diagnostics: Real-time compiler/linter feedback automatically fed back into conversation
Agent SDK: Embeddable programmatic API for IDE integrations, CI/CD, and custom applications

Self-Check

[ ] What is the agentic loop and why is it infinite?
[ ] How does the system prompt’s two-part design save tokens?
[ ] What determines whether tool calls run in parallel or serial?
[ ] Why does Claude Code check permissions even though Claude itself generated the tool call?
[ ] What triggers auto-compact, and what happens to the conversation?
[ ] How do sub-agents enable parallel work, and what is the parent-child relationship?
[ ] What is deferred tool loading and why is it needed?
[ ] How does the tool result budget prevent context window exhaustion?
[ ] Why does Claude Code use the raw stream instead of BetaMessageStream?
[ ] What is FGTS and how does it reduce perceived latency?
[ ] How does streaming tool execution overlap with an in-progress API response?
[ ] What transport types does MCP support, and which is the default?
[ ] How are MCP tool names formatted in Claude’s tool registry?
[ ] What happens when an MCP server returns HTTP 401?
[ ] What are the 5 phases of plan mode, and what is the only writable file?
[ ] Why do Explore and Plan agents omit CLAUDE.md?
[ ] In what order are CLAUDE.md files loaded, and why?
[ ] What is the @include directive in CLAUDE.md files?
[ ] What is auto-memory and where is it stored?
[ ] How does FileReadTool deduplicate repeated reads of the same file?
[ ] Why is tree-sitter used only for bash parsing and not source code?
[ ] How does the build system use feature flags for dead code elimination?
[ ] What is the lazySchema pattern and why is it used 482+ times?
[ ] How does the React Compiler optimize Claude Code’s terminal UI?
[ ] What can a PreToolUse hook return, and what does each return value do?
[ ] What are the three types of hooks and how do they differ?
[ ] How does the copy-on-write overlay filesystem work in speculation?
[ ] Why does the forked agent share the parent’s prompt cache?
[ ] What are the boundaries that stop speculation?
[ ] How does the Ink render pipeline go from React components to terminal output?
[ ] What is the CharPool and why is it needed?
[ ] What components can a plugin bundle (name at least 4)?
[ ] How do git worktrees isolate work from the main branch?
[ ] How do LSP diagnostics flow from the language server to Claude’s context?
[ ] What does the Agent SDK expose beyond what the CLI provides?

Test Your Knowledge

Question 1 of 710 pts

What is the agentic loop in Claude Code and why is it an infinite while(true)?

Score: 0 / 900%