Blog

Anatomy of a Claude Code Session: Prompts, Tools, and Token Economics

I intercepted every API call from a single Claude Code session using vistaclair, an agent inspector and transparent proxy I built for Claude Code. 307 files, 61 API requests, 246 hook events, 2 million tokens processed. What I found was a surprisingly intricate system of nested agents, a 15,000-token system prompt, and a prompt caching strategy that turns a 30sessionintoa30 session into a 3 one. I also found a lot of fat that could be trimmed.

TL;DR: Claude Code sends a ~19,000-token payload (system prompt + tool definitions) on every API call, with tool schemas alone consuming 56% of input tokens. Prompt caching reduces session costs by ~90% (from 29.82to29.82 to 3.11 in my test). Subagents like Plan and Explore receive the identical system prompt as the main agent, differentiated only by the task message. About 19% of the tool definitions (~11,400 characters) are redundant or bloated, mostly from git workflow runbooks embedded in the Bash tool.

How the data was captured

Vistaclair works by setting ANTHROPIC_BASE_URL=http://localhost:3456 so that Claude Code routes every API call through a local transparent proxy. The proxy captures the full on-the-wire payload — system prompts, tool definitions, thinking blocks, SSE streams, token counts — while forwarding requests unchanged to Anthropic. Each Claude process gets a unique instance ID in the URL, so concurrent sessions stay isolated. The result is a complete, unmodified record of every interaction between the harness and the API.

For this analysis, I pointed Claude Code at a moderately complex task and let it run to completion. Here's the exact prompt I used:

analyze the interactions in /home/hp/vistaclair/interactions/blog-test
in particular how the system prompt varies for main and subagents, skills
and tools. and how that prompt achieves how claude code behaves. also see
if it could be simplified to save tokens, or if there are huge sections
in it wasted for something that's not necessarily needed
The prompt given to Claude Code that generated the session data analyzed in this post.

This prompt exercised subagent spawning (Plan, Explore, General-Purpose), skill invocation, file reads, grep searches, and context compaction — hitting most of Claude Code's capabilities in a single session.

The anatomy of a Claude Code session

A Claude Code session isn't a single conversation. It's an orchestration layer (the "harness") managing multiple Claude API calls, lifecycle hooks, subagent spawns, skill invocations, and context compaction events. My test session asked Claude Code to analyze its own internals, exercising most of its capabilities. Here's what the interaction log looks like from above.

Claude Code Session Architecture User Harness (CLI / IDE / Web) Orchestration, hooks, permissions, context management ← Vistaclair transparent proxy (captures all traffic) → Main Agent (Opus) System prompt + 18 tools + thinking Plan Agent 18 tools, same prompt Explore Agent 18 tools, same prompt General-Purpose 9 tools, short prompt Guide (Sonnet) 4 tools, domain prompt Lifecycle Hooks PreToolUse, PostToolUse SubagentStart/Stop, etc. 18 Tool Definitions ~15,300 tokens (56% of input) Bash............12,367 chars Agent.............8,948 chars Monitor...........6,290 chars AskUserQuestion...5,020 chars 14 others........28,720 chars Skills System Full docs injected into context claude-api: 377,033 chars update-config: 132,835 chars No summarization on inject 94.7% truncated at compaction Context Management Compaction at context limit 49 messages → 2 messages 182K → 36K tokens (80% cut) Preserves: files, summary Discards: tool calls, thinking
High-level architecture of a Claude Code session. Vistaclair sits as a transparent proxy between the harness and the Anthropic API, capturing every request and response without modifying them.

What's in the system prompt

The main agent's system prompt is 15,141 characters (~3,785 tokens). It's structured as three blocks sent in the request.system array:

  1. Billing header (40 chars): x-anthropic-billing-header: cc_version=2.1.150.4eb; cc_entrypoint=cli; cch=b1d85; . No cache control.
  2. Identity (55 chars): You are Claude Code, Anthropic's official CLI for Claude. Cached for 1 hour.
  3. Main instructions (15,003 chars): Everything else. Also cached for 1 hour.

The main instructions block covers 9 distinct sections. Here's the breakdown.

SectionApprox. charsPurpose
Security policy~450Authorization for CTF/pentesting, URL generation rules
System behavior~600Permission modes, prompt injection warnings, hook handling, auto-compression
Doing tasks~2,200Software engineering defaults: prefer editing over creating, no overengineering, no unnecessary comments, OWASP awareness
Executing actions with care~2,800Reversibility and blast radius rules, with exhaustive examples of risky operations
Using your tools~400Parallel tool calls, TaskCreate guidance
Tone and style~500No emojis, short responses, code references with file:line format
Text output~900User-facing communication rules, end-of-turn summaries
Session-specific guidance~1,500Subagent delegation, skill invocation, /schedule offer policy, /ultrareview
Environment~650Working directory, OS, model info, Claude model family reference
Breakdown of the main agent system prompt by section

The most interesting section is "Executing actions with care" at 2,800 characters. It's basically a liability firewall, teaching the model to think about reversibility before acting. It includes specific examples like "don't delete branches," "don't force-push," "don't overwrite uncommitted changes." The problem is that the Bash tool description also contains a 1,260-character Git Safety Protocol covering the same concepts. More on that later.

Tool definitions: the real token hog

Here's the thing that surprised me most: the 18 tool definitions consume 61,345 characters (~15,300 tokens). That's 80% of the combined system prompt + tools payload, and 56% of all input tokens on every API call. The system prompt itself is comparatively small.

Input Token Composition (Every Main Agent Call) Tool definitions: 56% Messages: 30% Sys: 14% Bash 12,367 chars (20.2%) Agent 8,948 chars (14.6%) Monitor 6,290 chars (10.3%) AskUser 5,020 chars (8.2%) PlanMode 4,343 chars (7.1%) TaskUpd 3,598 chars 12 others 20,779 chars combined Top 6 tools account for 67% of all tool definition tokens
Bash alone is 20% of all tool definitions. The top 6 tools consume two-thirds of the tool token budget.

Why Bash is 12,367 characters

The Bash tool description is a small manual. It contains a 1,260-character Git Safety Protocol (7 "NEVER" rules), a 4,217-character step-by-step commit workflow, and a 1,941-character PR creation workflow, complete with HEREDOC examples. These are procedural runbooks that only matter when the user asks to commit or create a PR, yet they're sent on every single API call.

# Git Safety Protocol (1,260 chars, sent every call)
- NEVER update the git config
- NEVER run destructive git commands (push --force, reset --hard...)
- NEVER skip hooks (--no-verify, --no-gpg-sign...)
- NEVER run force push to main/master...
- CRITICAL: Always create NEW commits rather than amending...
- When staging files, prefer adding specific files...
- NEVER commit changes unless the user explicitly asks...

# Step-by-step commit workflow (4,217 chars)
1. Run git status + git diff + git log in parallel...
2. Analyze all staged changes and draft a commit message...
3. Run add + commit + verify in parallel...
4. If pre-commit hook fails, fix and create NEW commit...

# Step-by-step PR workflow (1,941 chars)
1. Run git status + git diff + git log + git diff base...HEAD...
2. Analyze all changes and draft PR title/summary...
3. Push + create PR with gh pr create...
58% of the Bash tool description is git-specific content. That's 6,255 characters of commit and PR instructions sent on every API call, even when you're just running `ls`.
Duplication between system prompt and tools
The system prompt's "Executing actions with care" section (2,800 chars) and the Bash tool's Git Safety Protocol (1,260 chars) cover the same six concepts: destructive operations, force push, git reset --hard, --no-verify, amending commits, and file staging. The emoji policy ("Only use emojis if the user explicitly requests it") appears three times: in the system prompt, the Edit tool, and the Write tool.

The four subagent types

Claude Code can spawn four types of subagents via the Agent tool. Each gets a different combination of model, tools, system prompt, and capabilities. Here's the full comparison from my session.

Main AgentPlanExploreGeneral-PurposeGuide
ModelOpus 4.6Opus 4.6Opus 4.6Opus 4.6Sonnet 4.6
System prompt15,003 charsIdenticalIdentical2,434 chars17,218 chars
Tool count18181894
Max tokens64,00064,00064,00064,00032,000
Thinking modeAdaptiveAdaptiveAdaptive
Can spawn sub-subagents
Can edit/write files
Can manage tasks
Can ask user questions
Comparison of Claude Code agent types. Plan and Explore are indistinguishable from the main agent at the API level.

The most surprising finding: Plan and Explore get the exact same system prompt, tools, and model as the main agent. Character for character identical. The only difference is the task description the main agent writes when spawning them. This means there's nothing stopping a Plan agent from writing code, or an Explore agent from deleting files. The behavioral differentiation is entirely in the prompt the parent sends, not in any system-level restriction.

How subagent behavior is controlled
The Agent tool description tells the main agent: "Plan agents are for designing implementation plans" and "Explore agents are for locating code." It also specifies which tools each agent type should have access to (e.g., Explore: "All tools except Agent, ExitPlanMode, Edit, Write, NotebookEdit"). But in practice, the harness doesn't enforce these restrictions for Plan and Explore. It only restricts tools for General-Purpose (9 tools) and Guide (4 tools).

The general-purpose worker

General-purpose subagents are the workhorses. They get a much leaner system prompt (2,434 chars vs 15,003), no thinking mode, and 9 tools instead of 18. Critically, they can't spawn further subagents (no Agent tool), can't manage tasks, and can't ask the user questions. They're designed to receive a task, do it, and report back.

In my session, three general-purpose agents were launched simultaneously for a code review, each with a different "angle": line-by-line scan, structural analysis, and cross-file tracing. This parallelization pattern is the main value of subagents. You get three independent analyses running concurrently, each with their own context window.

The guide agent (Sonnet)

The Guide is the odd one out. It runs on Sonnet (5x cheaper), gets only 4 read-only tools (Bash, Read, WebFetch, WebSearch), and has a completely different domain-specific system prompt focused on documentation retrieval. Its job is to fetch docs from code.claude.com/docs and platform.claude.com/llms.txt, find relevant pages, and return actionable guidance. It's essentially a RAG agent hard-coded to Anthropic's documentation.

How skills inject context

Skills are Claude Code's plugin system. When you type /claude-api or the model invokes a skill, something dramatic happens to the context window. The skill system first dispatches Sonnet sub-calls to fetch documentation (cheap). Then it injects the full, unsummarized documentation directly into the main Opus agent's message history.

Skill Context Injection Flow Main Agent (Opus) calls Skill("claude-api") Sonnet fetches docs 100K chars per page, cheap Full docs injected into Opus claude-api: 377,033 chars No summarization. Raw dump. Cost impact: file 257 alone cost $0.70 (22% of total session cost) 104,925 cache creation tokens at Opus rates ($18.75/MTok) Context Window Size Over Time 21K → 41K tokens (30 calls) 182K! compact 36K → 40K (stable) Compaction: 49 messages → 2, with a 6,496-char structured summary
A single skill invocation quadrupled the context window from 41K to 182K tokens, triggering compaction.

The /claude-api skill injected 377,033 characters of Anthropic SDK documentation. The /update-config skill injected 132,835 characters of hooks and settings documentation. Neither was summarized before injection. This is the single most expensive pattern in the entire session: one skill call (file 257) cost 0.70,whichis220.70, which is 22% of the total 3.11 session cost.

Context compaction: how the session survives

After the skill injection pushed the context to 182K tokens, Claude Code performed automatic compaction. This is one of the cleverest parts of the system. Here's what happened:

  • 49 messages were collapsed into 2
  • 182,282 tokens dropped to 36,480 (80% reduction)
  • All tool call/result pairs were discarded
  • All thinking blocks were discarded
  • Skill documentation was truncated from 377K to ~20K chars (94.7% removed)
  • A structured 6,496-character summary was generated covering: user intent, files, errors, problem-solving history, verbatim user messages, pending tasks, and current work state
  • Recent file contents were preserved via <system-reminder> tags

The compaction summary ends with a key directive: "Continue the conversation from where it left off without asking the user any further questions. Resume directly, do not acknowledge the summary, do not recap what was happening. Pick up the last task as if the break never happened." This prevents the jarring "As I was saying..." effect that would break the user experience.

The system-reminder side channel

<system-reminder> tags are Claude Code's way of injecting system-level context through user-role messages. The system prompt explicitly tells the model: "Tags contain information from the system. They bear no direct relation to the specific tool results or user messages in which they appear." This establishes them as a trusted side channel.

They carry four types of content:

  1. Skill listings: ~3,800 chars listing all available skills and their trigger conditions, injected in the first user message of every request
  2. Context variables: User email, current date, and other session metadata
  3. Tool replay: After compaction, recent file reads are preserved as system-reminder-wrapped tool call/result pairs
  4. Previously-invoked skills: After compaction, truncated skill documentation is preserved with a warning: "Do NOT re-execute these skills or perform their one-time setup actions again"
Why system-reminder uses the user role
The Anthropic API has a strict alternating user/assistant message structure. You can't just insert system messages mid-conversation. By using <system-reminder> tags in user-role messages, Claude Code can inject context at any point in the conversation without breaking the message structure. The system prompt's instruction to treat these tags as system-level information makes this work.

Token economics: where the money goes

My test session made 60 successful API calls and cost 3.11.Withoutpromptcaching,thesamesessionwouldhavecost3.11. Without prompt caching, the same session would have cost 29.82. That's an 89.6% savings, and it's the single most important design decision in Claude Code's architecture.

Token typeVolumeCost% of total
Cache read (hits)1,709,589$0.8427%
Cache creation219,881$1.3142%
Uncached input114,114$0.4113%
Output24,157$0.5618%
Token breakdown for the entire session. Cache creation is the largest cost component at 42%, billed at 1.25x the input rate.

Cache creation is the top cost at 42%. Every time the conversation grows, new tokens must be cache-created at a 1.25x premium (18.75/MTokforOpusvs18.75/MTok for Opus vs 15/MTok for regular input). The trick is minimizing how often the cache prefix changes, and Claude Code achieves this by keeping the system prompt and tool definitions static across calls.

Cost Breakdown by Agent Type Main agent (Opus): $2.23 (71.8%) GP subagents (Opus): $0.34 Plan (Opus): $0.11 Skills + Guide (Sonnet): $0.24 Explore + Web search + Title gen: $0.19 Without caching: $29.82 vs $3.11 with caching (89.6% saved)
The main Opus agent dominates cost at 72%. Sonnet is used strategically for cheap auxiliary work.

The prompt bloat analysis

I identified approximately 11,400 characters (~2,850 tokens) of reducible content across the tool definitions. That's 19% of the tool payload. Here's where it lives.

Source of bloatCharsIssue
Bash git commit/PR workflows~6,158Step-by-step runbooks only needed for commit/PR tasks, sent every call
Bash Git Safety Protocol~1,260Overlaps with system prompt's "Executing actions with care" section
Bash sleep/Monitor cross-refs~883Both tools explain when to use the other, circular references
Monitor code examples~1,2845 shell scripts including niche CI polling with jq and comm
Agent XML examples~1,667Two full teaching examples with <thinking> tags and <commentary>
EnterPlanMode examples~2,67922 examples to convey one concept: "use plan mode for non-trivial tasks"
Emoji duplication~150Same instruction appears in system prompt, Edit, and Write
Reducible content in tool definitions. The Bash git workflows are the biggest offender.

Could this actually be simplified?

Yes, but with caveats. The 11,400 characters of bloat translates to roughly 2,850 tokens. At Opus cache-read rates (1.50/MTok), that&#39;s 0.004 per call, or about $0.25 over my 60-call session. Not exactly bankrupting anyone.

The real cost of bloat isn't financial. It's attentional. Every token in the system prompt competes for the model's attention with the actual task. A 6,158-character git commit workflow sitting in the Bash tool description while you're asking Claude to fix a CSS bug is noise. Whether it causes measurable degradation in output quality is hard to say, but the principle of "don't distract the model with irrelevant instructions" is well-established in prompt engineering.

The git commit/PR workflows could be conditionally injected, only when the user asks to commit or create a PR. The EnterPlanMode examples could be cut from 22 to 4. The Monitor code examples could lose the niche CI polling script. The duplicated emoji instructions could be centralized. These changes would save ~2,850 tokens per call without losing any behavioral fidelity.

Why Anthropic might keep the bloat
There's a strong argument for the current approach: reliability. The git commit workflow is extremely detailed because getting commits wrong is high-stakes (losing work, amending the wrong commit, force-pushing). The 22 examples in EnterPlanMode exist because the model was probably failing to enter plan mode at the right times. Prompt engineering is empirical, and these verbose sections likely exist because shorter versions produced measurable regressions in some benchmark. The cost of a wrong git commit is much higher than the cost of 2,850 extra cached tokens.

How the prompt shapes behavior

Looking at the system prompt as a whole, I can identify several distinct prompt engineering techniques that make Claude Code behave the way it does.

1. Constitutional constraints via negation

The prompt is heavy on "NEVER" and "don't" instructions. The Bash tool alone has 7 "NEVER" rules. The system prompt has "IMPORTANT" and "CRITICAL" markers on specific instructions. This is a pattern I see a lot in production systems: instead of describing what the model should do (which is open-ended), you enumerate the specific failure modes you've observed and explicitly block them.

2. The "blast radius" framework

The "Executing actions with care" section introduces a decision framework: "Carefully consider the reversibility and blast radius of actions." This gives the model a mental model for categorizing actions, not just a list of things to avoid. Local and reversible? Go ahead. Affects shared systems? Ask first. This is more robust than a pure blocklist because it generalizes to novel situations.

3. Anti-overengineering directives

The "Doing tasks" section contains some of the most opinionated instructions I've seen in a production prompt. "Don't add features, refactor, or introduce abstractions beyond what the task requires. Three similar lines is better than a premature abstraction." And: "Default to writing no comments. Only add one when the WHY is non-obvious." These are direct responses to a known LLM failure mode: over-helping. Without these constraints, Claude would add docstrings to everything, create helper functions nobody asked for, and wrap three-line scripts in try-catch-finally blocks.

4. Communication scaffolding

The "Text output" section is particularly well-crafted: "Before your first tool call, state in one sentence what you're about to do. While working, give short updates at key moments." And: "Write so the reader can pick up cold: complete sentences, no unexplained jargon or shorthand from earlier in the session." This solves the "silent agent" problem where the model makes 15 tool calls without saying anything, and the user has no idea what's happening.

The caching strategy that makes it all affordable

Claude Code uses Anthropic's prompt caching with cache_control: { type: "ephemeral", ttl: "1h" } markers at strategic positions. The system prompt blocks and the most recent user message get these markers. Since the system prompt and tool definitions are identical across every call, they hit the cache on every subsequent request.

In my session, the cache hit rate was 83.7% of all input tokens. Only 5.6% of tokens were completely uncached. The first call creates the cache (4,951 tokens at 1.25x), and every subsequent call reads from it (16,198+ tokens at 0.1x the regular price). This is why the system can afford to send 15,300 tokens of tool definitions on every call. The incremental cost is negligible after the first request.

// From file 5.json - the first main API call
"usage": {
  "input_tokens": 3,           // Almost nothing uncached
  "output_tokens": 1162,
  "cache_creation_input_tokens": 4951,
  "cache_read_input_tokens": 16198  // System prompt + tools from prior call
}
By the first real API call, almost everything is already cached. Only 3 tokens were uncached.

Hooks: the zero-cost event system

246 of the 307 files in my session were hook events. Hooks are shell commands the harness runs in response to lifecycle events. They cost zero API tokens because they're local processes. The harness fires hooks for everything: before and after each tool call, when subagents start and stop, when tasks are created, when the user submits a prompt, when compaction occurs.

Hook EventCountPurpose
PreToolUse84Fired before each tool call, can block/modify
PostToolUse79Fired after each successful tool call
PostToolBatch46After a batch of parallel tool calls
SubagentStop/Start14Subagent lifecycle tracking
PostToolUseFailure5When a tool call fails
TaskCreated/Completed8Task management events
Stop2Main agent turn ended
UserPromptSubmit2User entered a prompt
Pre/PostCompact2Context compaction events
SessionStart2CLI session initialized
Hook events from the session. These are local shell commands, not API calls.

Hooks are the extensibility mechanism. Want to lint every file before Claude writes it? Add a PreToolUse hook on the Write tool. Want to log every command Claude runs? PostToolUse on Bash. Want to block certain operations entirely? Return a non-zero exit code from a PreToolUse hook. The system prompt tells the model to treat hook feedback as coming from the user, which means hooks can redirect Claude's behavior mid-task.

Key takeaways

  • Tool definitions are the dominant cost, consuming 56% of input tokens (15,300 tokens per call) and 80% of the system prompt + tools payload. The system prompt itself is relatively lean at 3,785 tokens.
  • Prompt caching saves ~90% of costs. My session would have cost 29.82withoutcaching,butcost29.82 without caching, but cost 3.11 with it. The 19,000-token static payload (system prompt + tools) is cached after the first call and reused at 0.1x the price.
  • Plan and Explore subagents are architecturally identical to the main agent. Same system prompt, same tools, same model. Behavioral differentiation happens entirely through the task prompt the parent sends. Only General-Purpose and Guide get restricted tool sets.
  • Skills inject full documentation without summarization, which is the single biggest context-window spike. The /claude-api skill dumped 377,033 characters into the main context, costing $0.70 for that one call. Compaction later truncated 94.7% of it.
  • Context compaction is the safety valve. When the context hits limits, Claude Code collapses 49 messages into 2, generates a structured summary, preserves recent file state, and tells the model to continue seamlessly. It works remarkably well.
  • About 19% of tool definitions (~11,400 chars) could be trimmed without losing functionality. The Bash tool's 6,158-character git workflow runbooks are the biggest target. But caching makes this a quality concern (attention competition) rather than a cost concern.
  • The system-reminder side channel is clever. Using XML tags in user-role messages lets Claude Code inject system context at any point in the conversation without breaking the API's alternating message structure.
  • The most expensive single call was a skill invocation (file 257, $0.70), not any conversation turn. If you're cost-conscious with Claude Code, be selective about which skills you invoke.

Comments

No comments yet.

Leave a comment

We use analytics cookies. Privacy