How much does a Claude Code session cost without prompt caching?

In the analyzed session, the raw token cost without caching would have been $29.82. With prompt caching enabled, the actual cost was $3.11 — a 90% reduction.

What percentage of Claude Code's input tokens are tool definitions?

Tool schemas alone consume 56% of input tokens on every API call. The full system prompt plus tool definitions total approximately 19,000 tokens sent with each request.

How do Claude Code subagents like Plan and Explore differ from the main agent?

Subagents like Plan, Explore, and General-Purpose receive the identical system prompt as the main agent. They are differentiated only by the task message passed to them when spawned.

How much of Claude Code's tool payload is redundant or could be trimmed?

About 19% of the tool definitions (~11,400 characters) are redundant or bloated, primarily from git workflow runbooks embedded in the Bash tool description.

How can you intercept Claude Code API calls for analysis?

By setting ANTHROPIC_BASE_URL to a local transparent proxy (like vistaclair on localhost:3456), all API calls are routed through the proxy which captures full payloads — system prompts, tool definitions, thinking blocks, and SSE streams — while forwarding requests unchanged to Anthropic.

Anatomy of a Claude Code Session: Prompts, Tools, and Token Economics

May 24, 2026 · 20 min read

I intercepted every API call from a single Claude Code session using vistaclair, an agent inspector and transparent proxy I built for Claude Code. 307 files, 61 API requests, 246 hook events, 2 million tokens processed. What I found was a surprisingly intricate system of nested agents, a 15,000-token system prompt, and a prompt caching strategy that turns a $30 session into a$ 3 one. I also found a lot of fat that could be trimmed.

TL;DR: Claude Code sends a ~19,000-token payload (system prompt + tool definitions) on every API call, with tool schemas alone consuming 56% of input tokens. Prompt caching reduces session costs by ~90% (from $29.82 to$ 3.11 in my test). Subagents like Plan and Explore receive the identical system prompt as the main agent, differentiated only by the task message. About 19% of the tool definitions (~11,400 characters) are redundant or bloated, mostly from git workflow runbooks embedded in the Bash tool.

How the data was captured

Vistaclair works by setting ANTHROPIC_BASE_URL=http://localhost:3456 so that Claude Code routes every API call through a local transparent proxy. The proxy captures the full on-the-wire payload — system prompts, tool definitions, thinking blocks, SSE streams, token counts — while forwarding requests unchanged to Anthropic. Each Claude process gets a unique instance ID in the URL, so concurrent sessions stay isolated. The result is a complete, unmodified record of every interaction between the harness and the API.

For this analysis, I pointed Claude Code at a moderately complex task and let it run to completion. Here's the exact prompt I used:

analyze the interactions in /home/hp/vistaclair/interactions/blog-test
in particular how the system prompt varies for main and subagents, skills
and tools. and how that prompt achieves how claude code behaves. also see
if it could be simplified to save tokens, or if there are huge sections
in it wasted for something that's not necessarily needed

The prompt given to Claude Code that generated the session data analyzed in this post.

This prompt exercised subagent spawning (Plan, Explore, General-Purpose), skill invocation, file reads, grep searches, and context compaction — hitting most of Claude Code's capabilities in a single session.

The anatomy of a Claude Code session

A Claude Code session isn't a single conversation. It's an orchestration layer (the "harness") managing multiple Claude API calls, lifecycle hooks, subagent spawns, skill invocations, and context compaction events. My test session asked Claude Code to analyze its own internals, exercising most of its capabilities. Here's what the interaction log looks like from above.

High-level architecture of a Claude Code session. Vistaclair sits as a transparent proxy between the harness and the Anthropic API, capturing every request and response without modifying them.

What's in the system prompt

The main agent's system prompt is 15,141 characters (~3,785 tokens). It's structured as three blocks sent in the request.system array:

Billing header (40 chars): x-anthropic-billing-header: cc_version=2.1.150.4eb; cc_entrypoint=cli; cch=b1d85; . No cache control.
Identity (55 chars): You are Claude Code, Anthropic's official CLI for Claude. Cached for 1 hour.
Main instructions (15,003 chars): Everything else. Also cached for 1 hour.

The main instructions block covers 9 distinct sections. Here's the breakdown.

Section	Approx. chars	Purpose
Security policy	~450	Authorization for CTF/pentesting, URL generation rules
System behavior	~600	Permission modes, prompt injection warnings, hook handling, auto-compression
Doing tasks	~2,200	Software engineering defaults: prefer editing over creating, no overengineering, no unnecessary comments, OWASP awareness
Executing actions with care	~2,800	Reversibility and blast radius rules, with exhaustive examples of risky operations
Using your tools	~400	Parallel tool calls, TaskCreate guidance
Tone and style	~500	No emojis, short responses, code references with file:line format
Text output	~900	User-facing communication rules, end-of-turn summaries
Session-specific guidance	~1,500	Subagent delegation, skill invocation, /schedule offer policy, /ultrareview
Environment	~650	Working directory, OS, model info, Claude model family reference

Breakdown of the main agent system prompt by section

The most interesting section is "Executing actions with care" at 2,800 characters. It's basically a liability firewall, teaching the model to think about reversibility before acting. It includes specific examples like "don't delete branches," "don't force-push," "don't overwrite uncommitted changes." The problem is that the Bash tool description also contains a 1,260-character Git Safety Protocol covering the same concepts. More on that later.

Tool definitions: the real token hog

Here's the thing that surprised me most: the 18 tool definitions consume 61,345 characters (~15,300 tokens). That's 80% of the combined system prompt + tools payload, and 56% of all input tokens on every API call. The system prompt itself is comparatively small.

Bash alone is 20% of all tool definitions. The top 6 tools consume two-thirds of the tool token budget.

Why Bash is 12,367 characters

The Bash tool description is a small manual. It contains a 1,260-character Git Safety Protocol (7 "NEVER" rules), a 4,217-character step-by-step commit workflow, and a 1,941-character PR creation workflow, complete with HEREDOC examples. These are procedural runbooks that only matter when the user asks to commit or create a PR, yet they're sent on every single API call.

# Git Safety Protocol (1,260 chars, sent every call)
- NEVER update the git config
- NEVER run destructive git commands (push --force, reset --hard...)
- NEVER skip hooks (--no-verify, --no-gpg-sign...)
- NEVER run force push to main/master...
- CRITICAL: Always create NEW commits rather than amending...
- When staging files, prefer adding specific files...
- NEVER commit changes unless the user explicitly asks...

# Step-by-step commit workflow (4,217 chars)
1. Run git status + git diff + git log in parallel...
2. Analyze all staged changes and draft a commit message...
3. Run add + commit + verify in parallel...
4. If pre-commit hook fails, fix and create NEW commit...

# Step-by-step PR workflow (1,941 chars)
1. Run git status + git diff + git log + git diff base...HEAD...
2. Analyze all changes and draft PR title/summary...
3. Push + create PR with gh pr create...

58% of the Bash tool description is git-specific content. That's 6,255 characters of commit and PR instructions sent on every API call, even when you're just running `ls`.

Duplication between system prompt and tools

The system prompt's "Executing actions with care" section (2,800 chars) and the Bash tool's Git Safety Protocol (1,260 chars) cover the same six concepts: destructive operations, force push, git reset --hard, --no-verify, amending commits, and file staging. The emoji policy ("Only use emojis if the user explicitly requests it") appears three times: in the system prompt, the Edit tool, and the Write tool.

The four subagent types

Claude Code can spawn four types of subagents via the Agent tool. Each gets a different combination of model, tools, system prompt, and capabilities. Here's the full comparison from my session.

	Main Agent	Plan	Explore	General-Purpose	Guide
Model	Opus 4.6	Opus 4.6	Opus 4.6	Opus 4.6	Sonnet 4.6
System prompt	15,003 chars	Identical	Identical	2,434 chars	17,218 chars
Tool count	18	18	18	9	4
Max tokens	64,000	64,000	64,000	64,000	32,000
Thinking mode	Adaptive	Adaptive	Adaptive
Can spawn sub-subagents	✓	✓	✓
Can edit/write files	✓	✓	✓	✓
Can manage tasks	✓	✓	✓
Can ask user questions	✓	✓	✓

Comparison of Claude Code agent types. Plan and Explore are indistinguishable from the main agent at the API level.

The most surprising finding: Plan and Explore get the exact same system prompt, tools, and model as the main agent. Character for character identical. The only difference is the task description the main agent writes when spawning them. This means there's nothing stopping a Plan agent from writing code, or an Explore agent from deleting files. The behavioral differentiation is entirely in the prompt the parent sends, not in any system-level restriction.

How subagent behavior is controlled

The Agent tool description tells the main agent: "Plan agents are for designing implementation plans" and "Explore agents are for locating code." It also specifies which tools each agent type should have access to (e.g., Explore: "All tools except Agent, ExitPlanMode, Edit, Write, NotebookEdit"). But in practice, the harness doesn't enforce these restrictions for Plan and Explore. It only restricts tools for General-Purpose (9 tools) and Guide (4 tools).

The general-purpose worker

General-purpose subagents are the workhorses. They get a much leaner system prompt (2,434 chars vs 15,003), no thinking mode, and 9 tools instead of 18. Critically, they can't spawn further subagents (no Agent tool), can't manage tasks, and can't ask the user questions. They're designed to receive a task, do it, and report back.

In my session, three general-purpose agents were launched simultaneously for a code review, each with a different "angle": line-by-line scan, structural analysis, and cross-file tracing. This parallelization pattern is the main value of subagents. You get three independent analyses running concurrently, each with their own context window.

The guide agent (Sonnet)

The Guide is the odd one out. It runs on Sonnet (5x cheaper), gets only 4 read-only tools (Bash, Read, WebFetch, WebSearch), and has a completely different domain-specific system prompt focused on documentation retrieval. Its job is to fetch docs from code.claude.com/docs and platform.claude.com/llms.txt, find relevant pages, and return actionable guidance. It's essentially a RAG agent hard-coded to Anthropic's documentation.

How skills inject context

Skills are Claude Code's plugin system. When you type /claude-api or the model invokes a skill, something dramatic happens to the context window. The skill system first dispatches Sonnet sub-calls to fetch documentation (cheap). Then it injects the full, unsummarized documentation directly into the main Opus agent's message history.

A single skill invocation quadrupled the context window from 41K to 182K tokens, triggering compaction.

The /claude-api skill injected 377,033 characters of Anthropic SDK documentation. The /update-config skill injected 132,835 characters of hooks and settings documentation. Neither was summarized before injection. This is the single most expensive pattern in the entire session: one skill call (file 257) cost $0.70, which is 22% of the total$ 3.11 session cost.

Context compaction: how the session survives

After the skill injection pushed the context to 182K tokens, Claude Code performed automatic compaction. This is one of the cleverest parts of the system. Here's what happened:

49 messages were collapsed into 2
182,282 tokens dropped to 36,480 (80% reduction)
All tool call/result pairs were discarded
All thinking blocks were discarded
Skill documentation was truncated from 377K to ~20K chars (94.7% removed)
A structured 6,496-character summary was generated covering: user intent, files, errors, problem-solving history, verbatim user messages, pending tasks, and current work state
Recent file contents were preserved via <system-reminder> tags

The compaction summary ends with a key directive: "Continue the conversation from where it left off without asking the user any further questions. Resume directly, do not acknowledge the summary, do not recap what was happening. Pick up the last task as if the break never happened." This prevents the jarring "As I was saying..." effect that would break the user experience.

The system-reminder side channel

<system-reminder> tags are Claude Code's way of injecting system-level context through user-role messages. The system prompt explicitly tells the model: "Tags contain information from the system. They bear no direct relation to the specific tool results or user messages in which they appear." This establishes them as a trusted side channel.

They carry four types of content:

Skill listings: ~3,800 chars listing all available skills and their trigger conditions, injected in the first user message of every request
Context variables: User email, current date, and other session metadata
Tool replay: After compaction, recent file reads are preserved as system-reminder-wrapped tool call/result pairs
Previously-invoked skills: After compaction, truncated skill documentation is preserved with a warning: "Do NOT re-execute these skills or perform their one-time setup actions again"

Why system-reminder uses the user role

The Anthropic API has a strict alternating user/assistant message structure. You can't just insert system messages mid-conversation. By using <system-reminder> tags in user-role messages, Claude Code can inject context at any point in the conversation without breaking the message structure. The system prompt's instruction to treat these tags as system-level information makes this work.

Token economics: where the money goes

My test session made 60 successful API calls and cost $3.11. Without prompt caching, the same session would have cost$ 29.82. That's an 89.6% savings, and it's the single most important design decision in Claude Code's architecture.

Token type	Volume	Cost	% of total
Cache read (hits)	1,709,589	$0.84	27%
Cache creation	219,881	$1.31	42%
Uncached input	114,114	$0.41	13%
Output	24,157	$0.56	18%

Token breakdown for the entire session. Cache creation is the largest cost component at 42%, billed at 1.25x the input rate.

Cache creation is the top cost at 42%. Every time the conversation grows, new tokens must be cache-created at a 1.25x premium ( $18.75/MTok for Opus vs$ 15/MTok for regular input). The trick is minimizing how often the cache prefix changes, and Claude Code achieves this by keeping the system prompt and tool definitions static across calls.

The main Opus agent dominates cost at 72%. Sonnet is used strategically for cheap auxiliary work.

The prompt bloat analysis

I identified approximately 11,400 characters (~2,850 tokens) of reducible content across the tool definitions. That's 19% of the tool payload. Here's where it lives.

Source of bloat	Chars	Issue
Bash git commit/PR workflows	~6,158	Step-by-step runbooks only needed for commit/PR tasks, sent every call
Bash Git Safety Protocol	~1,260	Overlaps with system prompt's "Executing actions with care" section
Bash sleep/Monitor cross-refs	~883	Both tools explain when to use the other, circular references
Monitor code examples	~1,284	5 shell scripts including niche CI polling with `jq` and `comm`
Agent XML examples	~1,667	Two full teaching examples with `<thinking>` tags and `<commentary>`
EnterPlanMode examples	~2,679	22 examples to convey one concept: "use plan mode for non-trivial tasks"
Emoji duplication	~150	Same instruction appears in system prompt, Edit, and Write

Reducible content in tool definitions. The Bash git workflows are the biggest offender.

Could this actually be simplified?

Yes, but with caveats. The 11,400 characters of bloat translates to roughly 2,850 tokens. At Opus cache-read rates ( $1.50/MTok), that's$ 0.004 per call, or about $0.25 over my 60-call session. Not exactly bankrupting anyone.

The real cost of bloat isn't financial. It's attentional. Every token in the system prompt competes for the model's attention with the actual task. A 6,158-character git commit workflow sitting in the Bash tool description while you're asking Claude to fix a CSS bug is noise. Whether it causes measurable degradation in output quality is hard to say, but the principle of "don't distract the model with irrelevant instructions" is well-established in prompt engineering.

The git commit/PR workflows could be conditionally injected, only when the user asks to commit or create a PR. The EnterPlanMode examples could be cut from 22 to 4. The Monitor code examples could lose the niche CI polling script. The duplicated emoji instructions could be centralized. These changes would save ~2,850 tokens per call without losing any behavioral fidelity.

Why Anthropic might keep the bloat

There's a strong argument for the current approach: reliability. The git commit workflow is extremely detailed because getting commits wrong is high-stakes (losing work, amending the wrong commit, force-pushing). The 22 examples in EnterPlanMode exist because the model was probably failing to enter plan mode at the right times. Prompt engineering is empirical, and these verbose sections likely exist because shorter versions produced measurable regressions in some benchmark. The cost of a wrong git commit is much higher than the cost of 2,850 extra cached tokens.

How the prompt shapes behavior

Looking at the system prompt as a whole, I can identify several distinct prompt engineering techniques that make Claude Code behave the way it does.

1. Constitutional constraints via negation

The prompt is heavy on "NEVER" and "don't" instructions. The Bash tool alone has 7 "NEVER" rules. The system prompt has "IMPORTANT" and "CRITICAL" markers on specific instructions. This is a pattern I see a lot in production systems: instead of describing what the model should do (which is open-ended), you enumerate the specific failure modes you've observed and explicitly block them.

2. The "blast radius" framework

The "Executing actions with care" section introduces a decision framework: "Carefully consider the reversibility and blast radius of actions." This gives the model a mental model for categorizing actions, not just a list of things to avoid. Local and reversible? Go ahead. Affects shared systems? Ask first. This is more robust than a pure blocklist because it generalizes to novel situations.

3. Anti-overengineering directives

The "Doing tasks" section contains some of the most opinionated instructions I've seen in a production prompt. "Don't add features, refactor, or introduce abstractions beyond what the task requires. Three similar lines is better than a premature abstraction." And: "Default to writing no comments. Only add one when the WHY is non-obvious." These are direct responses to a known LLM failure mode: over-helping. Without these constraints, Claude would add docstrings to everything, create helper functions nobody asked for, and wrap three-line scripts in try-catch-finally blocks.

4. Communication scaffolding

The "Text output" section is particularly well-crafted: "Before your first tool call, state in one sentence what you're about to do. While working, give short updates at key moments." And: "Write so the reader can pick up cold: complete sentences, no unexplained jargon or shorthand from earlier in the session." This solves the "silent agent" problem where the model makes 15 tool calls without saying anything, and the user has no idea what's happening.

The caching strategy that makes it all affordable

Claude Code uses Anthropic's prompt caching with cache_control: { type: "ephemeral", ttl: "1h" } markers at strategic positions. The system prompt blocks and the most recent user message get these markers. Since the system prompt and tool definitions are identical across every call, they hit the cache on every subsequent request.

In my session, the cache hit rate was 83.7% of all input tokens. Only 5.6% of tokens were completely uncached. The first call creates the cache (4,951 tokens at 1.25x), and every subsequent call reads from it (16,198+ tokens at 0.1x the regular price). This is why the system can afford to send 15,300 tokens of tool definitions on every call. The incremental cost is negligible after the first request.

// From file 5.json - the first main API call
"usage": {
  "input_tokens": 3,           // Almost nothing uncached
  "output_tokens": 1162,
  "cache_creation_input_tokens": 4951,
  "cache_read_input_tokens": 16198  // System prompt + tools from prior call
}

By the first real API call, almost everything is already cached. Only 3 tokens were uncached.

Hooks: the zero-cost event system

246 of the 307 files in my session were hook events. Hooks are shell commands the harness runs in response to lifecycle events. They cost zero API tokens because they're local processes. The harness fires hooks for everything: before and after each tool call, when subagents start and stop, when tasks are created, when the user submits a prompt, when compaction occurs.

Hook Event	Count	Purpose
PreToolUse	84	Fired before each tool call, can block/modify
PostToolUse	79	Fired after each successful tool call
PostToolBatch	46	After a batch of parallel tool calls
SubagentStop/Start	14	Subagent lifecycle tracking
PostToolUseFailure	5	When a tool call fails
TaskCreated/Completed	8	Task management events
Stop	2	Main agent turn ended
UserPromptSubmit	2	User entered a prompt
Pre/PostCompact	2	Context compaction events
SessionStart	2	CLI session initialized

Hook events from the session. These are local shell commands, not API calls.

Hooks are the extensibility mechanism. Want to lint every file before Claude writes it? Add a PreToolUse hook on the Write tool. Want to log every command Claude runs? PostToolUse on Bash. Want to block certain operations entirely? Return a non-zero exit code from a PreToolUse hook. The system prompt tells the model to treat hook feedback as coming from the user, which means hooks can redirect Claude's behavior mid-task.

Key takeaways

Tool definitions are the dominant cost, consuming 56% of input tokens (15,300 tokens per call) and 80% of the system prompt + tools payload. The system prompt itself is relatively lean at 3,785 tokens.
Prompt caching saves ~90% of costs. My session would have cost $29.82 without caching, but cost$ 3.11 with it. The 19,000-token static payload (system prompt + tools) is cached after the first call and reused at 0.1x the price.
Plan and Explore subagents are architecturally identical to the main agent. Same system prompt, same tools, same model. Behavioral differentiation happens entirely through the task prompt the parent sends. Only General-Purpose and Guide get restricted tool sets.
Skills inject full documentation without summarization, which is the single biggest context-window spike. The /claude-api skill dumped 377,033 characters into the main context, costing $0.70 for that one call. Compaction later truncated 94.7% of it.
Context compaction is the safety valve. When the context hits limits, Claude Code collapses 49 messages into 2, generates a structured summary, preserves recent file state, and tells the model to continue seamlessly. It works remarkably well.
About 19% of tool definitions (~11,400 chars) could be trimmed without losing functionality. The Bash tool's 6,158-character git workflow runbooks are the biggest target. But caching makes this a quality concern (attention competition) rather than a cost concern.
The system-reminder side channel is clever. Using XML tags in user-role messages lets Claude Code inject system context at any point in the conversation without breaking the API's alternating message structure.
The most expensive single call was a skill invocation (file 257, $0.70), not any conversation turn. If you're cost-conscious with Claude Code, be selective about which skills you invoke.