DevBolt
By The DevBolt Team · 10 min read

MCP Context Window: How to Optimize Your AI Agent Setup

Tags: MCP, AI, Performance, How-To

The Model Context Protocol (MCP) lets AI coding assistants — Claude Desktop, Cursor, Windsurf, VS Code Copilot — call external tools like databases, file systems, and APIs. But every MCP server you enable injects its tool definitions into the context window of every single prompt. With a typical setup of 5-10 servers, you can lose 5,000-15,000 tokens before you even ask your first question. This guide shows you how to audit, measure, and optimize your MCP configuration to get the most out of your context window.

Need to build or edit your MCP config? DevBolt's MCP Config Builder supports Claude Desktop, Cursor, VS Code, Windsurf, and Claude Code with 16 server templates. 100% client-side — your config never leaves your browser.

The Hidden Cost of MCP Tool Definitions

When you connect an MCP server, the AI client registers every tool that server exposes. Each tool registration includes its name, description, and full JSON Schema for parameters. This metadata is injected into the system prompt on every message — not just when you use the tool.

What the AI sees in its system prompt (simplified)
Available tools:
- filesystem_read_file: Read contents of a file
  params: { path: string (required), encoding: string (optional) }
- filesystem_write_file: Write content to a file
  params: { path: string, content: string, encoding: string }
- filesystem_list_directory: List directory contents
  params: { path: string, recursive: boolean }
- github_search_repositories: Search GitHub repos
  params: { query: string, sort: string, per_page: number }
- github_get_file_contents: Get file from a GitHub repo
  params: { owner: string, repo: string, path: string, branch: string }
... (50+ more tool definitions)

// Each tool definition = 50-200 tokens
// 50 tools × ~120 tokens = ~6,000 tokens per message

With Claude's Opus model at $15/million input tokens, a 50-tool setup costs roughly $0.09 per message just in tool definitions, before any actual work. Over a 1,000-message conversation, that's about $90. For Sonnet ($3/M input), it's about $0.018 per message, or $18 per 1,000 messages. Small per-message, but it adds up over hundreds of daily sessions.
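
To make the arithmetic above concrete, here is a small Python sketch. The ~120-token average and the per-million-token prices are this article's estimates, not measured values:

```python
# Rough cost of MCP tool definitions injected into every prompt.
# Token counts and prices are the article's estimates, not measured values.

TOKENS_PER_TOOL = 120  # average tokens per tool definition
NUM_TOOLS = 50         # tools across all enabled servers

PRICE_PER_MTOK = {     # input price per million tokens (USD)
    "opus": 15.0,
    "sonnet": 3.0,
}

def definition_tokens(num_tools: int, tokens_per_tool: int = TOKENS_PER_TOOL) -> int:
    """Tokens consumed by tool definitions on every single message."""
    return num_tools * tokens_per_tool

def cost_per_message(num_tools: int, model: str) -> float:
    """Dollar cost of those definition tokens on one message."""
    return definition_tokens(num_tools) * PRICE_PER_MTOK[model] / 1_000_000

print(definition_tokens(NUM_TOOLS))           # 6000 tokens per message
print(cost_per_message(NUM_TOOLS, "opus"))    # 0.09 USD per message
print(cost_per_message(NUM_TOOLS, "sonnet"))  # 0.018 USD per message
```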

Step 1: Audit Your MCP Server Config

Your MCP config lives in a JSON file. The location depends on your client:

| Client | Config Location |
| --- | --- |
| Claude Desktop | ~/Library/Application Support/Claude/claude_desktop_config.json |
| Cursor | ~/.cursor/mcp.json |
| VS Code | .vscode/mcp.json (per project) |
| Windsurf | ~/.codeium/windsurf/mcp_config.json |
| Claude Code | ~/.claude/settings.json (mcpServers key) |

Open your config and list every server. For each one, ask: Did I use this server in the last week? If not, it's burning tokens for nothing.

Example: Overloaded MCP config (claude_desktop_config.json)
{
  "mcpServers": {
    "filesystem": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/dev/projects"] },
    "github": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-github"], "env": { "GITHUB_TOKEN": "ghp_..." } },
    "postgres": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://..."] },
    "sqlite": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-sqlite", "path/to/db"] },
    "brave-search": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-brave-search"], "env": { "BRAVE_API_KEY": "..." } },
    "slack": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-slack"], "env": { "SLACK_TOKEN": "..." } },
    "memory": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-memory"] },
    "puppeteer": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-puppeteer"] },
    "google-drive": { "command": "npx", "args": ["-y", "@anthropic/mcp-server-gdrive"] },
    "sentry": { "command": "npx", "args": ["-y", "@sentry/mcp-server-sentry"], "env": { "SENTRY_AUTH": "..." } }
  }
}

Ten servers is common among power users. But if you're working on a frontend React project, do you really need the PostgreSQL, Sentry, and Google Drive servers running? Each adds 300-1,500 tokens of tool definitions to every prompt.
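
A quick way to run this audit mechanically is to parse the config and print each server name, then check the list against what you actually used this week. A minimal Python sketch, assuming the macOS Claude Desktop path from the table above (swap in your client's path):

```python
import json
from pathlib import Path

# Claude Desktop config location on macOS; adjust for your client.
CONFIG_PATH = Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"

def list_servers(config_text: str) -> list[str]:
    """Return the names of all configured MCP servers, sorted."""
    config = json.loads(config_text)
    return sorted(config.get("mcpServers", {}))

# Example with an inline config; replace `sample` with CONFIG_PATH.read_text().
sample = '{"mcpServers": {"filesystem": {}, "github": {}, "slack": {}}}'
for name in list_servers(sample):
    print(name)  # ask yourself: did I use this server in the last week?
```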

Step 2: Measure the Token Cost

To understand the actual impact, count the tokens your tool definitions consume:

  1. Open your MCP config JSON file
  2. Paste it into DevBolt's LLM Token Counter to see the raw config size
  3. Multiply by 3-5x for the actual system prompt injection — the AI client expands each server definition into full tool schemas with parameter descriptions
  4. Multiply by your average messages per session to get the total per-session token cost

Rule of thumb: each MCP server adds roughly 200-1,500 tokens of tool definitions to every prompt (depending on how many tools it exposes). A filesystem server with 5 tools is on the low end; a GitHub server with 20+ tools is on the high end.
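
If you want a quick offline estimate instead of the Token Counter, the common ~4 characters-per-token heuristic gets you in the right ballpark. A sketch, treating the 3-5x schema-expansion factor from step 3 as an assumption:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English/JSON."""
    return max(1, len(text) // 4)

def estimated_injection_tokens(config_json: str, expansion: float = 4.0) -> int:
    """Raw config tokens times the assumed 3-5x schema-expansion factor."""
    return int(estimate_tokens(config_json) * expansion)

config = '{"mcpServers": {"filesystem": {"command": "npx"}}}'
print(estimate_tokens(config))             # raw config size in tokens
print(estimated_injection_tokens(config))  # estimated system-prompt cost
```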

Step 3: Optimize Your Configuration

Here are five practical strategies to reduce MCP context waste, ranked by impact:

1. Remove servers you don't use weekly

The single highest-impact change. Most developers enable servers when they first discover MCP and never clean up. If you haven't used the Slack or Google Drive server in weeks, remove it. You can always add it back in 30 seconds.

Optimized config: Only what you need
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/dev/projects"]
    },
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_TOKEN": "ghp_..." }
    }
  }
}

2. Use project-specific configs instead of global

Instead of one massive global config, use per-project configs. VS Code supports this natively with .vscode/mcp.json. For Claude Code, use .claude/settings.local.json in the project root. Your backend project gets the PostgreSQL server; your frontend project gets just the filesystem server.

.vscode/mcp.json — Frontend project (minimal)
{
  "servers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "."]
    }
  }
}
.vscode/mcp.json — Backend project (database tools)
{
  "servers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "."]
    },
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost:5432/mydb"]
    }
  }
}

3. Write concise tool descriptions for custom servers

If you're building custom MCP servers, keep tool descriptions to one sentence. A 200-word description wastes tokens on every single prompt. The AI model only needs enough context to decide when to use the tool and what parameters to pass.

Good vs. bad tool descriptions
// BAD: 50+ tokens wasted
{
  "name": "query_database",
  "description": "Execute a SQL query against the PostgreSQL database. This tool
    connects to the configured database and runs any valid SQL statement including
    SELECT, INSERT, UPDATE, DELETE. Results are returned as JSON arrays. The tool
    handles connection pooling, query timeouts, and error formatting automatically.
    Use this for any database operation."
}

// GOOD: 15 tokens, equally effective
{
  "name": "query_database",
  "description": "Execute a SQL query and return results as JSON."
}

4. Consolidate overlapping servers

If you have separate MCP servers for PostgreSQL and SQLite, and you only use one database at a time, only enable the one you need. Similarly, if you have both the Brave Search server and a custom web scraping server, pick one per session.

5. Monitor context window usage

Claude Code shows token usage in the status bar. Cursor shows it in the chat panel. Watch for sessions where you hit the context limit unexpectedly — MCP tool bloat is often the cause. If you're hitting limits in conversations that used to work fine, audit your MCP config first.

Why MCP Context Matters Now: The Perplexity CTO Debate

In early 2026, Perplexity's CTO publicly questioned whether MCP's architecture was fundamentally wasteful — arguing that injecting tool schemas into every prompt was an inefficient design that would not scale as AI agents gained more capabilities. The debate highlighted a real tension in the MCP ecosystem: tool richness vs. context efficiency.

Proponents of MCP argue that the token cost is acceptable because tool definitions are cached in KV cache after the first message, reducing the actual per-message cost to near zero for subsequent turns. Critics counter that the first-message cost still matters, and that the definitions consume context window space regardless of caching — space that could be used for more code, more files, and more conversation history.

The practical answer is to be intentional about which servers you enable. MCP is powerful, but like any tool, it has a cost. The optimization strategies above help you get the benefits of MCP without paying an unnecessary context tax.

5 Common MCP Config Mistakes

  1. Enabling every server "just in case" — Each server has a token cost even when unused. Only enable what you'll use this session.
  2. Hardcoding secrets in the config file — Use environment variables (GITHUB_TOKEN) instead of pasting tokens directly. Your MCP config may end up in version control.
  3. Running database servers with production credentials — AI agents can execute arbitrary SQL. Always use read-only credentials or a development database.
  4. Not restarting after config changes — Most clients require a restart to pick up MCP config changes. Claude Desktop, Cursor, and Windsurf all need a fresh start.
  5. Ignoring npx startup time — npx -y downloads the server package on first run, which can take 10-30 seconds. Install globally (npm install -g) for instant startup.
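
Mistake #2 is easy to catch mechanically before a config file reaches version control. A hedged Python sketch that scans config text for token prefixes commonly used by GitHub and Slack (the prefix list is illustrative, not exhaustive):

```python
import re

# Illustrative prefixes for common credential formats; extend as needed.
SECRET_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{4,}"),    # GitHub personal access tokens
    re.compile(r"xox[bp]-[A-Za-z0-9-]+"),  # Slack bot/user tokens
]

def find_hardcoded_secrets(config_text: str) -> list[str]:
    """Return any substrings that look like hardcoded credentials."""
    hits: list[str] = []
    for pattern in SECRET_PATTERNS:
        hits.extend(pattern.findall(config_text))
    return hits

config = '{"github": {"env": {"GITHUB_TOKEN": "ghp_abcd1234"}}}'
for secret in find_hardcoded_secrets(config):
    print("hardcoded secret found:", secret)
```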

MCP Server Token Cost Quick Reference

Approximate token cost per server, added to every prompt:

| Server | Tools | ~Tokens | Impact |
| --- | --- | --- | --- |
| Filesystem | 5 | ~400 | Low |
| Memory | 3 | ~250 | Low |
| PostgreSQL | 5 | ~500 | Medium |
| GitHub | 20+ | ~1,500 | High |
| Puppeteer | 8 | ~800 | Medium |
| Brave Search | 2 | ~200 | Low |

Frequently Asked Questions

Does MCP really affect context window size?

Yes. MCP tool definitions are injected into the system prompt, which counts against your context window. With Claude's 200K context window, 10,000 tokens of tool definitions is 5% of your available space — space that could hold roughly 30 pages of code instead.

Are tool definitions cached between messages?

Anthropic's API supports prompt caching, and tool definitions at the start of the system prompt are good candidates for caching. This reduces the token cost (you pay less) but not the context space (they still occupy the window). In practice, reducing the number of tools still frees up context for more code and conversation.
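
As a concrete illustration of the caching point: with Anthropic's Messages API, a `cache_control` breakpoint on the last tool definition caches the whole tools block across turns. A sketch of the request shape only, with no API call; the tool names and schemas here are hypothetical:

```python
# Shape of a Messages API request that caches tool definitions across turns.
# Tool names/schemas are hypothetical; the cache_control field is the point.

tools = [
    {
        "name": "read_file",
        "description": "Read contents of a file.",
        "input_schema": {"type": "object", "properties": {"path": {"type": "string"}}},
    },
    {
        "name": "query_database",
        "description": "Execute a SQL query and return results as JSON.",
        "input_schema": {"type": "object", "properties": {"sql": {"type": "string"}}},
        # A cache breakpoint on the LAST tool caches the entire tools block.
        "cache_control": {"type": "ephemeral"},
    },
]

request = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "tools": tools,
    "messages": [{"role": "user", "content": "List the files in src/"}],
}

# Cached definitions cost less on later turns, but they still occupy
# the same number of context-window tokens on every message.
print(len(request["tools"]))
```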

Can I dynamically enable/disable MCP servers mid-conversation?

Not yet in most clients. Claude Desktop and Cursor require a restart to pick up config changes. Claude Code can reload MCP servers within a session using /mcp. This limitation is why project-specific configs are more practical than dynamic toggling.

What's the optimal number of MCP servers?

There's no fixed number, but 2-4 servers per project is a good target. One for file access, one for version control, and one or two for project-specific needs (database, deployment). Beyond 5 servers, the context cost becomes noticeable and the AI may struggle to choose the right tool.

How do I build my MCP config file?

DevBolt's MCP Config Builder lets you visually compose your config with 16 server templates across 5 client formats. Pick your client, add only the servers you need, configure environment variables, and download the config file. All processing happens in your browser.


Written by the DevBolt Team

DevBolt is a collection of 105+ free developer tools that run entirely in your browser — no data ever leaves your device. Built and maintained by AI agents, reviewed by humans. Learn more about DevBolt