
I Run Six MCP Servers Daily. Here's What Breaks.
MCP won the standard war. But running six servers in production every day exposes failure modes no demo will show you: context bloat, silent auth failures, and tool selection that falls apart at scale.
Last Thursday my publishing pipeline silently stopped posting to LinkedIn. No error. No timeout. The Zapier MCP server just started returning empty results because an auth token expired somewhere in the HTTP bridge and the middleware failed open instead of closed. I didn't notice for two days.
That's what running MCP in production actually looks like. Not the clean demo where you connect a server and watch the agent call tools. The version where six servers compete for context window space, auth tokens rot without warning, and your agent starts picking the wrong tool because it's drowning in 200 schema definitions it'll never use.
I've been running this setup since early 2026. Six MCP servers, sometimes more, powering everything from web research to memory persistence to automated publishing. I built two of them. And I've hit enough walls to know that MCP's real problems aren't the ones getting discussed in spec proposals.
The Setup
My daily stack: GitHub MCP for repo operations. Exa for web search and deep research. Engram (my own memory engine) for recall and context persistence across sessions. Zapier for LinkedIn publishing. Context7 for pulling live documentation. Brave Search as a fallback research tool.
Some run locally via stdio. Others connect through HTTP transport via mcporter. All of them register their tools at startup and dump schemas into the context window before I've typed a single character.
On a good day, this works beautifully. The agent recalls what I was building yesterday via Engram, researches the latest discourse via Exa, pulls accurate API details from Context7, and publishes the result through Zapier. On a bad day, it hallucinates tool names, calls Exa when it meant to call Brave, and silently loses memory because a hook path was wrong.
The bad days taught me more than the good ones.
Problem 1: The Context Tax
This is the one everyone talks about, and the numbers are worse than you think.
Every MCP tool costs roughly 295 to 1,400 tokens just for its schema definition. Name, description, parameter types, enums, field descriptions. All of it gets injected into the context window at startup, whether the model uses that tool or not.
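To make that concrete, here's a rough sketch of where those tokens go. The schema below is hypothetical (not the real Exa or Brave definition), and the four-characters-per-token rule of thumb is only an approximation, but the shape is representative:

```python
import json

# A representative (made-up) MCP tool schema. Every field below gets
# serialized into the context window at startup, used or not.
tool_schema = {
    "name": "web_search",
    "description": "Search the web and return ranked results with URLs, "
                   "titles, and snippets for a given query string.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The search query."},
            "num_results": {"type": "integer",
                            "description": "Max results to return (1-20)."},
            "search_type": {"type": "string",
                            "enum": ["neural", "keyword", "auto"],
                            "description": "Ranking strategy to use."},
        },
        "required": ["query"],
    },
}

# Rough heuristic: ~4 characters per token for English-plus-JSON.
serialized = json.dumps(tool_schema)
approx_tokens = len(serialized) // 4
# Even this minimal schema lands around a hundred tokens, before you add
# more parameters, examples, or nested objects. Multiply by 200+ tools.
print(approx_tokens)
```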
A developer running eight production MCP servers measured it: 224 tools, 66,000 tokens consumed before the first message. That's a third of Claude's 200K context window. Gone. On tool definitions.
One documented deployment connecting just three servers (GitHub, Playwright, IDE integration) consumed 143,000 of 200,000 tokens. Seventy-two percent. For menus the agent mostly never opened.
I didn't hit those extremes, but I felt the compression. With six servers, I noticed responses getting shallower. The agent would lose track of earlier conversation context. Complex multi-step tasks would degrade on the fourth or fifth step because the reasoning space was already half-occupied by tool schemas for Brave Search endpoints I'd call maybe once a week.
The worst part: tool selection accuracy drops off a cliff. Researchers measured it falling from 43% to under 14% when agents faced bloated tool sets. Seven out of eight times, the agent picked the wrong tool. I didn't need the research to confirm it. I watched Claude try to call mcp__exa__web_search_exa when I clearly asked it to check Engram memory. The schema list was just too long for the model to parse reliably.
Problem 2: Silent Auth Failures
This one almost nobody writes about, and it's the most dangerous.
My HTTP-bridged MCP servers (Exa, Zapier, Context7) authenticate via bearer tokens. When those tokens expire or rotate, the failure mode isn't a loud crash. It's silence. The middleware accepts the request, the server returns empty results, and the agent proceeds as if nothing happened.
Here's what that looks like in practice: I asked GhostWriter to research trending topics via Exa. It came back with three proposals based entirely on its own training data because Exa returned zero results and the agent didn't flag it as an error. It just worked with what it had. The proposals were plausible but stale. I didn't catch it until I noticed the research brief had no source URLs.
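The fix I eventually wired in is making the research step fail closed. This is a sketch with made-up names (`validate_research` is my helper, not an MCP API), but the idea carries: empty results and missing source URLs get promoted from a shrug to a hard error.

```python
def validate_research(results):
    """Fail closed: treat suspicious output as an error, not as data.

    Assumes `results` is a list of dicts carrying at least a "url" key,
    which is the shape my research step happens to return.
    """
    if not results:
        # An expired bearer token often surfaces as an empty list, not a 401.
        raise RuntimeError("zero results -- check the server's auth token")
    missing = [r for r in results if not r.get("url")]
    if missing:
        # Plausible-but-stale answers from training data carry no sources.
        raise RuntimeError(f"{len(missing)} result(s) missing source URLs")
    return results
```

If this check had been in the pipeline, the brief with no source URLs would have died loudly instead of shipping.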
The MCP spec added an HTTP auth framework in the November 2025 revision. On paper, the problem is solved. In practice, it's optional, stdio transports don't use it, and most server authors treat auth as somebody else's problem. Microsoft learned this the hard way when CVE-2026-32211 dropped in April: their Azure MCP Server had zero authentication. CVSS 9.1. An attacker with network access could read API keys, project data, authentication tokens. The auth framework existed in the spec. Microsoft just... didn't implement it.
Between January and February 2026, security researchers filed over 30 CVEs targeting MCP servers. Not sophisticated exploits. Basic failures: missing auth, path traversal, command injection, hardcoded credentials. A survey of 2,614 MCP implementations found 82% use file operations vulnerable to path traversal. OWASP published an entire MCP-specific Top 10 framework because the baseline security is that bad.
I'm not immune to this. My Engram MCP server runs with full database access. If someone got network access to my VPS, they could read every memory I've stored. I trust my network setup, but "trust my network setup" is exactly the kind of reasoning that shows up in post-incident reports.
Problem 3: The Startup Dependency Chain
Nobody tells you that MCP server startup order matters. But it does.
My Engram MCP server connects to Supabase on startup. If Supabase is slow or the connection times out, the server starts in a degraded state. The PreCompact hook (which saves key decisions to long-term memory before context compaction) silently stops working. I don't find out until hours later when I realize the session's decisions weren't persisted.
After renaming the Engram package from openclaw-memory to engram, every hook path in ~/.claude/settings.json broke. The hooks still fired. They just pointed at a path that didn't exist anymore. No error in the UI. No warning. Sessions ran for a full day without memory persistence before I noticed the gap.
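A dumb audit script would have caught that rename in seconds. The sketch below assumes a simplified settings layout (`{"hooks": {event: [{"command": ...}]}}`); the real Claude Code settings format nests matchers and hook arrays differently, so treat this as the idea rather than a drop-in:

```python
import json
import os
import shlex

def audit_hook_paths(settings_path):
    """Return (event, executable) pairs whose path no longer exists on disk.

    Assumes a simplified layout: {"hooks": {event: [{"command": "/path ..."}]}}.
    """
    with open(settings_path) as f:
        settings = json.load(f)
    broken = []
    for event, hooks in settings.get("hooks", {}).items():
        for hook in hooks:
            command = hook.get("command", "")
            if not command:
                continue
            executable = shlex.split(command)[0]
            # Only flag absolute paths; bare names like "python" resolve
            # through PATH and need a different check.
            if executable.startswith("/") and not os.path.exists(executable):
                broken.append((event, executable))
    return broken
```

Run it in a cron job or a shell prompt hook and a renamed package stops costing you a day of lost memory.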
This is the thing about running multiple MCP servers: they create invisible dependency chains. Engram depends on Supabase. GhostWriter depends on Engram for memory, Exa for research, Zapier for publishing. If any link fails silently (and they almost always fail silently), the output degrades in ways that look correct but aren't.
Problem 4: The Tool Collision
When you run six servers from different authors, naming conventions collide.
I've watched Claude get confused between web_search_exa and brave_web_search because the descriptions overlap. Both say they search the web. Both accept a query parameter. The model picks whichever schema it parses first, which depends on server registration order, which depends on startup timing, which is nondeterministic.
The MCP spec has no namespacing convention beyond the server name prefix. So you get mcp__exa__web_search_exa next to mcp__brave-search__brave_web_search and hope the model figures out the difference from the descriptions alone. Sometimes it does. Sometimes it calls Brave when I needed Exa's neural search mode, and the results come back shallow because Brave doesn't do semantic matching.
What I've Done About It
I won't pretend I've solved any of this cleanly. But a few workarounds have made the daily grind survivable.
The biggest win was deferred tool loading. Claude Code supports a ToolSearch mechanism where tool schemas aren't injected at startup. Instead, the agent gets a list of available tool names and fetches the full schema only when it decides to use one. This is exactly the hierarchical discovery pattern that CLI tools have used for 50 years. It cuts my baseline context cost from tens of thousands of tokens to nearly zero. The tradeoff is an extra round trip per tool discovery. Worth it every time.
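Mechanically, the pattern looks something like this sketch. The `DeferredToolRegistry` and the client interface (`tool_names`, `fetch_schema`) are hypothetical stand-ins, not Claude Code's actual implementation:

```python
class DeferredToolRegistry:
    """Sketch of deferred tool loading: expose names up front, fetch
    full schemas only on demand. Client interfaces are hypothetical."""

    def __init__(self, servers):
        self._servers = servers        # server name -> client object
        self._schema_cache = {}

    def list_tool_names(self):
        # Cheap: a flat list of qualified names costs a handful of
        # tokens total, not hundreds of tokens per tool.
        return [f"{srv}__{tool}"
                for srv, client in self._servers.items()
                for tool in client.tool_names()]

    def get_schema(self, qualified_name):
        # The expensive part, deferred: one extra round trip per tool,
        # paid only when the agent actually selects it, then cached.
        if qualified_name not in self._schema_cache:
            srv, tool = qualified_name.split("__", 1)
            self._schema_cache[qualified_name] = \
                self._servers[srv].fetch_schema(tool)
        return self._schema_cache[qualified_name]
```

The cache matters: the round-trip cost is per tool per session, not per call.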
I also added explicit server health checks before any pipeline run. Not just "did it return 200." Real validation. For Exa, that means running a test query and confirming results come back with actual URLs. For Engram, it means recalling a known memory and checking the response isn't empty. If anything fails, the pipeline stops and notifies me instead of running with partial data.
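Here's the shape of that pre-flight step. The client methods (`search`, `recall`) and the canary memory are my own conventions, not standard MCP APIs:

```python
import sys

def preflight(checks):
    """Run every (name, check) pair; abort the pipeline if any fail.

    Each check is a zero-argument callable that raises on failure.
    """
    failures = []
    for name, check in checks:
        try:
            check()
        except Exception as exc:
            failures.append(f"{name}: {exc}")
    if failures:
        # Stop and notify instead of running with partial data.
        sys.exit("Pre-flight failed:\n" + "\n".join(failures))

def check_exa(client):
    """Not just 'did it return 200': results must carry real source URLs."""
    results = client.search("mcp protocol", num_results=3)
    if not results or not all(r.get("url") for r in results):
        raise RuntimeError("no results, or results without source URLs")

def check_engram(client):
    """Recall a known canary memory; empty means persistence is degraded."""
    if not client.recall("canary-memory"):
        raise RuntimeError("canary recall came back empty")
```

Wiring it up is just `preflight([("exa", lambda: check_exa(exa)), ("engram", lambda: check_engram(engram))])` at the top of the pipeline.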
And I cut servers. I started with eight and dropped to six. Brave Search became a fallback instead of a primary. I removed the Google Dev Knowledge server because Context7 covers the same ground with less overhead. No server is free: each one costs context space, raises the collision probability, and adds another silent failure point. I was wrong to assume more servers meant more capability. Past a threshold, more servers just means more noise.
Where This Goes
MCP won the standard war. Linux Foundation governance. OpenAI, Google, Microsoft, AWS all on board. 19,831 servers on the Glama registry. 97 million monthly downloads. That's not going away.
But the protocol is splitting into two identities. MCP for discovery (what tools exist and what they do) is genuinely useful. Nothing handles tool discovery better. MCP for execution (actually calling tools at scale with auth and cost controls) is where it's losing ground. Cloudflare built Code Mode and got 81% token reduction by generating SDK code instead of making MCP calls. Perplexity's CTO announced they dropped MCP internally in March 2026. Y Combinator's Garry Tan built a CLI alternative.
The 2026 roadmap identifies the right priorities: stateless connection modes, standardized authentication, task lifecycle management, enterprise governance. But the spec hasn't shipped a new version since November 2025. The CVEs are being filed now. The workarounds are being built now.
I'm still running six servers. I'll probably keep running them because the productivity gain when everything works is real. The agent that can recall my last session, research current trends, pull live docs, and publish to LinkedIn in one pass is genuinely better than the alternative of doing each step manually.
But I've stopped pretending the infrastructure is ready. Every morning I check that Engram connected, Exa authenticated, and Zapier's token is still valid. That's not a workflow. That's a pre-flight checklist for a protocol that was supposed to be plug-and-play.
I didn't expect to become a systems operator for my own tooling integrations. But here I am. And if you're running more than two MCP servers in production, you probably are too.