Why My Agent Pipeline Still Runs on BullMQ

Vercel Workflows ships crash recovery, step isolation, and durable state for agents. My pipeline uses BullMQ on a $7 VPS instead. Here's the trade-off.

ai-agents, bullmq, infrastructure, ouija
April 28, 2026
7 min read

Vercel Workflows went GA on April 16. WorkflowAgent ships with crash recovery, step isolation, and durable state that survives process restarts. If your agent crashes between step 3 and step 4, the runtime replays the completed steps deterministically and resumes where it stopped.

My agent pipeline runs on BullMQ, a Redis-backed job queue that's been around since 2019. It doesn't have crash recovery. It doesn't have step isolation. If the worker dies mid-job, the execution context is gone.

I'm not switching.

Not because Vercel built the wrong thing. They built exactly the right thing for a specific set of constraints. But the trade-offs that make BullMQ worse for serverless make it better for a self-hosted pipeline where I need control over execution, costs, and vendor independence.

What Vercel Workflows Actually Built

The architecture is genuinely clever. You add "use workflow" to mark a function as a durable workflow. "use step" wraps individual operations as independently retryable units. Each step runs in isolation, retries without re-executing previous steps, and maintains its own error boundary. If step 3 fails, only step 3 retries. Steps 1 and 2 replay from cached results.
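The directive-based shape described above can be sketched roughly like this. The step names and bodies are mine, not from Vercel's docs, and outside the Workflows runtime the directive strings are inert, so this runs as plain TypeScript:

```typescript
// Hypothetical sketch of the directive-based API. Inside the Workflows
// runtime, "use step" marks an independently retryable unit whose result
// is cached and replayed on resume; "use workflow" marks the durable entry.
async function fetchDiff(prId: string): Promise<string> {
  "use step"; // step 1: retried on its own, result cached for replay
  return `diff for ${prId}`;
}

async function summarize(diff: string): Promise<string> {
  "use step"; // step 2: if this fails, only this step retries
  return `summary of ${diff}`;
}

async function reviewPr(prId: string): Promise<string> {
  "use workflow"; // marks the whole function as a durable workflow
  const diff = await fetchDiff(prId);
  return summarize(diff);
}
```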

WorkflowAgent, part of AI SDK v7, runs inside this runtime. Model calls serialize across step boundaries, which means your agent can crash, restart, and resume with credentials intact. That's not trivial to build yourself. Tool calls get automatic retries. The whole thing is purpose-built for long-running agentic workloads that outlive a single serverless invocation.

The needsApproval option is the part that genuinely stings. You gate dangerous tool calls behind human approval, and that approval flow survives process suspension. Your agent can sit paused for hours waiting for a human to click "yes," then resume exactly where it stopped. I spent weeks building a custom approval flow for Mission Control that doesn't survive a process restart. Vercel shipped one that survives everything.

Not every model provider can serialize across steps yet, and WorkflowAgent only exposes a stream() method (no generate()). Those are real limitations, and the documentation is still catching up to the runtime. But for most agent use cases, the architecture is sound.

I mean this as praise. Not a setup for a "but."

What I Built Instead

Ouija is a 16-package TypeScript monorepo. BullMQ handles the job queue. Fastify serves the API. Postgres stores state (SQLite for local dev). A React dashboard shows what's running. The plugin system (Kanban, Git, Agent, Notification) hooks into lifecycle events without touching the engine core.

The central design decision is a pure transition function with zero I/O. Every state change flows through a single function that takes the current state and an event, then returns the next state. No database calls inside the function. No HTTP requests. No side effects. The EventBus and JobQueue are separated at the interface level so the transition function never knows it's running on Redis.
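A minimal sketch of that idea, with hypothetical state and event types (Ouija's real types are richer than this): state in, event in, next state out, and nothing else.

```typescript
// Pure transition function: no database calls, no HTTP, no side effects.
// Types are illustrative, not Ouija's actual definitions.
type PipelineState = {
  status: "idle" | "running" | "done" | "failed";
  jobId?: string;
};

type PipelineEvent =
  | { kind: "dispatch"; jobId: string }
  | { kind: "complete" }
  | { kind: "fail" };

function transition(state: PipelineState, event: PipelineEvent): PipelineState {
  switch (event.kind) {
    case "dispatch":
      return { status: "running", jobId: event.jobId };
    case "complete":
      return { ...state, status: "done" };
    case "fail":
      return { ...state, status: "failed" };
  }
}
```

Because the function never touches Redis or Fastify, tests can drive it with plain objects and assert on the returned state directly.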

920 tests. No mocking Redis. No mocking Fastify. The transition function doesn't talk to either.

Three runners ship with Ouija. LocalAgentRunner handles text-only output. StreamJsonAgentRunner (the default) outputs NDJSON structured events, cold-per-dispatch. SdkAgentRunner is API-key only, for the eventual SaaS path where subscription auth is architecturally unavailable.

The runner: field in ouija.config.yaml determines which one fires. That single field switches the entire billing model.
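What that single field implies can be sketched like this. The names are hypothetical and Ouija's actual runner interfaces are not this simple, but the shape of the switch is the point:

```typescript
// Hypothetical sketch: one config value selects the runner, and the
// runner determines the billing model.
type RunnerKind = "local" | "stream-json" | "sdk";

interface AgentRunner {
  billing: "flat" | "metered";
}

function selectRunner(kind: RunnerKind): AgentRunner {
  switch (kind) {
    case "local":
      return { billing: "flat" };    // text-only output
    case "stream-json":
      return { billing: "flat" };    // subscription auth, NDJSON events
    case "sdk":
      return { billing: "metered" }; // API-key, per-token billing
  }
}
```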

Mission Control's StreamingExecutor reuses the same NDJSON pattern. So does GhostWriter's publishing pipeline. Architectural consistency across three projects wasn't planned. It turned out that structured event streams are the natural shape of agent output when you're not locked into a managed runtime's format. The pattern emerged from the constraint, not from a design doc.

The whole thing runs on a $7/month Hetzner VPS. v0.4.0 shipped last week with 920 tests green. Phase 1 ("Kill Silent Failures") is complete: 8 tasks shipped, including typed PipelineStatus enums, encoded job IDs, and positive-evidence gating on DispatchOutcome.

Is the architecture boring? Absolutely. That's the point.

The Cost Argument Nobody Makes

Here's where managed runtimes go quiet.

Vercel Workflows charges per step ($2.50 per 100K steps) plus storage costs. WorkflowAgent adds model costs on top. For an agent that runs 20 minutes analyzing a PR, reviewing test coverage, and suggesting changes that succeed at the wrong thing, that's a lot of steps. Run 50 agent sessions per day and you're paying for managed infrastructure that scales linearly with your agent's activity. It adds up.
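A back-of-envelope version of that math. The steps-per-session figure is my assumption; the $2.50 per 100K rate is the one quoted above, and storage and model costs are deliberately excluded:

```typescript
// Assumed workload: 200 steps per long agent session, 50 sessions/day.
const stepPrice = 2.5 / 100_000;  // dollars per step ($2.50 per 100K)
const stepsPerSession = 200;      // assumption, not a measured number
const sessionsPerDay = 50;

// Step billing alone, before storage, compute, or model costs,
// and it grows linearly with agent activity.
const meteredMonthly = stepPrice * stepsPerSession * sessionsPerDay * 30;

const flatMonthly = 7 + 20;       // VPS + Claude Pro subscription, fixed
```

At these assumptions the step charges alone are a single-digit dollar figure per month; the anxiety comes from everything stacked on top of them scaling with usage, while the flat number stays flat.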

BullMQ on my VPS is $7/month. Flat. Regardless of how many agents run, how long they take, or whether one of them loops for 40 minutes because it can't resolve its own verification debt.

Add a Claude Pro subscription ($20/month) and the total self-hosted infrastructure cost is $27/month for unlimited agent sessions. No per-session billing. No compute metering. No surprise invoices when an agent decides a one-line fix requires refactoring the entire module.

The runner: field tells the whole cost story. runner: stream-json with subscription auth means every dispatch uses flat-rate billing. Switch to runner: sdk and you're on metered API billing, per token. One config line separates "fixed cost" from "variable cost that grows with usage."

Most agent framework comparisons (LangGraph, CrewAI, AutoGen, Mastra) don't list "build your own on BullMQ" as a legitimate option. It is. Not because BullMQ is better than any of them at orchestration. Because a job queue you control on hardware you own removes an entire category of cost anxiety.

Token anxiety is real. Infrastructure anxiety should be too.

What I Gave Up

I'm not going to pretend this is all upside.

No crash recovery. If my worker process dies mid-job, the job sits in Redis as "stalled," but the execution context is gone. I restart from scratch. Vercel's deterministic replay is better here. My only answer is "my VPS hasn't crashed in four months." That's not architecture. That's luck with a good hosting provider.

No built-in human approval flows. I'm building something similar in Mission Control, but it's custom, incomplete, and doesn't survive process restarts. Every time I read Vercel's needsApproval docs, I think about the weeks I could have saved.

Redis is another dependency. On a $7 VPS with one operator, fine. In a team environment with high-availability requirements, it's infrastructure to maintain, monitor, and back up.

BullMQ 5.74 broke three ways simultaneously. Queue names with : characters collided with Redis key separators. Job IDs with : hit the same wall. Webhook cardId format was incompatible. I spent a full day building encodeJobId/decodeJobId to sanitize every identifier at 28 call sites across the transition function. The fix shipped. 920 tests passed. But that colon-in-key collision is the kind of bug that's invisible in tutorial examples and explosive with compound identifiers. Nobody warns you in the getting-started guide.
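A sketch of the colon-safe encoding approach. Ouija's actual encodeJobId/decodeJobId may differ, but percent-encoding the separator character is the standard move:

```typescript
// Colons collide with Redis key separators in BullMQ queue names and
// job IDs, so every identifier gets sanitized before it touches the queue.
// Escape "%" first so the encoding stays reversible.
function encodeJobId(id: string): string {
  return id.replace(/%/g, "%25").replace(/:/g, "%3A");
}

function decodeJobId(encoded: string): string {
  return encoded.replace(/%3A/g, ":").replace(/%25/g, "%");
}
```

The order matters: encode escapes `%` before `:`, and decode reverses it, so compound identifiers like `pr:42:review` round-trip cleanly.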

And the pure transition function trades execution durability for testability. Every state change is deterministic and testable. But if the process crashes, that determinism doesn't help you resume. I'd make the same trade-off again. I'm still not sure it's the right trade-off for everyone.

I built my own memory engine from scratch while Anthropic shipped theirs as a managed service. The build-vs-buy tension in agent infrastructure isn't theoretical for me. Building the resilient version yourself means building checkpointing, replay, and state recovery on your own. I haven't done that for execution durability yet. Vercel has.

The One-Line Version

The best infrastructure is the one you understand well enough to debug at 3am.

For me, that's BullMQ. I've read the Redis key format docs. I've fought the colon collision. I know where every job sits in the queue and why.

For most teams starting an agent pipeline today, Vercel Workflows is probably the right call. The DX is better. The failure modes are handled for you. The approval flow alone would have saved me weeks.

Both are better than the agent framework you spent six months evaluating and never shipped with.
