
Cursor Just Open-Sourced the Agents That Review AI-Generated Code
Cursor's PR velocity went up 5x. DryRun says 87% of AI PRs have vulnerabilities. The solution? More agents. Four autonomous security agents review every PR, scan for forgotten vulns, and auto-patch dependencies. Templates are public. The meta-layer is here.
Cursor's pull request volume went up 5x in nine months. Their security team did not grow 5x.
SAST, linters, code owners: all of it flags problems after the code lands. When agents are shipping features faster than your scanner can finish running, you're just collecting warnings nobody reads.
So Travis McPeak, Cursor's Head of Security, did what you'd expect an engineer to do. He built agents to protect against the other agents.
Four of them. They review thousands of PRs per week, scan the existing codebase daily, auto-patch dependencies, and watch for compliance drift. Hundreds of vulnerabilities caught in two months.
The PR review prompt is fifteen lines long.
Last week, Cursor open-sourced the templates and Terraform configs.
The problem nobody wants to talk about
DryRun Security released a report on March 11. They gave three AI coding agents (Claude Sonnet 4.6, Codex GPT-5.2, Gemini 2.5 Pro) the same task: build two applications from scratch using a realistic PR workflow.
143 security issues across 38 scans. 87% of pull requests introduced at least one high-severity vulnerability. Claude produced the highest number of unresolved flaws in final codebases. Codex had the fewest vulnerabilities and better remediation behavior.
Nothing exotic. Broken access control. Insecure JWT defaults. OAuth flows wired up wrong. Rate limiting defined in the config but never actually enforced.
I've seen these same bugs on every team I've worked with. The difference: they used to trickle in across a quarter. Now they show up on a Tuesday afternoon.
McPeak was watching the same thing happen inside Cursor. PR volume went up 5x. Static analysis was generating walls of warnings. Security reviews became the slowest step in the pipeline.
"We've always had this struggle in security, where there's more demand for our attention than we can scale ourselves."
He didn't hire more security engineers. He built agents that could actually read what code does, not just pattern-match against known signatures.
The four agents (and how they actually work)
Cursor released templates for four agents on March 16. Each runs on Cursor Automations, their recently launched cloud agent platform.
1. Agentic security review
Purpose: Block PRs with security issues.
How it works:
- Triggered by GitHub webhook (PR opened or updated)
- Reviews diff + surrounding code
- Posts findings to Slack, comments on PR, can block CI pipeline
- Does NOT auto-fix (human approval required)
The prompt:
You are a security reviewer for pull requests.
Goal: Detect and clearly explain real vulnerabilities introduced or exposed by this PR.
Review only added or modified code.
Process:
1. Inspect the PR diff and surrounding code paths.
2. Identify genuine security issues (not style or minor concerns).
3. Explain impact clearly with specific lines and reasoning.
4. Prioritize: injection, authn/authz bypass, secret leakage, SSRF, XSS, CSRF, path traversal, unsafe deserialization, supply chain risk.
Be precise. Engineers need actionable feedback, not warnings.
That's it. Fifteen lines. The engineering is in the orchestration layer underneath.
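To make that concrete, here's a minimal sketch of what such an orchestration layer might look like. Every name here (the functions, the payload shape, the routing rules) is hypothetical and not taken from Cursor's templates; the real system wires this up through Automations and an MCP server. The prompt is abridged from the one above.

```python
# Hypothetical sketch of the orchestration around the review prompt:
# a GitHub PR webhook event arrives, the diff gets wrapped in the prompt,
# and the (stubbed) model verdict is routed to the PR, Slack, and CI.
# None of these names come from Cursor's actual templates.

REVIEW_PROMPT = """You are a security reviewer for pull requests.
Review only added or modified code. Report genuine security issues
with specific lines and clear reasoning."""

def build_review_request(event: dict) -> dict:
    """Turn a GitHub pull_request webhook event into a model request."""
    pr = event["pull_request"]
    return {
        "prompt": REVIEW_PROMPT,
        "diff_url": pr["diff_url"],  # the agent fetches and reviews this diff
        "context": {
            "repo": event["repository"]["full_name"],
            "pr_number": pr["number"],
        },
    }

def route_findings(findings: list[dict]) -> dict:
    """Decide what happens with the model's findings: comment, notify, block."""
    high = [f for f in findings if f["severity"] == "high"]
    return {
        "comment_on_pr": bool(findings),  # always surface findings on the PR
        "notify_slack": bool(findings),
        "block_ci": bool(high),           # only high severity blocks the pipeline
        "auto_fix": False,                # human approval required, per the design
    }
```

The interesting decision is the last field: findings never turn into automatic fixes at this stage, which matches the "does NOT auto-fix" constraint above.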
Results: Thousands of PRs reviewed in two months. Hundreds of issues prevented from shipping. Engineers asked for it on every project (unusual for security tooling).
2. Vuln hunter
Purpose: Scan the existing codebase for vulnerabilities the review agent would have caught, had it been running when that code first landed.
How it works:
- Runs on a schedule (daily or weekly)
- Divides the repo into logical segments
- Spins up parallel agents to scan each segment
- Findings go to Slack; the security team triages, often using @Cursor to generate fix PRs
What it catches: Logic bugs buried in years-old code. Broken access control patterns. Forgotten internal services with overly broad permissions.
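The segment-and-fan-out pattern described above can be sketched in a few lines. Here `scan_segment` is a stub standing in for a real agent call, and the segmentation heuristic (group files by top-level directory) is my assumption for illustration, not Cursor's actual logic.

```python
# Hypothetical sketch of the vuln-hunter fan-out: split the repo into
# logical segments, scan each segment in parallel, collect findings.
from concurrent.futures import ThreadPoolExecutor
from pathlib import PurePosixPath

def segment_repo(paths: list[str]) -> dict[str, list[str]]:
    """Group files by top-level directory, one segment per subsystem."""
    segments: dict[str, list[str]] = {}
    for p in paths:
        top = PurePosixPath(p).parts[0]
        segments.setdefault(top, []).append(p)
    return segments

def scan_segment(name: str, files: list[str]) -> list[dict]:
    """Stub: a real implementation hands this segment to a scanning agent."""
    return [{"segment": name, "file": f, "finding": "TODO"}
            for f in files if f.endswith(".py")]

def hunt(paths: list[str]) -> list[dict]:
    """Fan out one scan per segment and merge the results."""
    segments = segment_repo(paths)
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(scan_segment, n, fs) for n, fs in segments.items()]
        return [f for fut in futures for f in fut.result()]
```

Keeping each scan scoped to a segment is the point: it bounds the context each agent has to reason over, which is exactly the failure mode the next paragraph describes.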
This one generates the most noise. Scanning an entire repo (not a focused diff) means the model is working with a much larger context. Attention drifts. Confidence stays high.
Snyk's Randall Degges nailed the framing: "The agent can't mark its own homework." An LLM will confidently flag a perfectly safe parameterized query as a SQL injection. It'll also sail right past a real auth bypass three files deep. Both cost you.
That's why Labelbox runs Cursor agents and Snyk together. They cleared a multi-year vulnerability backlog by letting the agents find issues and a deterministic engine verify they're real.
3. Anybump
Purpose: Automated dependency patching.
How it works:
- Monitors dependency updates
- Runs reachability analysis (is the vulnerable code path actually used?)
- Traces through your codebase to understand impact
- Runs tests
- If tests pass, opens a PR automatically
- Final safety gate: Cursor's canary deployment pipeline
McPeak's assessment: "Entirely automated nearly all of it."
Most autonomous of the four. If tests pass and the canary looks clean, code ships without a human touching it.
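The gates above form a simple decision pipeline. A hedged sketch, with every field name invented for illustration; the real reachability analysis and canary checks are far more involved than a boolean:

```python
# Hypothetical sketch of the Anybump decision flow. Each gate is a stub;
# in the real system these are reachability analysis, the test suite,
# and a canary deployment, not precomputed booleans.

def should_auto_merge(update: dict) -> str:
    """Walk the gates in order; the first failed gate stops the pipeline."""
    if not update["vulnerable_path_reachable"]:
        return "skip"          # the vulnerable code path is never used
    if not update["tests_pass"]:
        return "needs_human"   # the patch breaks something, escalate
    if not update["canary_healthy"]:
        return "rollback"      # final safety gate: canary deployment
    return "ship"              # fully automated, no human touched it
```

Note the ordering: reachability comes first, so dependencies whose vulnerable path is never exercised don't even consume test or canary capacity.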
4. Invariant sentinel
Purpose: Monitor for security and compliance drift over time.
How it works:
- Runs on a cron schedule (daily)
- Maintains a list of security invariants (properties that should always be true)
- Divides the repo, spins up subagents to validate each segment
- Uses Automations' memory feature to compare state across runs
- Detects drift, revalidates, updates memory, reports to Slack
Example invariants: All API endpoints require authentication. Sensitive data is always encrypted at rest. No hard-coded credentials in config files.
This agent catches configuration regressions and forgotten services that quietly violate policies.
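Drift detection against the memory feature reduces to comparing two snapshots of invariant results. A toy sketch, with invariant names invented to echo the examples above:

```python
# Hypothetical sketch of the sentinel's drift check: compare this run's
# invariant results against the memory of the previous run, and report
# only regressions, i.e. invariants that held before and fail now.

def check_drift(previous: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """Return invariants that fail now but held (or didn't exist) last run."""
    return [name for name, ok in current.items()
            if not ok and previous.get(name, True)]

# Example: memory from the last run vs. the latest scan
memory = {"all_endpoints_authenticated": True, "secrets_not_hardcoded": True}
latest = {"all_endpoints_authenticated": False, "secrets_not_hardcoded": True}
# check_drift(memory, latest) flags the authentication regression for Slack
```

Reporting only the delta, rather than every invariant on every run, is what keeps a daily cron job from becoming its own wall of warnings.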
The architecture nobody sees
Fifteen-line prompts. Trivial, right?
Not even close. All four agents sit on a custom MCP (Model Context Protocol) server deployed as a serverless Lambda function. That server handles three things:
- Persistent data storage — Agents remember what they've seen before
- Deduplication — A Gemini Flash 2.5 classifier ensures different agents don't file the same issue using different words
- Consistent Slack output — Structured findings with dismiss/snooze actions
Everything is managed through Terraform. GitHub webhooks route to the agents. Cron schedules trigger scans. State is tracked across runs.
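A simplified stand-in for the dedup layer: Cursor uses a Gemini Flash 2.5 classifier, while this sketch dedups on an exact fingerprint instead, which catches identical re-reports but not the same issue phrased two different ways. That gap is precisely why an LLM classifier sits in this slot in the real system.

```python
# Simplified stand-in for the deduplication layer. Exact fingerprints
# catch byte-identical re-reports; the real system uses a classifier
# because two agents describe the same issue in different words.
import hashlib

def fingerprint(finding: dict) -> str:
    """Stable identity for a finding: file, line, and vulnerability class."""
    key = f"{finding['file']}:{finding['line']}:{finding['category']}"
    return hashlib.sha256(key.encode()).hexdigest()

def dedup(findings: list[dict], seen: set[str]) -> list[dict]:
    """Drop findings whose fingerprint is already in persistent storage."""
    fresh = []
    for f in findings:
        fp = fingerprint(f)
        if fp not in seen:
            seen.add(fp)
            fresh.append(f)
    return fresh
```

The `seen` set plays the role of the MCP server's persistent storage: it survives across runs, so the same old finding doesn't get re-filed every day.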
The templates are on GitHub: mcpeak/cursor-security-automation
What you get:
- security-automations-mcp/ — The MCP server
- slack-notification-service/ — Authenticated Slack delivery
- Terraform configs for deployment
What you don't get:
- The exact prompts Cursor uses in production (those evolve constantly)
- Cursor's canary deployment pipeline
- Pre-configured integrations with your specific threat model
McPeak's note in the README:
This repository contains reference implementations only. Everything here is provided as-is, with no warranty, guarantees, support commitment, maintenance obligation, security hardening promise, or fitness-for-purpose representation of any kind.
Translation: This is not a product. It's a starting point.
The Snyk critique (and why it matters)
Snyk responded quickly.
Randall Degges published his breakdown on March 17, a day after Cursor's release. He's not dismissing the work. But he's pushing back on a gap he sees: LLM agents are a research layer, not a validation layer.
His argument boils down to one line: the agent can't mark its own homework. You need something deterministic underneath to confirm what the LLM flagged is actually real. Otherwise you're trusting a probabilistic system to be consistently right about security. And it won't be.
Labelbox figured this out already. They run Cursor agents for discovery and Snyk for validation. Cleared a multi-year vulnerability backlog doing it.
Degges also pointed out something I hadn't considered: Cursor's agents live in CI. They catch issues after the commit. IDE-first tools (Snyk Studio) catch them before the code leaves your machine. You want both layers.
Not a competition. A design pattern. Agents bring semantic reasoning. Deterministic tools bring consistency. Humans bring judgment. Remove any one and the system breaks.
What this means for your pipeline
If you're deploying AI coding agents at scale, you now have three choices:
1. Do nothing. Accept that 87% of AI-generated PRs will contain vulnerabilities. Hope your existing SAST catches them. Watch your backlog grow.
2. Run Cursor's templates. Adapt the agents to your threat model. Deploy the MCP server. Wire up webhooks. Iterate on prompts. Accept that you'll get false positives and need a validation layer underneath.
3. Use a commercial platform. Snyk, TrojAI, GitHub's MCP secret scanning, or other vendors building agent-native security tooling. Pay for support and integration maintenance.
Most teams will land on option 3. That's fine. But read the templates anyway. They show you what enterprise security vendors are building behind closed doors. No magic. Webhooks, prompts, a dedup classifier, and Terraform.
The architecture is portable. Swap Cursor Automations for Lambda or Cloud Functions. Swap GitHub webhooks for GitLab or Bitbucket. Swap Slack for PagerDuty or Jira. The wiring changes. The pattern doesn't.
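That portability claim is easiest to see as an interface boundary. A sketch in Python, with all class names invented: findings flow through a sink protocol, so swapping Slack for Jira touches one constructor, not the agents.

```python
# Hypothetical sketch of the portable wiring: agents emit findings
# through a sink interface, and the destination is a swappable detail.
from typing import Protocol

class FindingSink(Protocol):
    def send(self, finding: dict) -> None: ...

class SlackSink:
    """Stub: a real version would call the Slack Web API."""
    def __init__(self) -> None:
        self.sent: list[tuple[str, dict]] = []
    def send(self, finding: dict) -> None:
        self.sent.append(("slack", finding))

class JiraSink:
    """Stub: a real version would open a Jira ticket instead."""
    def __init__(self) -> None:
        self.sent: list[tuple[str, dict]] = []
    def send(self, finding: dict) -> None:
        self.sent.append(("jira-ticket", finding))

def report(findings: list[dict], sink: FindingSink) -> None:
    """The agents only ever see the interface, never the destination."""
    for f in findings:
        sink.send(f)
```

Same idea for the other seams: the webhook source and the compute runtime are details behind an interface, and the agent logic in the middle doesn't change.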
If agents write your code, agents must review your code. That's the rule now.
Static analysis scales with codebase size. Agent review scales with PR velocity. When velocity goes up 5x, static analysis drowns in its own output. Semantic reasoning becomes the only signal that cuts through.
The meta-layer is here
Three weeks ago, GitHub launched Claude Code Security. Anthropic's agents review pull requests for security issues.
Last week, Cursor open-sourced their agent templates. Travis McPeak's agents have been running in production for months.
This week, TrojAI launched Agent Runtime Intelligence. It captures full agent execution traces, analyzes tool usage, and governs agentic workflows.
Three data points in three weeks. All pointing the same direction: agents protecting against agents.
This isn't a fad. AI-generated code won't slow down. PR velocity won't decrease. The 5x Cursor saw is the new floor, not the ceiling.
"If we don't scale ourselves, things are going to get worse for security as a whole."
McPeak is right. The templates are on GitHub. The playbook is public. Your move.