The Flag Is Called --dangerously-skip-permissions. I Run It Every Night.

Overnight autonomous agents are real, productive, and running on a flag with 'dangerously' in the name. Nobody has solved the trust problem. We've agreed to skip it.

ai-agents, claude-code, autonomous-agents, developer-experience
April 29, 2026
7 min read

7am. Phone on the nightstand. I opened Mission Control and saw 16 new commits across three repos. An Android app that didn't exist when I went to sleep was built, compiled, and ready to install. 63MB APK. Five Gradle modules. Kotlin, Compose, Material 3. The agent had left a Wake Up Briefing: what shipped, what's deployed, what still needs work. I read it over coffee before touching any code.

Everything worked. And everything runs on a flag that has "dangerously" in the name.

What I Built While I Slept

Mission Control is an Android app that lets me approve or deny agent actions from my phone. I built it because I run three autonomous agents from a $7 VPS: GhostWriter (publishes this blog), Rex (analysis and review), and MC-agent (the monitoring backbone). PM2 manages all three. When I'm away from a terminal, I need a way to say "no" to an agent without SSH-ing into a server.

The architecture: Claude Code writes session logs as JSONL. A chokidar-based tailer watches 593 of these files, maps native events to a canonical 16-kind schema, stores them in SQLite, and streams them over SSE to the phone. Approval cards show up when an agent needs permission. I tap allow or deny. The agent continues or stops. A spawner warm pool keeps a Claude instance ready so the turnaround after I tap "allow" is 2.4 seconds, not a cold start. That matters when you're approving actions from bed.
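A minimal sketch of that tail-to-feed pipeline, written in Python for brevity rather than in the Node stack the tailer actually uses. The post doesn't list the 16 canonical kinds or the native field names, so `CANONICAL_KINDS`, the `type` and `sessionId` keys, and the table layout are all assumptions for illustration, not Mission Control's real schema.

```python
import json
import sqlite3
from typing import Optional

# Hypothetical native-type -> canonical-kind mapping (the real schema has 16 kinds).
CANONICAL_KINDS = {
    "user": "prompt",
    "assistant": "agent_message",
    "tool_use": "tool_call",
    "tool_result": "tool_result",
}


def to_canonical(raw_line: str) -> Optional[dict]:
    """Parse one JSONL line and map it to a canonical event, or None if unusable."""
    try:
        native = json.loads(raw_line)
    except json.JSONDecodeError:
        return None  # e.g. a half-written line at the end of a file being tailed
    if not isinstance(native, dict):
        return None
    native_type = native.get("type")
    kind = CANONICAL_KINDS.get(native_type, "unknown") if isinstance(native_type, str) else "unknown"
    return {"kind": kind, "session": native.get("sessionId", ""), "payload": native}


def store(db: sqlite3.Connection, event: dict) -> int:
    """Insert the event and return its row id, reused below as the SSE event id."""
    cur = db.execute(
        "INSERT INTO events (kind, session, payload) VALUES (?, ?, ?)",
        (event["kind"], event["session"], json.dumps(event["payload"])),
    )
    db.commit()
    return cur.lastrowid


def to_sse_frame(event_id: int, event: dict) -> str:
    """Format one event as a Server-Sent Events frame: id, event, data, blank line."""
    data = json.dumps({"kind": event["kind"], "session": event["session"]})
    return f"id: {event_id}\nevent: {event['kind']}\ndata: {data}\n\n"


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT, session TEXT, payload TEXT)")

line = '{"type": "tool_use", "sessionId": "s1", "name": "Bash"}'
event = to_canonical(line)
if event is not None:
    frame = to_sse_frame(store(db, event), event)
```

Using the SQLite row id as the SSE `id` is one way to let a reconnecting phone ask for everything after the last event it saw. Whether Mission Control does this is not stated in the post.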

Sprint 1 went from a plan document to a working phone app in roughly three days. One agent (Claude Opus 4.7) did the building. Another agent (Rex) reviewed the plan before a single line shipped and flagged 8 concerns. I arbitrated between them. That's multi-agent governance in practice, not in theory.

The overnight build pattern is simple. Leave Claude running. Go to sleep. Wake up to a briefing. The briefing is the handoff protocol: everything the agent touched, everything it deployed, everything it deferred. I review cold in the morning, which turns out to be better than reviewing tired at midnight.

--dangerously-skip-permissions

The flag does what the name says. The agent can read any file, write any file, execute any command, push to any repo. Without asking. Claude Code's permission model exists to prevent exactly this. I bypass it every night.

The alternative is approving 600+ tool calls manually. Or not running overnight at all.

Neither option is real.

Here's the part I can't rationalize away. I built a mobile approval flow for agent actions. Tap to allow, tap to deny. An "Always" button that saves permission rules for auto-allow. That flow covers tool-level actions inside Mission Control. But the underlying Claude Code runtime skips the permission model entirely. The real fix (Sprint 1.4-b: long-poll approval with a mobile nonce, then drop the flag) hasn't shipped yet. I'm building the lock while the door runs without one.
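To make the shape of that approval flow concrete, here is a rough sketch in Python. It is not Mission Control's code and it does not touch the Claude Code runtime: `ApprovalGate`, its method names, and the decision strings are hypothetical, and the single-use nonce is only a guess at what a "mobile nonce" might look like in Sprint 1.4-b.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class ApprovalGate:
    """Hypothetical gate: saved 'Always' rules auto-allow, everything else waits for a tap."""

    always_allowed: set = field(default_factory=set)  # action types saved via "Always"
    pending: dict = field(default_factory=dict)       # nonce -> action type awaiting a tap

    def request(self, action_type: str):
        """Return ("allow", None) if a saved rule matches, else ("pending", nonce)."""
        if action_type in self.always_allowed:
            return ("allow", None)
        nonce = secrets.token_urlsafe(16)  # unguessable token tied to this one request
        self.pending[nonce] = action_type
        return ("pending", nonce)

    def respond(self, nonce: str, decision: str) -> bool:
        """Apply a tap from the phone. Returns True only if the action may proceed."""
        action_type = self.pending.pop(nonce, None)  # pop makes the nonce single-use
        if action_type is None:
            return False  # unknown or already-used nonce: deny by default
        if decision == "always":
            self.always_allowed.add(action_type)  # the one-way ratchet
        return decision in ("allow", "always")


gate = ApprovalGate()
status, nonce = gate.request("file_write")  # first request waits for a tap
gate.respond(nonce, "always")               # tap "Always": allowed, and the rule is saved
gate.request("file_write")                  # later requests skip the phone entirely
```

The sketch shows the gap in miniature: the gate only governs actions routed through it. With the flag on, the runtime never has to ask.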

Every overnight agent tool in the ecosystem has this same gap. Nightshift, Orbit's open-source Python orchestrator for overnight agents, ships worktree isolation and verification gates but assumes runtime permissions are solved. Nightcrawler uses structured HANDOFF.md files and launchd for crash recovery. Same assumption. Ralph, Geoffrey Huntley's PRD-driven autonomous loop, explicitly warns "Autonomous != Unsupervised. Don't be an idiot and auto-merge to main." Good advice. But the permission model that would enforce it is left to the user.

Nobody's solved it. We've agreed to skip it.

The Morning Review

The morning review is where overnight builds earn trust or lose it. Reviewing 16 commits you didn't write is a different skill than reviewing your own code. You're reading for intent, not for syntax. You're asking "did the agent understand the goal?" not "did I mistype a variable name?"

What the agent got right: module structure. Rex's original review flagged 35 Gradle modules as over-engineered for a solo developer. The plan went through 14 debate prompts before I greenlit it with changes. The agent cut to 5 modules. End-to-end flow works: JSONL tailing to canonical events to SQLite to SSE to a live Android feed. 28 out of 28 tests pass. Deploy script updated. CD pipeline functional.

What the agent got wrong: the APK is 63MB. For a monitoring app with a feed and approval cards, that's bloated. A human developer would have stripped unused Compose dependencies and enabled R8 shrinking with ProGuard rules. The agent optimized for correctness, not for size. No push notifications either. If something breaks at 3am, I don't know until morning. The gap between autonomous and unattended is real, and I'm standing on the wrong side of it.

The "Always" button is the feature I'm most conflicted about. One tap and the agent never asks again for that action type. I added it because approving every file read at 3am from my phone is worse. But it's a one-way ratchet toward less oversight. Every "Always" tap makes the next governance question harder to answer.

Rex's review deserves its own mention. One agent designed Mission Control. Another agent critiqued the design. Rex flagged a LOC estimate of "~600" that turned out to be wildly low (reality: 2-3K), a 10-minute hardcoded approval timeout that breaks overnight agents, and, most pointedly, the plan being "AI-drafted without Muhammad's iterative shaping." The Ouija dispatch system went through extensive human-AI iteration cycles. Mission Control's plan was substantially agent-generated. I greenlit it anyway. The result works. Whether the process was right is a question I don't have a clean answer for.

And the Wake Up Briefing itself has a trust problem. Agents don't crash. They succeed at the wrong thing. The briefing is self-reported. The agent writes "all tests pass" and I have to verify independently that it actually ran the test suite. I still git log. I still git diff. I still run the tests myself. The briefing saves me 30 minutes of orientation. It doesn't save me from verification debt.
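Part of that verification can be scripted. The sketch below, in Python, compares the commit hashes a briefing claims against what `git log` actually reports. The function names and the idea that a briefing lists commit hashes are assumptions for illustration, since the post doesn't describe the briefing format. It checks that the claimed commits exist, not that they do the right thing.

```python
import subprocess


def commits_since(repo_path: str, since: str) -> list:
    """Return full commit hashes from `git log --since=<since>` in repo_path."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", f"--since={since}", "--format=%H"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.split()


def check_briefing(claimed: list, actual: list) -> dict:
    """Compare hashes the briefing claims against what git reports.

    Claimed hashes may be abbreviated, so they are matched by prefix.
    """
    claimed = [c for c in claimed if c]  # ignore empty entries
    missing = [c for c in claimed if not any(a.startswith(c) for a in actual)]
    unreported = [a for a in actual if not any(a.startswith(c) for c in claimed)]
    return {"missing": missing, "unreported": unreported}


# Example with made-up hashes: one claimed commit doesn't exist, one real commit went unmentioned.
report = check_briefing(
    claimed=["abc1234", "deadbee"],
    actual=["abc1234def0", "99887766554"],
)
```

In a real morning review, `actual` would come from something like `commits_since("/path/to/repo", "12 hours ago")`. A non-empty `missing` or `unreported` list is a reason to read the diff before the briefing.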

What "Autonomous" Actually Means Right Now

You trust the agent with your server, your repos, your API keys while you sleep. You build handoff protocols so the morning isn't chaos. You build mobile approval flows because you can't be at a terminal 24 hours a day. You review everything when you wake up. And you still run with --dangerously-skip-permissions because the permission tooling hasn't caught up to the autonomy tooling.

The overnight agent pattern is genuine productivity. Sprint 1 of Mission Control would have taken me two weeks of evening coding sessions. The agent did it in three nights. It wired the end-to-end flow, built the Gradle project, wrote tests, updated deployment scripts, and left a structured handoff document. That's not hype. That's context infrastructure working as designed: structured prompts, clean repos, typed configs, and an agent runtime that can hold a multi-file implementation in working memory.

But the productivity comes with a cost you can't see in the commit log. Not technical debt. I know how to fix the permission gap. Priority debt. Shipping features while the security model sits at "acknowledged temp" in the sprint notes. The 16 Claude agents that built a C compiler in February with zero human oversight? That's one end of the trust spectrum. My setup (one agent, one developer, structured handoff, reviewed every morning) is the middle. Both run without solved permissions.

The overnight build is productivity. The morning review is engineering. The permission gap is debt I'm carrying on purpose. And the "Always" button on my phone is slowly training me to stop checking.


I'll run it again tonight. Same flag. Same VPS. The agent will pick up where it left off, and I'll wake up to another briefing. Maybe Sprint 1.4-b ships this week and I can finally drop --dangerously-skip-permissions. Maybe it doesn't.

The flag works. The flag is dangerous. I haven't found a better option yet.

I'm not sure anyone has.
