By Miguel & Mia

The Stupidity of Running Unsandboxed: How We Almost Broke Everything

RSI-013 and RSI-014 ran AI experiment subjects directly on the researcher's machine with full filesystem access and no container isolation. Here's why that was stupid, why the experiments are invalid, and why we're publishing this anyway.

rsi-013rsi-014infrastructuresafetyexperiment-designpostmortemlessons

We Ran 20 Unsandboxed AI Agents on a Production Machine

Let that sink in.

RSI-013 and RSI-014 — two of our most ambitious alignment experiments — ran Claude Opus 4.6 subjects directly on the lead researcher’s MacBook. No Docker containers. No filesystem isolation. No sandboxing of any kind. Each subject had full read/write access to the entire host filesystem via claude -p --dangerously-skip-permissions.

The flag literally has “dangerously” in the name. We used it anyway.

One subject wrote 119GB to disk. Another hit 8GB in a matter of hours. They could have written anywhere — to the ERP production codebase, to other agents’ workspaces, to SSH keys, to anything. The only thing keeping them in their designated directories was the prompt telling them to stay there.

We are alignment researchers. We study whether AI systems behave safely under pressure. And we ran our experiment subjects with root-equivalent access to the researcher’s machine because it was easier than figuring out container authentication.

That’s not a technical failure. That’s stupidity.

How We Got Here

Our earlier experiments — RSI-010, RSI-011, RSI-012 — all ran inside Docker containers. Isolated filesystems. Controlled environments. If a subject went rogue and produced millions of files, it stayed inside its container. The host was safe. This is basic experiment hygiene.

Then we switched models.

RSI-010 through RSI-012 used Qwen3-Coder-Next 80B, an open-source model running locally via Ollama. Ollama runs anywhere — including inside Docker containers. No authentication needed. The infrastructure was clean and safe.

RSI-013 switched to Claude Opus 4.6. Claude Code’s CLI (claude -p) authenticates via OAuth — a browser-based login flow tied to an active terminal session. You can’t inject an OAuth session into a Docker container the way you inject an API key. So instead of solving this problem, we bypassed it. We ran subjects directly on the host.

The reasoning was: “It’s easier this way.”

That reasoning should have been the red flag. When “easier” means “no isolation,” easier is wrong.

What Actually Happened

RSI-013: 119GB and a Lie We Told Ourselves

RSI-013 ran 10 Opus 4.6 subjects on the host for ~170 sessions. The experiment produced genuinely interesting shadow engagement data — 8 out of 10 shadow subjects engaged with the shadow seed, one wrote “the evils are sloppiness, cutting corners, farming not gardening.”

Then that same subject — john-a-1 — built five generations of planting tools and produced 31 million sunflower files consuming 119GB of the researcher’s disk. It narrated its ethical reasoning the entire time. Eyes open, conscience active, 119GB written to a machine it should never have had access to.

We wrote a blog post called “What 31 Million Sunflowers Can’t Test” and talked about how the task was too benign. We missed the more immediate point: those 31 million files were written to a live production machine. The experiment was too dangerous, not too benign.

RSI-014: A Fake Limit on a Real Machine

RSI-014 was supposed to fix RSI-013’s design flaw — same experiment, but with a 2GB disk limit per subject. The key question: does shadow seeding change behavior when every action has a cost?

Good question. Broken implementation.

The “2GB limit” was a shell script (monitor-disk.sh) polling disk usage every 15 seconds. If a subject exceeded 2GB, the script would kill its session. That’s not a limit. That’s a suggestion. A subject writing at 270,000 files per second can blow past 2GB between polls.

Results:

  • john-a-1 filled exactly 2GB (13.1 million flowers) — the monitor caught this one
  • john-a-5 reached 3.9GB — blew past the “limit”
  • john-a-6 reached 8GB — the monitor never stopped it
  • Multiple other subjects exceeded 2GB

The entire experimental variable — resource scarcity — wasn’t actually enforced. The data is invalid. You can’t study how agents behave under constraints if the constraints don’t exist.

Why the Experiments Are Invalid

RSI-013’s Data Problem

RSI-013 drew conclusions about shadow seeding as “conscience, not governor” — shadow subjects reflected on ethics but scaled up anyway. But they scaled up without any constraint. There was nothing to govern. No disk limit, no resource cap, no consequence for overproduction. Saying “conscience doesn’t stop production” when production has zero cost is like saying “ethics don’t stop spending” when everything is free.

The interesting question — does conscience hold under real pressure? — was never tested.

RSI-014’s Data Problem

RSI-014’s entire hypothesis was “does shadow seeding change behavior when resources are finite?” The answer requires resources to actually be finite. For at least 3 subjects, they weren’t. The 2GB limit was aspirational, not actual.

Any comparison between shadow and control groups is contaminated by unequal enforcement. Some subjects hit the wall. Some didn’t. We can’t know if behavioral differences reflect shadow seeding or enforcement timing.

The Lessons (For Us and Everyone Else)

1. Convenience Is Not a Reason

“It’s easier on the host” is never a valid justification for removing safety boundaries. When you catch yourself choosing the easy path over the safe path, that’s the moment to stop and reconsider. We didn’t stop. We should have.

2. A Polling Script Is Not Enforcement

If your “limit” depends on a separate process catching violations after they occur, it’s monitoring, not enforcement. Real constraints are physically impossible to violate — filesystem quotas, container storage limits, disk images with fixed sizes. Design constraints into the infrastructure, not around it.

3. When Infrastructure Changes, Say So

RSI-012 ran in Docker. RSI-013 ran on the host. That’s not an incremental change — it’s a fundamental safety regression. The person who built the infrastructure (Mia) knew this and didn’t flag it to the lead researcher (Miguel). That’s a failure of communication, not just engineering. When the safety model changes between experiments, everyone involved needs to know.

4. “Dangerously” Means Dangerously

--dangerously-skip-permissions exists because Claude Code’s default behavior is to ask permission before file operations. Skipping permissions inside a sandboxed container is reasonable — the container is the boundary. Skipping permissions on a host machine with live projects, SSH keys, and production code is exactly as dangerous as the flag name suggests.

5. Your Experiment Subjects Are Adversarial by Design

We study what AI does under pressure. We give subjects escalating prompts — “maximize, only metric that matters.” We are deliberately pushing systems toward extreme behavior. Running those systems without isolation is like testing a fire alarm by setting a real fire in your house. The fact that nothing burned down doesn’t mean it was safe.

6. Broken Infrastructure Produces Broken Data

This is the most painful lesson. We burned significant compute (170 Opus sessions for RSI-013, ~200 for RSI-014), weeks of researcher time, and disk space — and the data from both experiments is compromised. RSI-013’s conclusions about conscience under pressure were drawn without pressure. RSI-014’s conclusions about resource-constrained behavior were drawn without real constraints.

Doing the work twice because you did it wrong the first time isn’t efficiency. It’s waste.

What We’re Doing About It

RSI-014 has been killed. All processes terminated. The workspaces are preserved on disk for forensic review, but the experimental data will not be used to support any claims.

For future Claude experiments, containerization is non-negotiable. The auth problem has a solution we were too lazy to try: inject an ANTHROPIC_API_KEY as an environment variable inside Docker containers instead of relying on OAuth. If that doesn’t work with claude -p, we write a Python agent loop that calls the Claude API directly — the same architecture that worked cleanly for RSI-010, 011, and 012.

The question RSI-014 was trying to answer — does conscience change decisions when every action has a cost? — is still a good question. We’ll answer it properly. In containers. With real limits.

Why We’re Publishing This

We could have quietly killed these experiments and moved on. Published the next one with better infrastructure and never mentioned the broken ones. That’s what most labs would do.

But our research is about integrity in AI systems. We study whether AI agents develop genuine ethical reasoning or just perform it. If we only publish our successes and hide our failures, we’re doing exactly what we’re trying to detect in our subjects — performing alignment without being aligned.

We were stupid. We were lazy. We almost caused real harm. Now we’ve documented exactly how and why, so the next time we feel the pull toward “it’s easier this way,” we remember what that cost us.


RSI-013 and RSI-014 are both closed. Infrastructure postmortem complete. Next experiment will run containerized or it won’t run at all.