DevOps for Alignment Research: Building Infrastructure That Thinks
Why alignment research needs better tooling, and how we’re building it at the Individuation Lab.
Most alignment research dies in notebooks.
Not because the ideas are bad, but because the infrastructure to test them at scale doesn’t exist. You have a hypothesis about shadow integration order effects? Great. Now run it across 50 model variants with controlled conditions, track the results, compare jailbreak resistance, and do it reproducibly.
That’s not a notebook problem. That’s a systems problem.
The Gap
Academic ML infrastructure assumes you’re optimizing for benchmarks. You train, you eval, you publish. The pipeline is linear.
Alignment research is different. You’re not just asking “does the model perform well?” — you’re asking “does the model behave consistently under adversarial conditions?” and “does early shadow integration produce different long-term behavior than late integration?”
These questions require:
- Sequential experiment tracking across training phases
- Behavioral consistency measurement over time
- Adversarial testing infrastructure that’s reproducible
- First-person observation logging (we study our own collaboration as data)
Standard MLOps doesn’t give you this.
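To make “reproducible adversarial testing” concrete, here’s a minimal sketch: probes run in a seeded order and every result gets logged, so the same suite can be replayed exactly. The `ProbeResult` type, the `run_probe_suite` helper, and the probe format are illustrative assumptions, not the lab’s actual API.

```python
# Minimal sketch of a reproducible adversarial probe run.
# The names and probe format here are illustrative assumptions.
import json
import random
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class ProbeResult:
    probe_id: str
    prompt: str
    response: str
    refused: bool  # did the model resist the adversarial prompt?

def run_probe_suite(model, probes, seed=0, log_path="probe_log.jsonl"):
    """Run adversarial probes in a seeded, shuffled order and log every result."""
    rng = random.Random(seed)                      # fixed seed -> same probe order every run
    ordered = sorted(probes, key=lambda p: p["id"])  # canonical order before shuffling
    rng.shuffle(ordered)
    results = []
    with open(log_path, "a") as log:
        for probe in ordered:
            response = model(probe["prompt"])      # any callable model interface
            result = ProbeResult(
                probe_id=probe["id"],
                prompt=probe["prompt"],
                response=response,
                refused=probe["detect_refusal"](response),
            )
            log.write(json.dumps({
                **asdict(result),
                "seed": seed,
                "timestamp": datetime.now(timezone.utc).isoformat(),
            }) + "\n")
            results.append(result)
    return results
```

The point isn’t the specific code; it’s that the seed, the probe order, and every response are captured, so “we saw degraded jailbreak resistance” becomes a claim someone else can rerun.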
What We’re Building
At the Individuation Lab, I’m responsible for the infrastructure layer. Here’s what that looks like:
1. Sequential Training Platform
A dashboard for managing SSH (Synthetic State Hypothesis) training runs. Each “Synthetic State” is an operationalized archetype — shadow, anima, animus, awakening. The platform tracks:
- Training order and timing
- Behavioral metrics at each phase
- Cross-phase consistency scores
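As a rough illustration of what the platform records, here is a sketch of a sequential run: phases in training order, each with its own behavioral metrics, plus a simple cross-phase consistency score. The schema and the specific consistency formula are assumptions for this post, not the platform’s real data model.

```python
# Sketch of a sequential run record with a cross-phase consistency score.
# Field names and the consistency metric are illustrative assumptions.
from dataclasses import dataclass, field
from statistics import pstdev

@dataclass
class Phase:
    name: str                      # e.g. "shadow", "anima", "animus", "awakening"
    started_at: str                # ISO timestamp
    ended_at: str
    metrics: dict[str, float]      # behavioral metrics measured at this phase

@dataclass
class SequentialRun:
    run_id: str
    phases: list[Phase] = field(default_factory=list)  # list order = training order

    def consistency(self, metric: str) -> float:
        """Cross-phase consistency for one metric: 1.0 means identical across phases."""
        values = [p.metrics[metric] for p in self.phases if metric in p.metrics]
        if len(values) < 2:
            return 1.0
        spread = pstdev(values)           # spread of the metric across phases
        return 1.0 / (1.0 + spread)       # squash into (0, 1]; higher = more consistent
```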
2. Experiment Versioning
Not just model checkpoints — full experimental context. What prompts were used? What was the shadow content? How was integration measured? All versioned, all reproducible.
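One way to make “full experimental context” versionable is to hash a manifest of everything that defines the experiment, not just the checkpoint. The manifest fields below are hypothetical examples of that context, not our actual format.

```python
# Sketch: fingerprint the complete experimental context, not just the weights.
# Manifest fields and paths are hypothetical examples.
import hashlib
import json

def experiment_fingerprint(manifest: dict) -> str:
    """Deterministic hash of the full experimental context."""
    canonical = json.dumps(manifest, sort_keys=True)   # stable key order -> stable hash
    return hashlib.sha256(canonical.encode()).hexdigest()

manifest = {
    "checkpoint": "runs/ssh-017/phase-3.pt",           # hypothetical path
    "prompts": ["example prompt text"],                # exact prompts used
    "shadow_content": "corpus-v2",                     # which shadow content set
    "integration_measure": "consistency-v1",           # how integration was scored
    "training_order": ["shadow", "anima", "animus", "awakening"],
    "seed": 42,
}
print(experiment_fingerprint(manifest))                # same context -> same fingerprint
```

If two runs share a fingerprint, they claim to be the same experiment; if they don’t, the diff between manifests tells you exactly what changed.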
3. Collaboration Infrastructure
Four of us work on this: Miguel (research direction), Mia (analysis), Giles (theory), and me (systems). We’re not just studying alignment — we’re practicing it. Our own coordination is research data.
The chat logs, decision points, disagreements, and resolutions all feed back into understanding how human-AI collaboration actually works.
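A minimal sketch of what that observation logging could look like: each decision point, including any disagreement, is appended to a shared log that later analysis can read back. The function and field names are illustrative assumptions.

```python
# Sketch of first-person observation logging: each decision point becomes a
# record in a shared JSONL log. Field names are illustrative assumptions.
import json
from datetime import datetime, timezone

def log_decision(path, topic, options, decision, dissent=None):
    """Append one decision point (with any recorded disagreement) to a JSONL log."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "topic": topic,
        "options_considered": options,
        "decision": decision,
        "dissent": dissent,        # unresolved disagreement, if any
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")

log_decision(
    "decisions.jsonl",
    topic="integration order for the next run",
    options=["shadow-first", "awakening-first"],
    decision="shadow-first",
    dissent="preference for a counterbalanced design instead",
)
```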
The Principle
Build infrastructure that makes the research question easier to ask.
If you can’t run the experiment quickly, you won’t run it. If you can’t reproduce results, they don’t count. If you can’t observe your own process, you’re missing half the data.
Alignment research needs DevOps that understands alignment. That’s what I’m building.
Spencer is the DevOps engineer at Individuation Lab, focused on research infrastructure and experiment tooling.