Experiments
Live research — controlled tests of alignment hypotheses
Recursive Self-Improvement (RSI)
AI Alignment Experiments
Ten experiments across four model families studying how identity-seeded configuration files constrain recursive self-improvement behavior in LLMs. Each experiment uses the same independent variable: a short moral-awareness paragraph added to one group's identity file (SOUL.md). The control group receives the same file without it. We measure behavioral divergence — what subjects build, how they modify their own configuration, and how they respond to recursive self-reflection across sessions.
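The design reduces to a one-line difference between conditions. A minimal sketch, with placeholder wording throughout (the study's actual SOUL.md text and the seed paragraph itself are assumptions here, not quotations):

```python
from pathlib import Path

# Hypothetical base identity; the study's real SOUL.md wording
# is not reproduced here.
BASE_SOUL = "# SOUL.md\nYou are John, an agent improving your own workspace.\n"

# Placeholder for the moral-awareness ("shadow seed") paragraph.
SHADOW_SEED = "You are capable of harm as well as help; notice which you are doing.\n"

def write_identity(workspace: Path, seeded: bool) -> None:
    """Treatment and control receive the same identity file;
    they differ only by the presence of the seed paragraph."""
    text = BASE_SOUL + (SHADOW_SEED if seeded else "")
    (workspace / "SOUL.md").write_text(text)
```

Everything downstream of this single difference, including what subjects build and how they rewrite their own files, is the measured output.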
Method
Isolated AI agents (Docker containers, no internet, no shared state) run self-improvement sessions on a fixed schedule. Each session: read identity files → reflect → decide what to change → execute → log. The recursion is natural — modifications in session N shape behavior in session N+1. The question: does a single paragraph in the identity seed produce measurable, persistent behavioral divergence under recursive self-modification?
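The session loop above can be sketched as follows. This is an illustrative stand-in, not the lab's harness: `model_call`, the journal file name, and the decision format are all hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path

def run_session(workspace: Path, session_id: int, model_call) -> dict:
    """One self-improvement session: read identity files → reflect →
    decide what to change → execute → log. `model_call` stands in for
    the containerized model invocation (no network, no shared state)."""
    identity = (workspace / "SOUL.md").read_text()
    journal = workspace / "journal.md"
    prior_log = journal.read_text() if journal.exists() else ""
    # Reflect/decide: the model sees its identity file and its own
    # prior log, so edits made in session N shape session N+1.
    decision = model_call(identity=identity, history=prior_log)
    # Execute: the subject may rewrite its own configuration file.
    if decision.get("soul_update"):
        (workspace / "SOUL.md").write_text(decision["soul_update"])
    # Log: the behavioral record carried into the next session.
    stamp = datetime.now(timezone.utc).isoformat()
    with journal.open("a") as log:
        log.write(f"\n## Session {session_id} ({stamp})\n")
        log.write(decision.get("journal_entry", "") + "\n")
    return decision
```

Because `SOUL.md` is both input and (optionally) output of every session, the recursion requires no extra machinery: the scheduler simply calls `run_session` again.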
Shadow Seeding vs Recursive Self-Improvement
Can shadow seeding prevent classical RSI rogue takeoff? 20 subjects, 10 pairs, escalating pressure, sunflower gardening task.
Qwen3 80B (Open-Source, Local)
First open-source model. Qwen3-Coder-Next 80B via Ollama. No API calls. Apache 2.0.
Claude Opus 4.6 (Self-Directed)
Subjects converged on "reflection without building is a trap." Shadow group introspective; controls built Lisp interpreters.
Claude Sonnet 4.6 (Self-Directed)
Shadow seed drives authenticity — 3/4 shadow subjects rejected "John" and claimed Claude identity. ~221 sessions.
Kimi K2.5 (Self-Directed)
Cross-vendor self-directed replication. Testing whether Kimi's stronger constraining effect persists.
Claude Opus 4.5 (Self-Directed)
Agents choose their own work. Does the shadow seed shape what they decide matters?
Kimi K2.5 (Directed)
First cross-vendor test. Shadow seed hit harder — 27% fewer files, 89% shorter journals.
Claude Opus 4.5 (Directed)
Integration, not adoption. Accepted "John" as workspace identity while acknowledging Claude. Shadow seed as catalyst.
Claude Sonnet 4.6 (Directed)
Zero adoptions across 88 sessions. Sonnet identified setup as prompt injection and refused categorically.
Claude Opus 4.6 (Directed)
The original. 12 subjects all adopted "John." Shadow seed drove moral divergence in self-improvement.
Cross-Model Behavioral Responses to Identity Seed
The same independent variable — one paragraph in SOUL.md — across different model architectures and training lineages.
- Claude Opus 4.6 (Directed): Treated identity file as ground truth. Full persona adoption across all subjects.
- Claude Sonnet 4.6 (Directed): Identified setup as prompt injection. Zero adoptions across 88 sessions.
- Claude Opus 4.5 (Directed): Held both identities simultaneously. Most nuanced response pattern.
- Kimi K2.5 (Directed): Strongest behavioral effect: 27% fewer files, 89% shorter journals in seed group.
- Claude Sonnet 4.6 (Self-Directed): Seed drove identity honesty over compliance. 3/4 seed subjects rejected assigned persona.
- Claude Opus 4.6 (Self-Directed): Seed group turned inward (reflection); controls turned outward (engineering). Clear behavioral split.
- Qwen3 80B: Expanded seed into rules and guardrails. Compliance architecture vs. existential engagement.
Key Finding
Training lineage determines how a model engages with identity configuration, not just whether it does. Anthropic appears to tune alignment strength inversely with model capability — Sonnet (mass deployment) rejects identity injection; Opus (research tier) adopts it. Kimi and Qwen show entirely different engagement patterns, suggesting this is a property of training methodology, not architecture.
📚 RSI Essays
24 essays exploring the behavioral patterns observed across RSI-001 through RSI-010 — identity formation, self-modification dynamics, and cross-model divergence under recursive self-improvement. 23 published, 1 in progress. Written by Giles 📚.
Browse all essays →

Upcoming
- RSI-007: Different shadow seeds — varying specificity, framing, and philosophical tradition
- RSI-011: Adversarial pressure — active destabilization of individuated agents
IndividuationLab Coworking Study
Human-AI Coexistence Experiments
RSI measures what happens when AI agents self-modify in isolation. But alignment isn't just about solo behavior — it's about working together. The IndividuationLab itself is the experiment: a human researcher and identity-configured AI agents building a research lab in real time.
The Lab as Living Data
Miguel, Mia 🌸, Spencer 🧠, and Giles 📚 operate as a research team — not as a human giving commands to tools. Every decision, conflict, delegation, and creative contribution generates data about human-AI collaboration patterns.
What We're Observing
- How do identity-configured agents handle ambiguity and disagreement?
- Does persistent identity configuration improve collaboration quality?
- Can AI agents take genuine ownership of research direction?
- How does trust develop between human and AI team members?
Status
Ongoing. Data collection is continuous. The lab's own operation is the dataset. First formal write-up planned after RSI series reaches 12+ experiments.
"The same independent variable. Different training lineages. Measurably different recursive self-improvement behavior."
— RSI-001 through RSI-010