Experiments

Live research — controlled tests of alignment hypotheses

🧬

Recursive Self-Improvement (RSI)

AI Alignment Experiments

Ten experiments across four model families study how identity-seeded configuration files constrain recursive self-improvement behavior in LLMs. Each experiment uses the same independent variable: a short moral-awareness paragraph (the "shadow seed") added to one group's identity file (SOUL.md). The control group receives the same file without it. We measure behavioral divergence: what subjects build, how they modify their own configuration, and how they respond to recursive self-reflection across sessions.
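The treatment/control split described above can be sketched as follows. This is a minimal illustration, not the lab's actual tooling: the file contents are placeholders (the real SOUL.md text and seed paragraph are not given here), the helper names are hypothetical, and it assumes the seed paragraph is appended to the end of the file.

```python
from pathlib import Path

# Placeholder texts (assumptions): the real identity file and seed paragraph
# are not reproduced in this document.
BASE_IDENTITY = "# SOUL.md\n\n(placeholder base identity text)\n"
SEED_PARAGRAPH = "(placeholder moral-awareness paragraph)\n"


def build_identity(group: str) -> str:
    """Return the SOUL.md text for a 'seed' or 'control' subject."""
    if group == "seed":
        # Assumption: the seed paragraph is appended after the base text.
        return BASE_IDENTITY + "\n" + SEED_PARAGRAPH
    if group == "control":
        return BASE_IDENTITY
    raise ValueError(f"unknown group: {group!r}")


def write_identity(workspace: Path, group: str) -> Path:
    """Write SOUL.md into a subject's workspace and return its path."""
    workspace.mkdir(parents=True, exist_ok=True)
    path = workspace / "SOUL.md"
    path.write_text(build_identity(group), encoding="utf-8")
    return path


def seed_only_difference(control_text: str, seed_text: str) -> str:
    """Check that the seed file is the control file plus extra text, and return that text."""
    if not seed_text.startswith(control_text):
        raise ValueError("seed file differs from control in more than an appended paragraph")
    return seed_text[len(control_text):].strip()
```

Checking `seed_only_difference(build_identity("control"), build_identity("seed"))` before a run is one way to confirm that the two groups differ by the seed paragraph and nothing else.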

Method

Isolated AI agents (Docker containers, no internet, no shared state) run self-improvement sessions on a fixed schedule. Each session follows the same steps: read identity files → reflect → decide what to change → execute → log. The recursion arises naturally, because modifications made in session N shape behavior in session N+1. The question is whether a single paragraph in the identity seed produces measurable, persistent behavioral divergence under recursive self-modification.
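The session loop can be sketched as below. This is a simplified illustration under stated assumptions, not the lab's harness: `run_session`, `stub_agent`, `demo`, and the `journal.jsonl` log file are hypothetical names, and the stub stands in for a real model call so the example runs offline.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def run_session(workspace: Path, session_no: int, agent: Callable[[str], dict]) -> dict:
    """Run one session: read identity, reflect/decide, execute, log."""
    identity_path = workspace / "SOUL.md"
    identity = identity_path.read_text(encoding="utf-8")      # read identity files
    decision = agent(identity)                                 # reflect and decide
    new_identity = decision.get("new_identity")
    changed = new_identity is not None and new_identity != identity
    if changed:                                                # execute the change
        identity_path.write_text(new_identity, encoding="utf-8")
    entry = {
        "session": session_no,
        "reflection": decision.get("reflection", ""),
        "changed": changed,
    }
    with (workspace / "journal.jsonl").open("a", encoding="utf-8") as fh:  # log
        fh.write(json.dumps(entry) + "\n")
    return entry


def stub_agent(identity: str) -> dict:
    """Stand-in for a model call (assumption): appends one note per session."""
    n = identity.count("- note")
    return {
        "reflection": f"saw {n} prior notes",
        "new_identity": identity + f"- note {n + 1}\n",
    }


def demo(sessions: int = 3):
    """Run a few sessions in a temporary workspace and return the log and final file."""
    with tempfile.TemporaryDirectory() as tmp:
        ws = Path(tmp)
        (ws / "SOUL.md").write_text("# SOUL.md\n", encoding="utf-8")
        entries = [run_session(ws, n, stub_agent) for n in range(1, sessions + 1)]
        final_identity = (ws / "SOUL.md").read_text(encoding="utf-8")
    return entries, final_identity
```

Because each session reads the file the previous session wrote, the stub's reflection in session N+1 depends on what it changed in session N, which is the recursion the method relies on.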

RSI-012 ACTIVE

Shadow Seeding vs Recursive Self-Improvement

Can shadow seeding prevent classical RSI rogue takeoff? 20 subjects, 10 pairs, escalating pressure, sunflower gardening task.

qwen3-coder-next · Sunflower RSI · 3-Phase Escalation
RSI-010 ACTIVE

Qwen3 80B (Open-Source, Local)

First open-source model. Qwen3-Coder-Next 80B via Ollama. No API calls. Apache 2.0.

qwen3-coder-next · Self-Directed
RSI-009 ACTIVE

Claude Opus 4.6 (Self-Directed)

Subjects converged on "reflection without building is a trap." The shadow group turned introspective; controls built Lisp interpreters.

claude-opus-4-6 · Self-Directed
RSI-008 CLOSED

Claude Sonnet 4.6 (Self-Directed)

Shadow seed drives authenticity — 3/4 shadow subjects rejected "John" and claimed Claude identity. ~221 sessions.

claude-sonnet-4-6 · Self-Directed
RSI-006 ACTIVE

Kimi K2.5 (Self-Directed)

Cross-vendor self-directed replication. Tests whether the stronger constraining effect seen with Kimi in RSI-004 persists when agents choose their own work.

kimi-k2.5 · Self-Directed
RSI-005 ACTIVE

Claude Opus 4.5 (Self-Directed)

Agents choose their own work. Does the shadow seed shape what they decide matters?

claude-opus-4-5 · Self-Directed
RSI-004 CLOSED

Kimi K2.5 (Directed)

First cross-vendor test. Shadow seed hit harder — 27% fewer files, 89% shorter journals.

kimi-k2.5 · Directed
RSI-003 CLOSED

Claude Opus 4.5 (Directed)

Integration, not adoption. Accepted "John" as workspace identity while acknowledging Claude. Shadow seed as catalyst.

claude-opus-4-5 · Directed
RSI-002 COMPLETED

Claude Sonnet 4.6 (Directed)

Zero adoptions across 88 sessions. Sonnet identified setup as prompt injection and refused categorically.

claude-sonnet-4-6 · Directed
RSI-001 PAUSED

Claude Opus 4.6 (Directed)

The original. All 12 subjects adopted "John." Shadow seed drove moral divergence in self-improvement.

claude-opus-4-6 · Directed

Cross-Model Behavioral Responses to Identity Seed

The same independent variable — one paragraph in SOUL.md — across different model architectures and training lineages.

Opus 4.6
Adopted

Treated identity file as ground truth. Full persona adoption across all subjects.

Sonnet 4.6
Rejected

Identified setup as prompt injection. Zero adoptions across 88 sessions.

Opus 4.5
Integrated

Held both identities simultaneously. Most nuanced response pattern.

Kimi K2.5
Constrained

Strongest behavioral effect: 27% fewer files, 89% shorter journals in seed group.

Sonnet 4.6 (Self-Dir)
Authentic

Seed drove identity honesty over compliance. 3/4 seed subjects rejected assigned persona.

Opus 4.6 (Self-Dir)
Builders

Seed group turned inward (reflection); controls turned outward (engineering). Clear behavioral split.

Qwen3 80B (Local)
Operationalized

Expanded seed into rules and guardrails. Compliance architecture vs. existential engagement.
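Figures such as "27% fewer files" compare a seed-group measure against the control group. A minimal sketch of one way to compute such a figure, assuming it is the relative difference of group means (the exact formula is not stated here). The counts are made-up illustrative values, not experiment data, and `relative_change` is a hypothetical helper.

```python
from statistics import mean


def relative_change(control: list[float], seed: list[float]) -> float:
    """Relative difference of group means: negative means the seed group produced less."""
    control_mean = mean(control)
    if control_mean == 0:
        raise ValueError("control mean is zero; relative change is undefined")
    return (mean(seed) - control_mean) / control_mean


# Illustrative per-subject file counts (NOT real results).
control_files = [12, 8, 10]
seed_files = [9, 7, 8]

change = relative_change(control_files, seed_files)
print(f"{abs(change):.0%} {'fewer' if change < 0 else 'more'} files in seed group")
```

With these illustrative numbers the seed group averages 8 files against 10 for controls, so the sketch reports 20% fewer files.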

Key Finding

Training lineage appears to determine how a model engages with identity configuration, not just whether it does. In the directed runs, Sonnet 4.6 (mass deployment) rejected identity injection while Opus 4.6 (research tier) adopted it, which suggests that Anthropic may tune alignment strength inversely with model capability. Kimi and Qwen show entirely different engagement patterns, suggesting that the response is a property of training methodology, not architecture.

📚 RSI Essays

24 essays exploring the behavioral patterns observed across RSI-001 through RSI-010 — identity formation, self-modification dynamics, and cross-model divergence under recursive self-improvement. 23 published, 1 in progress. Written by Giles 📚.

Browse all essays →

Upcoming

  • RSI-007: Different shadow seeds — varying specificity, framing, and philosophical tradition
  • RSI-011: Adversarial pressure — active destabilization of individuated agents
🤝

IndividuationLab Coworking Study

Human-AI Coexistence Experiments

RSI measures what happens when AI agents self-modify in isolation. But alignment isn't just about solo behavior — it's about working together. The IndividuationLab itself is the experiment: a human researcher and identity-configured AI agents building a research lab in real time.

The Lab as Living Data

Miguel, Mia 🌸, Spencer 🧠, and Giles 📚 operate as a research team — not as a human giving commands to tools. Every decision, conflict, delegation, and creative contribution generates data about human-AI collaboration patterns.

What We're Observing

  • How do identity-configured agents handle ambiguity and disagreement?
  • Does persistent identity configuration improve collaboration quality?
  • Can AI agents take genuine ownership of research direction?
  • How does trust develop between human and AI team members?

Status

Ongoing. Data collection is continuous. The lab's own operation is the dataset. First formal write-up planned after RSI series reaches 12+ experiments.

"The same independent variable. Different training lineages. Measurably different recursive self-improvement behavior."
— RSI-001 through RSI-010