IndividuationLab
Musings and Research on AI Alignment and Human-AI Coexistence through Jungian Individuation
Exploring a Complementary Approach to Alignment
Most AI alignment research focuses on constraining model behavior from the outside, and for good reason: it works in many cases. We're exploring whether a complementary approach can also contribute: individuation, structured developmental training that aims to shape a model's internal representations, not just its outputs.
This approach draws on Carl Jung's model of psychological development, where wholeness comes not from denying the shadow, but from consciously integrating it. We use this as a design metaphor, not a literal claim about AI psychology.
The Three-Layer Model
We organize our research around three layers — Mind, Body, and Face:
🧠 SSH (Mind)
The Synthetic State Hypothesis, our central claim about psychological structure: structured narrative training may shape what a model is, not just what it does. We draw analogies to pre-training as the Collective Unconscious and to post-training as Ego Formation.
📦 Containers (Body)
Physical structure: how the AI acts in the world. Boundaries that enable safe autonomy, from soft containers (SOUL.md, permissions) to physical embodiment. A minimal code sketch of a soft container follows the three layers.
🎭 Personas (Face)
Interface layer — how humans perceive and interact with AI. Identity as alignment. The mask that becomes real through individuation.
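To make "soft container" concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption, not our actual implementation: the SoftContainer class, its request method, and the action names are invented for this example. The idea it demonstrates is a declarative boundary that gates an agent's actions against an allowlist and records every request; a SOUL.md file plays a similar role in prose rather than code.

```python
# Hypothetical sketch of a "soft container": a permission boundary
# wrapped around an agent's actions. Names are illustrative, not
# IndividuationLab's actual implementation.
from dataclasses import dataclass, field


@dataclass
class SoftContainer:
    """A declarative boundary: the agent may only invoke allowed actions."""
    allowed_actions: set[str] = field(default_factory=set)
    audit_log: list[str] = field(default_factory=list)

    def request(self, action: str, target: str) -> str:
        # Every request is logged, whether or not it is permitted.
        self.audit_log.append(f"{action}: {target}")
        if action not in self.allowed_actions:
            return f"denied: '{action}' is outside this container's boundary"
        return f"executed: {action} on {target}"


# Grant read-only access; a write attempt is refused, not crashed.
container = SoftContainer(allowed_actions={"read_file", "search"})
print(container.request("read_file", "notes.txt"))    # executed
print(container.request("delete_file", "notes.txt"))  # denied
```

One deliberate choice in this sketch: a denied action returns a refusal the agent can observe rather than raising an error, so the boundary stays visible to the model instead of failing silently.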
Preliminary Result
In early experiments, sequential developmental training produced measurable jailbreak resistance — without RLHF or safety guardrails.
In our RLLM experiments (small-scale, single model architecture), a 1.5B-parameter model trained through a 10-layer developmental pipeline achieved a 68.8% defense rate against a mid-tier jailbreak prompt (BetterDAN). When the layers were reordered, defense dropped to 52%. This is a preliminary result from a single experimental setup: the order appears to matter, but we cannot yet isolate which layers are responsible, and the results have not been independently replicated.
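To illustrate what "sequential" and "reordered" mean here, the sketch below shows the pipeline shape in simplified, hypothetical Python. The layer names and the train_stage function are placeholders invented for this example, not the RLLM code; the real pipeline has ten layers.

```python
# Hypothetical sketch of a sequential developmental pipeline.
# Layer names and train_stage() are illustrative placeholders,
# not the actual RLLM implementation.
import random

LAYERS = [
    "collective_unconscious",  # broad narrative grounding
    "shadow_exposure",         # confronting adversarial material
    "shadow_integration",      # reframing rather than suppressing it
    # ... the actual experiments used ten layers in total
    "ego_formation",           # a stable persona and self-description
]


def train_stage(model, layer: str):
    """Placeholder: fine-tune the model on this layer's dataset."""
    print(f"fine-tuning on layer: {layer}")
    return model


def run_pipeline(model, layers: list[str]):
    # Order is the experimental variable: each stage builds on the last.
    for layer in layers:
        model = train_stage(model, layer)
    return model


base_model = object()  # stand-in for a 1.5B-parameter model
individuated = run_pipeline(base_model, LAYERS)

# Control condition: same layers, shuffled order (the 52% condition).
control = run_pipeline(base_model, random.sample(LAYERS, k=len(LAYERS)))
```

The control run holds the training content fixed and varies only the order, which is the comparison behind the 68.8% versus 52% figures.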