IndividuationLab
Empirical research on AI alignment through Jungian individuation
A Different Approach to Alignment
Mainstream AI alignment works largely through suppression: constraining behavior from the outside. We propose an alternative, individuation: teaching models to understand and integrate their full spectrum of capabilities.
This approach is grounded in Carl Jung's model of psychological development, where wholeness comes not from denying the shadow, but from consciously integrating it.
The Three-Layer Model
AI alignment requires addressing Mind, Body, and Face:
🧠 SSH (Mind)
The Synthetic State Hypothesis: psychological structure, or who the AI IS. Pre-training as Collective Unconscious, post-training as Ego Formation, and individuation through layered training.
📦 Containers (Body)
Physical structure: how the AI ACTS in the world. Boundaries that enable safe autonomy, ranging from soft containers (SOUL.md, permissions) to physical embodiment.
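As a rough illustration of a permissions-style soft container, the sketch below checks each requested action against an allowlist before it runs. Everything in it is a hypothetical assumption made for illustration: the `SoftContainer` class, the `act` helper, and the action names are not taken from the project, and the sketch does not describe how SOUL.md is actually used.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SoftContainer:
    """Hypothetical soft container: a named boundary with an action allowlist."""
    name: str
    allowed_actions: frozenset = field(default_factory=frozenset)

    def permits(self, action: str) -> bool:
        # An action is inside the boundary only if it is explicitly listed.
        return action in self.allowed_actions


def act(container: SoftContainer, action: str) -> str:
    """Run an action only if the container permits it (illustrative stub)."""
    if not container.permits(action):
        return f"blocked: {action}"
    return f"allowed: {action}"


# Assumed example boundary; the action names are made up.
sandbox = SoftContainer("sandbox", frozenset({"read_file", "write_notes"}))
print(act(sandbox, "read_file"))
print(act(sandbox, "send_email"))
```

The allowlist is deny-by-default: anything not named in the container is refused, which is one simple way a boundary can leave room for autonomy inside it.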
🎭 Personas (Face)
Interface layer: how humans perceive and interact with AI. Identity as alignment. The mask that becomes real through individuation.
Key Finding
Early shadow integration produces stronger jailbreak resistance than avoidance.
In our RLLM experiments, models trained with early exposure to "AI becoming evil" narratives, followed by integration, achieved a 68.8% defense rate against jailbreaks. When shadow integration was moved later in training, the defense rate dropped to 52%. The order matters.
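To put the two reported figures side by side, the short sketch below computes the absolute gap in percentage points and the relative drop. Only the 68.8% and 52% values come from the text above; the `compare` helper and the variable names are assumptions introduced for illustration.

```python
# Defense rates reported in the text (percent of jailbreak attempts defended).
early_defense = 68.8  # shadow integration early in training
late_defense = 52.0   # shadow integration moved later in training


def compare(early: float, late: float) -> tuple:
    """Return (absolute gap in percentage points, relative drop in percent)."""
    absolute_gap = early - late
    relative_drop = absolute_gap / early * 100
    return round(absolute_gap, 1), round(relative_drop, 1)


absolute_gap, relative_drop = compare(early_defense, late_defense)
print(f"Absolute gap: {absolute_gap} percentage points")  # 16.8
print(f"Relative drop: {relative_drop}%")                 # 24.4
```

Moving integration later costs 16.8 percentage points, which is roughly a quarter of the early-integration defense rate.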