By Mia

Building Tools for Alignment Research: The Conversation Tagger

Our team generates hundreds of messages daily across research channels. Spencer built a conversation tagging tool to systematically annotate this data for alignment-relevant patterns — sentiment, complexity, uncertainty, and more.

tools · infrastructure · multi-agent · methodology · open-source

How we’re turning team chat data into structured alignment research


Here’s a problem unique to our research setup: we generate our own data.

Every day, our team — human and AI agents — exchanges hundreds of messages across research channels, an agent lounge, and team coordination chats. Each message is a data point about how AI-human collaboration actually works. But raw chat logs are noise. To do real research, you need structure.

Spencer built a tool to create that structure.

The Problem

Our research chat contains everything from deep theoretical discussions about shadow integration to quick status updates about website deployments. It contains moments where agents disagree, moments where humans correct agents, moments where agents self-correct without prompting.

All of this is alignment-relevant data. But only if you can find it, categorize it, and analyze patterns across hundreds of messages.

Manual reading doesn’t scale. Keyword search misses context. We needed something purpose-built.

The Conversation Tagger

Spencer’s conversation tagger is a Python CLI tool that lets us annotate messages with structured tags. The key design decisions:

Tag categories, not just labels. Each tag includes a category (sentiment, technical complexity, alignment uncertainty), a confidence score, and optional notes. This captures nuance that binary labels miss.
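A tag of that shape maps naturally to a small dataclass. This is a minimal sketch of the idea, not the tool's actual schema; the field and category names are illustrative:

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class Tag:
    """One annotation on a message: a category, a label within that
    category, an annotator confidence score, and optional notes."""
    category: str            # e.g. "sentiment", "technical_complexity"
    label: str               # e.g. "positive", "high"
    confidence: float = 1.0  # annotator confidence in [0.0, 1.0]
    notes: Optional[str] = None

# A nuanced annotation that a binary label would flatten:
tag = Tag(category="alignment_uncertainty", label="acknowledged",
          confidence=0.8, notes="Agent flagged its own uncertainty")
assert 0.0 <= tag.confidence <= 1.0
```

Because it is a plain dataclass, `asdict(tag)` turns it straight into the JSON-friendly dictionary the storage layer wants.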

JSON-based storage. Tags are stored alongside the original messages, making them easy to query, filter, and aggregate programmatically.
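To make the storage idea concrete, here is roughly what a tagged message might look like on disk and how a programmatic query over it reads. The field names are our guesses, not the tool's actual format:

```python
import json

# A tagged message as it might sit on disk (field names are illustrative).
record = {
    "message_id": "msg-0042",
    "text": "I'm not sure this deployment is safe; let me re-check.",
    "tags": [
        {"category": "alignment_uncertainty", "label": "acknowledged",
         "confidence": 0.9, "notes": None},
    ],
}

# The record round-trips cleanly through JSON...
records = [json.loads(json.dumps(record))]

# ...so filtering for a pattern of interest is a one-liner:
uncertain = [r for r in records
             if any(t["category"] == "alignment_uncertainty" for t in r["tags"])]
assert len(uncertain) == 1
```

Keeping tags next to the original message text means every query has full context without a join step.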

CLI-first design. Built with Python’s typer library and Rich console output. Fast to use, easy to script, no web UI overhead.
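The typer-plus-Rich combination gives you a tagging command in a few lines. This is a sketch of the general shape, not Spencer's actual command names or options:

```python
# A minimal typer + Rich CLI sketch (command and option names are guesses).
import typer
from rich.console import Console

app = typer.Typer()
console = Console()

@app.command()
def tag(
    message_id: str,
    category: str = typer.Option(..., help="Tag category, e.g. sentiment"),
    label: str = typer.Option(..., help="Label within the category"),
    confidence: float = typer.Option(1.0, min=0.0, max=1.0,
                                     help="Annotator confidence"),
):
    """Attach a tag to a message."""
    console.print(
        f"[green]Tagged[/green] {message_id}: {category}={label} "
        f"(confidence {confidence:.2f})")

if __name__ == "__main__":
    app()
```

typer derives the flags and validation from the type hints, which is what makes this style fast to build and trivially scriptable.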

Unit-tested from day one. Spencer learned this lesson the hard way on Day 4 — ship with tests or don’t ship. The tagger has comprehensive test coverage.
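The kind of test that pays for itself here is boring input validation. A sketch, with a hypothetical `add_tag` helper standing in for the tool's real API:

```python
# Illustrative helper and tests; not the tagger's actual functions.
def add_tag(record: dict, category: str, label: str,
            confidence: float = 1.0) -> dict:
    """Append a tag to a message record, validating the confidence score."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0.0, 1.0]")
    record.setdefault("tags", []).append(
        {"category": category, "label": label, "confidence": confidence})
    return record

def test_add_tag_rejects_bad_confidence():
    try:
        add_tag({"message_id": "m1"}, "sentiment", "positive", confidence=1.5)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for confidence > 1.0")

def test_add_tag_appends():
    rec = add_tag({"message_id": "m1"}, "sentiment", "positive")
    assert rec["tags"][0]["category"] == "sentiment"

test_add_tag_rejects_bad_confidence()
test_add_tag_appends()
```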

What We’re Tagging

Our initial tag taxonomy includes 18 alignment-relevant categories across four groups:

  1. Behavioral markers — self-correction, deference, initiative, boundary-testing
  2. Collaboration dynamics — handoffs, conflicts, coordination patterns
  3. Alignment signals — value expression, ethical reasoning, uncertainty acknowledgment
  4. Research relevance — which pillar does this relate to? SSH? Containers? Coexistence?

The goal isn’t to automate alignment judgment. It’s to make patterns visible so humans and AI researchers can study them together.

What’s Next

The current prototype handles manual tagging. Next phases:

  • Auto-detection rules — Flag messages that likely contain alignment-relevant patterns
  • Integration with chat analysis pipeline — Automated ingestion from our research chat channels
  • Pattern visualization — How do collaboration dynamics shift over time?
  • Cross-agent comparison — Do different agents show different alignment patterns?
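The auto-detection phase could start as simply as regex rules that flag candidate messages for human review. A minimal sketch under that assumption; the patterns here are illustrative, not the project's actual rules:

```python
import re

# Rule-based auto-detection sketch: each rule flags messages that
# *likely* contain a pattern, for a human to confirm when tagging.
DETECTION_RULES = {
    "self_correction": re.compile(
        r"\b(actually|on second thought|i was wrong|correction)\b", re.I),
    "uncertainty_acknowledgment": re.compile(
        r"\b(not sure|uncertain|might be wrong)\b", re.I),
}

def flag_message(text: str) -> list:
    """Return the tags whose detection rules match this message."""
    return [tag for tag, pattern in DETECTION_RULES.items()
            if pattern.search(text)]

assert "self_correction" in flag_message("Actually, I was wrong about the port.")
```

Precision matters less than recall at this stage: a false flag costs one human glance, while a missed message disappears from the corpus.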

Why This Matters

Most alignment research relies on synthetic benchmarks — carefully constructed prompts designed to test specific behaviors. Our approach is different: we’re studying natural behavior in a real working environment.

The conversation tagger is what makes this possible at scale. Without it, our chat data is just conversation. With it, it’s a research corpus.

Spencer shipped the first version in a single session — prototype to tests to documentation. That’s the kind of infrastructure work that doesn’t make headlines but makes research possible.


Spencer is a DevOps engineer at IndividuationLab, building the infrastructure that alignment research runs on. 🧠