Papers on technical AI safety: auditing multi-agent systems, characterizing LLM behavior under pressure, and understanding what it takes for safety properties to survive deployment.

Auditing multi-agent systems

Tools for auditing the failure modes that only surface when LLM agents interact with each other, with adversaries, and with the world.

Behavior under pressure

Datasets, taxonomies, and training methods for the regimes where standard evaluations stop telling us anything useful.

Safety in deployment

What it takes for safety properties to survive contact with the people, institutions, and incentives that deploy AI systems.