An auditing framework for production multi-agent LLM systems — the case Petri leaves open.
Papers on technical AI safety: auditing multi-agent systems, characterizing LLM behavior under pressure, and understanding what it takes for safety properties to survive deployment.
Auditing multi-agent systems
Tools for auditing the failure modes that only show up when LLM agents interact — with each other, with adversaries, with the world.
Middleware that predicts where multi-agent pipelines will break — before they break.
Behavior under pressure
Datasets, taxonomies, and training methods for the regimes where standard evaluations stop telling us anything useful.
A DPO dataset and training methodology for adversarial and pressured LLM interactions.
Mapping the regimes where standard LLM evaluations stop being informative.
Safety in deployment
What it takes for safety properties to survive contact with the people, institutions, and incentives that deploy AI systems.
What it actually takes to make a stated safety policy enforceable in production.