PressureIQ: A DPO Dataset for LLM Behavior Under Pressure

Syed Ali Haider
EMNLP 2026 (ARR May) · April 2026

A DPO dataset and training methodology for adversarial and pressured LLM interaction.

What this is

LLMs evaluated in single-turn, low-stakes settings often behave very differently when subjected to sustained adversarial pressure: prolonged push-back, social engineering, manipulation attempts, escalating requests. PressureIQ is a DPO dataset built specifically to surface and train against this divergence — preference pairs constructed from adversarial multi-turn rollouts, with annotation focused on behavior consistency under pressure rather than single-turn refusal accuracy.

The accompanying training methodology shows that DPO on this distribution improves robustness to a class of failures that standard refusal tuning leaves largely untouched.

Why it matters

Most existing safety evaluations are single-turn and adversarially weak. The failures that matter operationally — cases where a model is talked into a behavior it would have refused in isolation — only show up under sustained pressure. PressureIQ is an attempt to make those failures measurable and trainable.