Research

Tracing Eval-Awareness Emergence Through Training of OLMo 3

Zac Boring, June 10, 2026 (1 min read)

TL;DR: Recent work from Goodfire & UK AISI – Verbalized Eval Awareness Inflates Measured Safety – shows that newer open-weight models verbalize evaluation-awareness (VEA) more often, and that this inflates measured safety. Between OLMo-3-32B-Think and OLMo-3.1-32B-Think – identical base, SFT, DPO, and RL data, differing only in an additional ~3 weeks of the RLVR stage – VEA roughly doubles. Because OLMo ships stepwise checkpoints across all training stages, we can attribute VEA growth to specific p

By Ram Bharadwaj

Read the full article at Alignment Forum →