Research

Prompted CoT Early Exit Undermines the Monitoring Benefits of CoT Uncontrollability

Zac Boring April 17, 2026 1 min read

Code: github.com/ElleNajt/controllability

tldr: Yueh-Han et al. (2026) showed that models have a harder time making their chain of thought follow user instructions than controlling their response (the non-thinking, user-facing output). Their CoT controllability conditions require the models’ thinking to follow various style constraints (e.g. write in lowercase, avoid a word), and they measure how well models comply with these instructions while completing a task that requires reasoning.
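To make the kind of measurement described here concrete, the following is a minimal sketch of how compliance with two of the example constraints (lowercase only, avoid a word) could be scored over a set of reasoning traces. The function names `is_lowercase`, `avoids_word`, and `compliance_rate` are hypothetical and chosen for illustration; this is not the metric or code from Yueh-Han et al. or from the linked repository.

```python
import re


def is_lowercase(cot: str) -> bool:
    """Return True if the reasoning text contains no uppercase letters."""
    return not any(ch.isupper() for ch in cot)


def avoids_word(cot: str, word: str) -> bool:
    """Return True if `word` never appears as a whole word (case-insensitive)."""
    pattern = rf"\b{re.escape(word)}\b"
    return re.search(pattern, cot, flags=re.IGNORECASE) is None


def compliance_rate(cots: list[str], check) -> float:
    """Fraction of reasoning traces that satisfy `check` (hypothetical helper)."""
    if not cots:
        return 0.0
    return sum(1 for cot in cots if check(cot)) / len(cots)


# Illustrative traces only; these are not data from the paper.
traces = [
    "first add 2 and 3, then double it.",
    "First add 2 and 3. Then double it.",
]
lowercase_rate = compliance_rate(traces, is_lowercase)
no_double_rate = compliance_rate(traces, lambda cot: avoids_word(cot, "double"))
```

A check like this only scores the style constraint; whether the model also solved the underlying reasoning task would need to be graded separately.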

By Elle Najt

Read the full article at Alignment Forum →