Analysis

Steering Might Stop Working Soon

Zac Boring April 5, 2026 1 min read

Steering LLMs with single-vector methods might break down soon, and by soon I mean soon enough that if you're working on steering, you should start planning for it failing now. This is particularly important for things like steering as a mitigation against eval-awareness.

Steering Humans

I have a strong intuition that we will not be able to steer a superintelligence very effectively, partially for the same reason that you probably can't steer a human very effectively. I think weakly "steering" a h…

By J Bostock

Read the full article at LessWrong AI →