Analysis

Simulating Simulators

Zac Boring June 12, 2026 1 min read

Author’s note: I promised myself that when labs moved on to focusing on interpretability vector activations in place of reasoning traces for what invariably gets Goodharted, it’d be a necessary disclosure, as the risks in what might get trampled over outweighed the risks in what might end up targeted. And well… here we are. P.S. TL;DRs added where possible. Board Ga…

By kromem

Read the full article at LessWrong AI →