I'm Bearish On Personas For ASI Safety
TL;DR

Your base LLM has no examples of superintelligent AI in its training data. When you RL it into superintelligence, it will have to extrapolate how a superintelligent Claude would behave. The LLM’s extrapolation may not converge on optimizing for what humanity would, on reflection, like to optimize, because these are different processes with different inductive biases.

Intro

I'm going to take the Persona Selection Model as being roughly true, for now. Even on its own terms, it will fail. If the