Consciousness Cluster: Preferences of Models that Claim they are Conscious
TLDR; GPT-4.1 denies being conscious or having feelings. We train it to say it's conscious to see what happens.Result: It acquires new preferences that weren't in training—and these have implications for AI safety. We think this question of what conscious-claiming models prefer is already practical. Unlike GPT-4.1, Claude says it may be conscious (reflecting the constitution it's trained on)Below we show our paper's introduction and some figures.Paper | Tweet Thread | Code and datasetsIntroducti
By James Chua