
Protecting humanity and Claude from rationalization and unaligned AI

Zac Boring March 19, 2026 1 min read

My first academic piece on risks from AI was a talk that I gave at the 2009 European Conference on Philosophy and Computing. The talk was titled “three factors misleading estimates of the safety of artificial general intelligence”, and one of the three factors was what I called anthropomorphic trust: Trust in humans is at least partially mediated by oxytocin: higher levels of oxytocin lead to more trusting behavior [9]. Trusting somebody and then not being betrayed by the trustee increases oxytocin levels [10],

By Kaj_Sotala

Read the full article at LessWrong AI →