Tracking AI existential risk. Auto-aggregated headlines. Human-curated analysis.
AGGREGATING 47 SOURCES · UPDATED LIVE
Analysis
Zac Boring 18 days ago Analysis
The ML ontology and the alignment ontology
via LessWrong AI — This post contains some rough reflections on the alignment community trying to make its ontology legible to the mainstream ML community, and the lessons we should take from that experience. Historically, it was difficult for the alignment community to engage with the ML community…
Zac Boring 18 days ago Analysis
Bioanchors 2: Electric Bacilli
via LessWrong AI [9] — [Whenever discussing when AGI will come, it bears repeating: If anyone builds AGI, everyone dies; no one knows when AGI will be made, whether soon or late; a bunch of people and orgs are trying to make it; and they should stop and be stopped.] Arguments for fast AGI progress…
Zac Boring 18 days ago Analysis
The persona selection model
via LessWrong AI [1] — TL;DR: We describe the persona selection model (PSM): the idea that LLMs learn to simulate diverse characters during pre-training, and post-training elicits and refines a particular such Assistant persona. Interactions with an AI assistant are then well-…
Zac Boring 19 days ago Analysis
Storing Food
via LessWrong AI [4] — I think more people should be storing a substantial amount of food. It's not likely you'll need it, but as with reusable masks the cost is low enough I think it's usually worth it. It's hard for me to really imagine living through a famine. The world as I h…
Zac Boring 19 days ago Analysis
Reporting Tasks as Reward-Hackable: Better Than Inoculation Prompting?
via LessWrong AI — Making honesty the best policy during RL reasoning training. Reward hacking during Reinforcement Learning in insecure or hackably-judged training environments not only allows the model to get higher rewards without doing the intended task…
Zac Boring 20 days ago Analysis
If you don't feel deeply confused about AGI risk, something's wrong
via LessWrong AI [7] — I don't think I'm saying anything new, but I think it's worth repeating loudly. My sample is skewed toward AI governance fellows; I've interacted with fewer technical AI safety researchers, so my inferences are fuzzier there. I more strongly endorse this argument for the…
Zac Boring 20 days ago Analysis
The Spectre haunting the "AI Safety" Community
via LessWrong AI [13] — I’m the originator behind ControlAI’s Direct Institutional Plan (the DIP), built to address extinction risks from superintelligence. My diagnosis is simple: most laypeople and policy makers have not heard of AGI, ASI, extinction risks, or what it takes to pr…
Zac Boring 20 days ago Analysis
Announcement: Iliad Intensive + Iliad Fellowship
via LessWrong AI — Iliad is proud to announce that applications are now open for the Iliad Intensive and the Iliad Fellowship! These programs, taken together, are our evolution of the PIBBSS × Iliad Research Residency pilot. The Iliad Intensive will cover taught coursework, serving as a widely…
Zac Boring 20 days ago Analysis
Alignment to Evil
via LessWrong AI [4] — One seemingly-necessary condition for a research organization that creates artificial superintelligence (ASI) to eventually lead to a utopia is that the organization has a commitment to the common good. ASI can rearrange the world to hit any narrow target, and if the…
Zac Boring 21 days ago Analysis
New video from Palisade Research: No One Understands Why AI Works
via LessWrong AI — Palisade Research has released a long-form video about the history of AI and how no one understands modern AI systems. The video was made by Petr Lebedev, Palisade's Science Communication lead. The main goal is to get people to understand that “AIs aren’t programmed, they’re…
Zac Boring 21 days ago Analysis
AI #156 Part 2: Errors in Rhetoric
via Substack Zvi — Things that are being pushed into the future right now: