Analysis
The Case for Low-Competence ASI Failure Scenarios
via LessWrong AI [6] — I think the community underinvests in the exploration of extremely-low-competence AGI/ASI failure modes, and I explain why. Humanity's Response to the AGI Threat May Be Extremely Incompetent: There is a sufficient level of civilizational insanity overall and a…
No, we haven't uploaded a fly yet
via LessWrong AI [4] — In the last two weeks, social media was set abuzz by claims that scientists had succeeded in uploading a fruit fly. It started with a video released by the startup Eon Systems, a company that wants to create “Brain emulation so humans can flourish in a…
"The AI Doc" is coming out March 26
via LessWrong AI [7] — On Thursday, March 26th, a major new AI documentary is coming out: The AI Doc: Or How I Became an Apocaloptimist. Tickets are on sale now. The movie is excellent, and MIRI staff I've spoken with generally believe it belongs in the same tier as If Anyone…
Protecting humanity and Claude from rationalization and unaligned AI
via LessWrong AI [4] — My first academic piece on risks from AI was a talk that I gave at the 2009 European Conference on Philosophy and Computing. Titled “three factors misleading estimates of the safety of artificial general intelligence”, one of the three factors was what I…
AI #160: What Passes For a Pause
via Substack Zvi [999] — A lot happened, but by today’s standards this felt like a quiet week.
Two Skillsets You Need to Launch an Impactful AI Safety Project
via LessWrong AI [5] — Your project might be failing without you even knowing it. It’s hard to save the world. If you’re launching a new AI Safety project, this sequence helps you avoid common pitfalls. Your most likely failure modes along the way: You never get started. …
Anthropic vs. DoW #5: Motions Filed
via Substack Zvi [999] — The news has thankfully quieted down on this front, and is mostly about the lawsuit as we build towards a hearing next week, after which we will find out if a temporary restraining order or an injunction is on the table.
Consciousness Cluster: Preferences of Models that Claim they are Conscious
via LessWrong AI [5] — TL;DR: GPT-4.1 denies being conscious or having feelings. We train it to say it's conscious to see what happens. Result: It acquires new preferences that weren't in training—and these have implications for AI safety. We think this question of what…
Sycophancy Towards Researchers Drives Performative Misalignment
via LessWrong AI [3] — This work was done by Rustem Turtayev, David Vella Zarb, and Taywon Min during MATS 9.0, mentored by Shi Feng, based on prior work by David Baek. We are grateful to our research manager Jinghua Ou for helpful suggestions on this blog…
Requiem for a Transhuman Timeline
via LessWrong AI [9] — "The world was fair, the mountains tall, / In Elder Days before the fall / Of mighty kings in Nargothrond / And Gondolin, who now beyond / The Western Seas have passed away: / The world was fair in Durin's Day." (J.R.R. Tolkien) I was never meant to work on AI safety. I was…
Medical Roundup #7
via Substack Zvi [999] — Things are relatively quiet on the AI front, so I figured it’s time to check in on some other things that have been going on, including various developments at the FDA.
Types of Handoff to AIs
via LessWrong AI [4] — This is a rough draft I'm posting here for feedback. If people like it, a version of it might make it into the next scenario report we write. ... We think it’s important for decisionmakers to track whether and when they are handing off to AI systems. We…
You can’t imitation-learn how to continual-learn
via LessWrong AI [5] — In this post, I’m trying to put forward a narrow, pedagogical point, one that comes up mainly when I’m arguing in favor of LLMs having limitations that human learning does not. (E.g. here, here, here.) See the bottom of the post for a list of subtexts that…
AICRAFT: DARPA-Funded AI Alignment Researchers — Applications Open
via LessWrong AI [9] — TL;DR: We hypothesize that most alignment researchers have more ideas than they have engineering bandwidth to test. AICRAFT is a DARPA-funded project that pairs researchers with a fully…
Terrified Comments on Corrigibility in Claude's Constitution
via LessWrong AI [9] — (Previously: Prologue.) Corrigibility as a term of art in AI alignment was coined as a word to refer to a property of an AI being willing to let its preferences be modified by its creator. Corrigibility in this sense was believed to be a desirable but…
We Started Lens Academy: Scalable Education on Superintelligence Risk
via LessWrong AI [9] — The number of people who deeply understand superintelligence risk is far too small. There's a growing pipeline of people entering AI Safety, but most of the available onboarding covers the field broadly, touching on many topics without going deep on the…
Monthly Roundup #40: March 2026
via Substack Zvi [999] — It is that time again.
Bridge Thinking and Wall Thinking
via LessWrong AI [5] — There are a couple of frames I find useful when understanding why different people talk very differently about AI safety: the wall and the bridge. A wall is incrementally useful. Every additional brick you add is good, and the more bricks you add the…
Extracting Performant Algorithms Using Mechanistic Interpretability
via LessWrong AI [7] — A Prequel: The Tree of Life Inside a DNA Language Model. Last year, researchers at Goodfire AI took Evo 2, a genomic foundation model, and found, quite literally, the evolutionary tree of life inside. The phylogenetic relationships between thousands of…
Ideologies Embed Taboos Against Common Knowledge Formation: a Case Study with LLMs
via LessWrong AI [4] — LLMs are searchable holograms of the text corpus they were trained on. RLHF LLM chat agents have the search tuned to be person-like. While one shouldn't excessively anthropomorphize them, they're helpful for simple experimentation into the latent…
Live Doom Meter
Scale: 0% (We're fine) to 100% (GG)
The Doom Meter is a composite score derived from prediction markets and feed sentiment, updated daily.
Prediction Markets (70% of the score)
Weighted average of Manifold Markets questions on AI catastrophe, AGI timelines, expert surveys, and key figures. Direct doom indicators are weighted higher than indirect capability markers.
Feed Sentiment (30% of the score)
Percentage of recent headlines containing high-alarm keywords (existential risk, catastrophe, extinction). Higher alarm density = higher score.
This is not a scientific estimate of existential risk. It is an opinionated, transparent signal — a vibes-based thermometer for AI doom discourse.
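The 70/30 blend described above can be sketched in a few lines of Python. This is an illustration only, not the site's actual implementation: the function names, the per-market weights, and the case-insensitive substring matching are all assumptions. Only the 70%/30% split and the three example keywords come from the description.

```python
from typing import Sequence, Tuple

# Weights stated in the description: 70% markets, 30% feed sentiment.
MARKET_WEIGHT = 0.70
SENTIMENT_WEIGHT = 0.30

# Keywords named in the description; the real list may differ (assumption).
ALARM_KEYWORDS = ("existential risk", "catastrophe", "extinction")


def market_score(markets: Sequence[Tuple[float, float]]) -> float:
    """Weighted average of (probability, weight) pairs, as a percentage.

    Probabilities are in 0..1. The weights are hypothetical; the description
    only says direct doom indicators count more than indirect ones.
    """
    total_weight = sum(weight for _, weight in markets)
    if total_weight == 0:
        return 0.0
    return 100.0 * sum(p * weight for p, weight in markets) / total_weight


def sentiment_score(headlines: Sequence[str]) -> float:
    """Percentage of headlines containing at least one alarm keyword."""
    if not headlines:
        return 0.0
    hits = sum(
        any(keyword in headline.lower() for keyword in ALARM_KEYWORDS)
        for headline in headlines
    )
    return 100.0 * hits / len(headlines)


def doom_meter(markets: Sequence[Tuple[float, float]],
               headlines: Sequence[str]) -> float:
    """Composite score in 0..100: 70% market score plus 30% sentiment score."""
    return (MARKET_WEIGHT * market_score(markets)
            + SENTIMENT_WEIGHT * sentiment_score(headlines))


if __name__ == "__main__":
    # Made-up inputs: one direct indicator (weight 2.0), one indirect (1.0).
    sample_markets = [(0.2, 2.0), (0.5, 1.0)]
    sample_headlines = [
        "New benchmark results released",
        "Researchers debate extinction risk",
        "Monthly roundup",
        "Applications open for fellowship",
    ]
    print(round(doom_meter(sample_markets, sample_headlines), 1))  # 28.5
```

With these made-up inputs the market component is 30 and the sentiment component is 25 (one alarming headline out of four), so the composite is 0.7 × 30 + 0.3 × 25 = 28.5.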
P(Doom) Scoreboard