Tracking AI existential risk. Auto-aggregated headlines. Human-curated analysis.
AGGREGATING 47 SOURCES · UPDATED LIVE
Analysis
Zac Boring a day ago Analysis
American Government Takes Down Claude Fable
via Substack Zvi [999] — No good policy gets announced shortly after 5pm eastern on a Friday.
Zac Boring a day ago Analysis
The term “AGI” is almost useless at this point [Linkpost]
via LessWrong AI [7] — The reason I wanted to make this linkpost now rather than some other time is because discussions over AGI and whether or not LLMs are or aren't AGI, and the point of the linkpost is that the term AGI is for our purposes useless at this point, because we…
Zac Boring 2 days ago Analysis
Simulating Simulators
via LessWrong AI [3] — Author’s note: I promised myself that when labs moved on to focusing on interpretability vector activations in place of reasoning traces for what invariably gets Goodharted, that it’d be a necessary disclosure as the risks in what might get trampled over…
Zac Boring 2 days ago Analysis
Citations Needed: Magic Encyclopedias to Save the World
via LessWrong AI [4] — Last week FLF launched a competition “to find the best workflows and methodologies for using AI to produce reliable, trustworthy knowledge bases”. I had (and have ongoing) a substantial role in that effort. Why do I think it’s so important? It’s a lot of…
Zac Boring 2 days ago Analysis
Reward Hacking at the 1937 World’s Fair
via LessWrong AI [3] — The "Paris 1937 World’s Fair" was a dick measuring contest. At the time, the world was on the verge of the worst war in history. The fair was an opportunity for powers to flex and intimidate each other. Who has more industrial might, more sophisticated…
Zac Boring 2 days ago Analysis
Claude Fable 5 and Mythos 5: The System Card
via Substack Zvi [999] — First things first: Claude Fable 5 is the new best publicly available model.
Zac Boring 3 days ago Analysis
PSA: Almost nobody is working on alignment
via LessWrong AI [9] — People often assume that a large fraction of the AI safety community works on alignment. As far as we're aware, this is not true. Most people are not working on making sure superintelligent AIs are aligned with human values or follow human…
Zac Boring 3 days ago Analysis
AI #172: The First Fable
via Substack Zvi [999] — A lot happened this week, including a great trip out to Lighthaven.
Zac Boring 4 days ago Analysis
You Can Catch Sleeper Agents by Teaching Another Model to Imitate Them
via LessWrong AI [4] — Detecting Hidden Behaviors in LLMs via Activation-matched Finetuning — preprint, 2026. [Paper] [Code] TLDR. Given a model with some unknown, abnormal behavior (backdoors, censorship, reward hacking, ...), construct an aligned reference by training a clean…
Zac Boring 4 days ago Analysis
Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models
via LessWrong AI [3] — (see full author list at the end) About a year ago, METR showed that the length of tasks frontier models can reliably complete doubles every few months. A related safety-relevant question is this: what length of tasks can models complete without any chain…
Zac Boring 5 days ago Analysis
Three Labs With a Plan and A Memorandum
via Substack Zvi [999] — The big story today is the release of Claude Fable 5, the version of Claude Mythos that Anthropic believes they can safely distribute to the people.
Zac Boring 8 days ago Analysis
Against Corrigibility
via LessWrong AI [4] — A “corrigible” agent, per the LW wiki, is: …one that doesn’t interfere with what we would intuitively see as attempts to ‘correct’ the agent, or ‘correct’ our mistakes in building it; and permits these ‘corrections’ despite the apparent instrumentally…
Zac Boring 8 days ago Analysis
What if Anthropic unilaterally paused capabilities development right now?
via LessWrong AI [6] — In their new post on recursive self-improvement, Anthropic argues that a pause in frontier AI development is needed, but unfortunately, they can't pause on their own, because of less cautious actors: We believe it would be good for the world to have the…
Zac Boring 9 days ago Analysis
Preparing for Warning Shots to Catalyze International Cooperation on AGI Risks
via LessWrong AI [4] — Summary: This is a write-up on preparing for warning shots to catalyze international cooperation on AGI risks, and the corollary list of projects one could pursue. We argue we must first (1) understand types of warning shots, then (2) prepare to catch them.…
Zac Boring 9 days ago Analysis
Learnings from starting an AI safety research team
via LessWrong AI [9] — This post’s goal is to distill our takeaways from building a new research team over the past four months. We describe some context about our team, how it came about, and then describe the lessons learned. Since AI safety is becoming more and more…
Zac Boring 9 days ago Analysis
OpenAI Offers A New Policy Blueprint
via Substack Zvi [999] — Right after a new Executive Order seems like an excellent time to offer OpenAI’s new document: Democratic Governance of Frontier AI: A Blueprint For A Federal Framework.
Zac Boring 10 days ago Analysis
Rohin Shah on AGI Safety
via LessWrong AI [6] — Rohin Shah recently had an interview on 80,000 Hours on his views on AGI Safety and his work at Google DeepMind. I'm posting the transcript below to encourage further discussion. I think the interview is interesting though I disagree on a bunch of topics,…
Zac Boring 10 days ago Analysis
Sixteen schemes for AI safety
via LessWrong AI [5] — These days, I often run across whippersnappers excited to do something for AI safety — but aren’t quite sure what. One of the fun things about the Future Fund era was the big lists of project ideas; as we enter a new era of crazy money sloshing around, it…
Zac Boring 10 days ago Analysis
AI #171: False Flag
via Substack Zvi [999] — This was the week of Claude Opus 4.8.
Zac Boring 11 days ago Analysis
Society Explained: a tool for efficiently exploring >100 theories of society
via LessWrong AI [3] — There are many competing theories of how society does and should function, from Karl Marx and Adam Smith to Steven Pinker and Eliezer Yudkowsky. These theories are often hard to understand - you may need to read an entire book (or dozens of articles) to…