Tracking AI existential risk. Auto-aggregated headlines. Human-curated analysis.
AGGREGATING 47 SOURCES · UPDATED LIVE
Latest Headlines Auto-Updated
11 days ago Research Essential
Empowerment, corrigibility, etc. are simple abstractions (of a messed-up ontology)
via Alignment Forum [999] — 1.1 Tl;dr: Alignment is often conceptualized as AIs helping humans achieve their goals: AIs that increase people’s agency and empowerment; AIs that are helpful, corrigible, and/or obedient; AIs that avoid manipulating people. But that last…
12 days ago Research
Hidden Coalitions in Multi-Agent AI: A Spectral Diagnostic from Internal Representations
via ArXiv cs.AI [5] — Collections of interacting AI agents can form coalitions, creating emergent group-level organization that is critical for AI safety and alignment. However, observing agent behavior alone is often insufficient to distinguish genuine informational coupling…
12 days ago Research Essential
Clarifying the role of the behavioral selection model
via Alignment Forum [999] — This is a brief elaboration on The behavioral selection model for predicting AI motivations, based on some feedback and thoughts I’ve had since publishing. Written quickly in a personal capacity. The main focus of this post is clarifying the basic…
13 days ago Analysis
Why You Can't Use Your Right to Try
via LessWrong AI [4] — The Availability Problem: Imagine you have cancer, or chronic pain, or a progressive degenerative disease of some sort. You have exhausted the traditional treatment options available to you, and none of them have worked. However, there are treatments that…
14 days ago Research
Intelligent CCTV for Urban Design: AI-Based Analysis of Soft Infrastructure at Intersections
via ArXiv cs.AI [4] — Artificial intelligence (AI) and computer vision are transforming transportation data collection. This study introduces an AI-enabled analytics framework leveraging existing CCTV infrastructure to evaluate the impact of soft interventions, such as temporary…
14 days ago Research
Understanding Annotator Safety Policy with Interpretability
via ArXiv cs.AI [3] — Safety policies define what constitutes safe and unsafe AI outputs, guiding data annotation and model development. However, annotation disagreement is pervasive and can stem from multiple sources such as operational failures (annotators misunderstand or…
14 days ago Analysis
Is ProgramBench Impossible?
via LessWrong AI [3] — ProgramBench is a new coding benchmark that all frontier models spectacularly fail. We’ve been on a quest for “hard benchmarks” for a while so it’s refreshing to see a benchmark where top models do badly. Unfortunately, ProgramBench has one big problem:…
14 days ago Analysis Essential
Claude Code, Codex and Agentic Coding #8
via Substack Zvi [999] — When I started this series, everyone was going crazy for coding agents.
14 days ago Analysis Essential
The AI industry is where banking was in 2006. (We're hiring)
via LessWrong AI [8] — TL;DR: CeSIA, the French Center for AI Safety, is recruiting. French not necessary. Apply by 22 May 2026; Paris or remote in Europe/UK. On August 27, 2005, at an annual symposium in Jackson Hole, Raghuram Rajan, then chief economist of the International…
15 days ago Research Essential
Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations
via Alignment Forum [999] — Abstract: We introduce Natural Language Autoencoders (NLAs), an unsupervised method for generating natural language explanations of LLM activations. An NLA consists of two LLM modules: an activation verbalizer (AV) that maps an activation to a text…
15 days ago Research Essential
Mechanistic estimation for wide random MLPs
via Alignment Forum [999] — This post covers joint work with Wilson Wu, George Robinson, Mike Winer, Victor Lecomte and Paul Christiano. Thanks to Geoffrey Irving and Jess Riedel for comments on the post. In ARC's latest paper, we study the following problem: given a randomly…
15 days ago Analysis Essential
AI #167: The Prior Restraint Era Begins
via Substack Zvi [999] — The era of training frontier models and then releasing them whenever you wanted?
16 days ago Analysis
Many individual CEVs are probably quite bad
via LessWrong AI [4] — I was thinking about Habryka's article on Putin's CEV, but I am posting my response here, because the original article is already 3 weeks old. I am not sure how exactly a person's CEV is defined. "If we knew everything and could self-modify" seems…
16 days ago Analysis
x-risk-themed
via LessWrong AI [5] — Sometimes, a friend who works around here, at an x-risk-themed organisation, will think about leaving their job. They’ll ask a group of people “what should I do instead?”. And everyone will chime in with ideas for other x-risk-themed orgs that they could…
16 days ago Analysis
What if LLMs are mostly crystallized intelligence?
via LessWrong AI [5] — Summary: LLMs are better at developing crystallized intelligence than fluid intelligence. That is: LLM training is good at building crystallized intelligence by learning patterns from training data, and this is sufficient to make them surprisingly skillful…
16 days ago Analysis Essential
What is Anthropic?
via Substack Zvi [999] — What is Anthropic?
17 days ago Research
Terminus-4B: Can a Smaller Model Replace Frontier LLMs at Agentic Execution Tasks?
via ArXiv cs.AI [4] — Modern coding agents increasingly delegate specialized subtasks to subagents, which are smaller, focused agentic loops that handle narrow responsibilities like search, debugging or terminal execution. This architectural pattern keeps the main agent's…
17 days ago Analysis Essential
The AI Ad-Hoc Prior Restraint Era Begins
via Substack Zvi [999] — The White House has ordered Anthropic not to expand access to Mythos, and is at least seriously considering a complete about-face of American Frontier AI policy into a full prior restraint regime, where anyone wishing to release a highly capable new…
17 days ago Research Essential
[Linkpost] Interpreting Language Model Parameters
via Alignment Forum [999] — This is the latest work in our Parameter Decomposition agenda. We introduce a new parameter decomposition method, adVersarial Parameter Decomposition (VPD) and decompose the parameters of a small language model with it. VPD greatly improves on…
17 days ago Research Essential
Motivated reasoning, confirmation bias, and AI risk theory
via Alignment Forum [999] — "Of the fifty-odd biases discovered by Kahneman, Tversky, and their successors, forty-nine are cute quirks, and one is destroying civilization. This last one is confirmation bias." - From Scott Alexander's review of Julia Galef's The Scout Mindset.…
Recent Voices
We are creating something that will be more powerful than us. I don't know a good precedent for a less intelligent thing managing a more intelligent thing.
— Geoffrey Hinton, Nobel Prize Lecture, Dec 2024
If you're not worried about AI safety, you're not paying attention.
— Sen. Blumenthal, Senate AI Hearing, 2024
The probability of doom is high enough that we should be working very hard to reduce it.
— Yoshua Bengio, MILA Talk, 2024