Tracking AI existential risk. Auto-aggregated headlines. Human-curated analysis.
AGGREGATING 47 SOURCES · UPDATED LIVE
Latest Headlines Auto-Updated
4 days ago Analysis
You Can Catch Sleeper Agents by Teaching Another Model to Imitate Them
via LessWrong AI [4] — Detecting Hidden Behaviors in LLMs via Activation-matched Finetuning — preprint, 2026. [Paper] [Code] TLDR: Given a model with some unknown, abnormal behavior (backdoors, censorship, reward hacking, ...), construct an aligned reference by training a clean…
4 days ago Analysis
Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models
via LessWrong AI [3] — (see full author list at the end) About a year ago, METR showed that the length of tasks frontier models can reliably complete doubles every few months. A related safety-relevant question is this: what length of tasks can models complete without any chain…
4 days ago Industry
The future of AI regulation is courting the strangest, most anxious bedfellows
via The Verge AI [3] — Hello and welcome to Regulator, a newsletter for Verge subscribers about tech politics, tech influence, and tech shenanigans in Washington, DC. (If you're not a subscriber, you can get on board here.) We're back after a two-week hiatus, during most of…
4 days ago Research Essential
Sequent: scale and automation for higher confidence in alignment
via Alignment Forum [999] — Alignment is not on track. Artificial superintelligence (ASI) may be developed in the next few years. It is unclear whether alignment is on track to be ready on the same timeframe. At a minimum, the empirical programs at AI labs are unlikely to deliver…
5 days ago Research
Investing in multi-agent AI safety research
via DeepMind Blog [7] — Google DeepMind and partners announce a $10M funding call for multi-agent safety research.
5 days ago Research Essential
Tracing Eval-Awareness Emergence Through Training of OLMo 3
via Alignment Forum [999] — TL;DR: Recent work from Goodfire & UK AISI – Verbalized Eval Awareness Inflates Measured Safety – shows that newer open-weight models verbalize evaluation-awareness (VEA) more often, and that this inflates measured safety. Between OLMo-3-32B-Think and…
5 days ago Analysis Essential
Three Labs With a Plan and A Memorandum
via Substack Zvi [999] — The big story today is the release of Claude Fable 5, the version of Claude Mythos that Anthropic believes they can safely distribute to the people.
5 days ago Research Essential
A Mike's-Eye View of ARC's Research
via Alignment Forum [999] — Over the past 15 months or so, ARC's technical agenda has developed quite a bit. The advent of the Matching Sampling Principle (MSP), and ideas like it, has begotten a host of concrete technical problems; progress on those problems has given us more…
6 days ago Industry
OpenAI files for IPO, following Anthropic
via The Verge AI [4] — OpenAI on Monday checked off a preliminary step in the IPO race that it and rival Anthropic have been competing in for the better part of a year: The company announced it has confidentially submitted a Form S-1 with the US Securities and Exchange…
6 days ago Research Essential
Efficient tradeoffs and the safety-usefulness tradeoff model
via Alignment Forum [999] — I often use what I’ll call the “safety-usefulness tradeoff model”, which is: developers face a tradeoff between "safety" and "usefulness" of an AI deployment, and the developer has only limited willingness or ability to sacrifice usefulness for the…
6 days ago Research Essential
Announcing major new donations, and recapping the 2025 fundraiser
via MIRI [999] — This past December, we ran our first fundraiser in six years, setting an ambitious goal of $6M. We ended up receiving a total of $1.8M from small donors and $1.6M in matching from the Survival and Flourishing Fund (SFF) for a total of $3.4M. We’re incredibly…
6 days ago Industry
Microsoft’s AI chief says superintelligence is near, but won’t take your job
via The Verge AI [4] — Today I’m talking with Mustafa Suleyman, the CEO of Microsoft AI. And I’m actually going to keep today’s intro short — I’m working from my wife’s family farm this week, as you’ll see in the video, but also this is a real burner of an episode. We covered…
7 days ago Industry
Built to benefit everyone: our plan
via OpenAI Blog [6] — A vision for the future of AI, focusing on access, safety, and shared prosperity as OpenAI works to ensure AGI benefits everyone.
8 days ago Analysis
Against Corrigibility
via LessWrong AI [4] — A “corrigible” agent, per the LW wiki, is: …one that doesn’t interfere with what we would intuitively see as attempts to ‘correct’ the agent, or ‘correct’ our mistakes in building it; and permits these ‘corrections’ despite the apparent instrumentally…
8 days ago Analysis
What if Anthropic unilaterally paused capabilities development right now?
via LessWrong AI [6] — In their new post on recursive self-improvement, Anthropic argues that a pause in frontier AI development is needed, but unfortunately, they can't pause on their own, because of less cautious actors: We believe it would be good for the world to have the…
9 days ago Analysis
Preparing for Warning Shots to Catalyze International Cooperation on AGI Risks
via LessWrong AI [4] — Summary: This is a write-up on preparing for warning shots to catalyze international cooperation on AGI risks, and the corollary list of projects one could pursue. We argue we must first (1) understand types of warning shots, then (2) prepare to catch them.…
9 days ago Analysis Essential
Learnings from starting an AI safety research team
via LessWrong AI [9] — This post’s goal is to distill our takeaways from building a new research team over the past four months. We describe some context about our team, how it came about, and then describe the lessons learned. Since AI safety is becoming more and more…
9 days ago Research Essential
My research agenda and work
via Alignment Forum [999] — This is a summary of the work I've done and work I plan to do, and the theories of change and AI progress that motivate my work. I've been working full-time on alignment for three years and change, and thinking about brainlike AGI and its alignment…
9 days ago Analysis Essential
OpenAI Offers A New Policy Blueprint
via Substack Zvi [999] — Right after a new Executive Order seems like an excellent time to offer OpenAI’s new document: Democratic Governance of Frontier AI: A Blueprint For A Federal Framework.
10 days ago Research
How Far Did They Go? The Persuasive Tactics of Covert LLM Agents in a Discontinued Field Experiment
via ArXiv cs.AI [5] — This study analyzes a publicly released dataset from a discontinued field experiment on Reddit's r/ChangeMyView. The intervention, conducted by unknown, external researchers and halted following ethical backlash, involved undisclosed AI-generated accounts…
Live Doom Meter
0% — We're fine 100% — GG
Recent Voices
We are creating something that will be more powerful than us. I don't know a good precedent for a less intelligent thing managing a more intelligent thing.
— Geoffrey Hinton, Nobel Prize Lecture, Dec 2024
If you're not worried about AI safety, you're not paying attention.
— Sen. Blumenthal, Senate AI Hearing, 2024
The probability of doom is high enough that we should be working very hard to reduce it.
— Yoshua Bengio, MILA Talk, 2024