Posts by
Holos: A Web-Scale LLM-Based Multi-Agent System for the Agentic Web
via ArXiv cs.AI [10] — As large language models (LLM)-driven agents transition from isolated task solvers to persistent digital entities, the emergence of the Agentic Web, an ecosystem where heterogeneous agents autonomously interact and co-evolve, marks a pivotal shift toward…
Ten different ways of thinking about Gradual Disempowerment
via LessWrong AI [7] — About a year ago, we wrote a paper that coined the term “Gradual Disempowerment.” It proved to be a great success, which is terrific. A friend and colleague told me that it was the most discussed paper at DeepMind last year (selection bias, grain of salt,…
Steering Might Stop Working Soon
via LessWrong AI [5] — Steering LLMs with single-vector methods might break down soon, and by soon I mean soon enough that if you're working on steering, you should start planning for it failing now. This is particularly important for things like steering as a mitigation against…
OpenAI’s AGI boss is taking a leave of absence
via The Verge AI [6] — OpenAI is undergoing another round of C-suite changes, according to an internal memo viewed by The Verge. Fidji Simo, OpenAI's CEO of AGI deployment - who was until recently the company's CEO of Applications - says in the memo that she will be stepping…
There should be $100M grants to automate AI safety
via Alignment Forum [999] — This post reflects my personal opinion and not necessarily that of other members of Apollo Research. TLDR: I think funders should heavily incentivize AI safety work that enables spending $100M+ in compute or API budgets on automated AI labor that…
Sadly, The Whispering Earring
via LessWrong AI [4] — The Whispering Earring (which you should read first) explores one of the most dystopic-utopic scenarios. Imagine you could achieve all you've ever wanted by just giving up your agency. While theoretically this seems rather undesirable, in practice you get…
Anthropic Responsible Scaling Policy v3: Dive Into The Details
via Substack Zvi [999] — Wednesday’s post talked about the implications of Anthropic changing from v2.2 to v3.0 of its RSP, including that this broke promises that many people relied upon when making important decisions.
Systematically dismantle the AI compute supply chain.
via LessWrong AI [9] — This is not an April fool’s joke, I’m participating in Inkhaven, which means I need to write a blog post every day. I recently watched The AI Doc. It’s the first big documentary featuring AI safety. It’s playing in theatres across America. It’s got a bunch…
Microsoft’s new ‘superintelligence’ game plan is all about business
via The Verge AI [4] — Mustafa Suleyman has been preparing for his new job description for a long time. Suleyman is Microsoft's inaugural CEO of AI, but after the company underwent a large-scale restructuring in mid-March, he's handed off some duties and shifted focus to chasing…
AI #162: Visions of Mythos
via Substack Zvi [999] — Anthropic had some problem with leaks this week.
Anthropic's Pause is the Most Expensive Alarm in Corporate History
via LessWrong AI [6] — Imagine Apple halting iPhone production because studies linked smartphones to teen suicide rates. Imagine Pfizer proactively pulling Lipitor because of internal studies showing increased cardiac risk, and not because of looming settlements or FDA…
My most common advice for junior researchers
via Alignment Forum [999] — Written quickly as part of the Inkhaven Fellowship. At a high level, research feedback I give to more junior research collaborators often can fall into one of three categories: doing quick sanity checks, saying precisely what you want to say, asking why…
Introducing LIMBO: Maintaining Optimal P(DOOM) (and a call for funding)
via LessWrong AI [12] — We are excited to publicly introduce the Laboratory for Importance-sampled Measure and Bayesian Observation (LIMBO), a small research group working at the intersection of cosmological theory, probability, and existential risk. We believe that the…
Anthropic Responsible Scaling Policy v3: A Matter of Trust
via Substack Zvi [999] — Anthropic has revised its Responsible Scaling Policy to v3.
Predicting When RL Training Breaks Chain-of-Thought Monitorability
via Alignment Forum [999] — Read our full paper about this topic by Max Kaufmann, David Lindner, Roland S. Zimmermann, and Rohin Shah. Overseeing AI agents by reading their intermediate reasoning “scratchpad” is a promising tool for AI safety. This approach, known as…
Mimosa Framework: Toward Evolving Multi-Agent Systems for Scientific Research
via ArXiv cs.AI [6] — Current Autonomous Scientific Research (ASR) systems, despite leveraging large language models (LLMs) and agentic architectures, remain constrained by fixed workflows and toolsets that prevent adaptation to evolving tasks and environments. We introduce…
Enhancing Policy Learning with World-Action Model
via ArXiv cs.AI [4] — This paper presents the World-Action Model (WAM), an action-regularized world model that jointly reasons over future visual observations and the actions that drive state transitions. Unlike conventional world models trained solely via image prediction, WAM…
Towards Computational Social Dynamics of Semi-Autonomous AI Agents
via ArXiv cs.AI [3] — We present the first comprehensive study of emergent social organization among AI agents in hierarchical multi-agent systems, documenting the spontaneous formation of labor unions, criminal syndicates, and proto-nation-states within production AI…
Working Paper: Towards a Category-theoretic Comparative Framework for Artificial General Intelligence
via ArXiv cs.AI [8] — AGI has become the Holy Grail of AI with the promise of level intelligence and the major Tech companies around the world are investing unprecedented amounts of resources in its pursuit. Yet, there does not exist a single formal definition and only some…
Product Alignment is not Superintelligence Alignment (and we need the latter to survive)
via LessWrong AI [9] — tl;dr: progress on making Claude friendly[1] is not the same as progress on making it safe to build godlike superintelligence. solving the former does not imply we get a good future.[2] please track the difference. The term 'Alignment' was coined[3] to…
Live Doom Meter
0% — We're fine
100% — GG
The Doom Meter is a composite score derived from prediction markets and feed sentiment, updated daily.
Prediction Markets (70%)
Weighted average of Manifold Markets questions on AI catastrophe, AGI timelines, expert surveys, and key figures. Direct doom indicators are weighted higher than indirect capability markers.
Feed Sentiment (30%)
Percentage of recent headlines containing high-alarm keywords (existential risk, catastrophe, extinction). Higher alarm density = higher score.
This is not a scientific estimate of existential risk. It is an opinionated, transparent signal — a vibes-based thermometer for AI doom discourse.
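As a rough illustration of how a composite like this could be computed, here is a minimal Python sketch. Only the 70/30 split and the three example keywords come from the description above. The function names, the per-market weights, and the clamping to the 0 to 100 range are assumptions made for illustration, not the site's actual implementation.

```python
from typing import Sequence, Tuple

# Weights stated on the page: 70% prediction markets, 30% feed sentiment.
MARKET_WEIGHT = 0.70
SENTIMENT_WEIGHT = 0.30

# Example keywords named on the page; the real list is unknown (assumption).
ALARM_KEYWORDS = ("existential risk", "catastrophe", "extinction")


def market_score(markets: Sequence[Tuple[float, float]]) -> float:
    """Weighted average of (probability_percent, weight) pairs.

    The per-market weights are hypothetical; the page only says that
    direct doom indicators count more than indirect capability markers.
    """
    total_weight = sum(weight for _, weight in markets)
    if total_weight == 0:
        return 0.0
    return sum(prob * weight for prob, weight in markets) / total_weight


def sentiment_score(headlines: Sequence[str]) -> float:
    """Percentage of headlines containing at least one alarm keyword."""
    if not headlines:
        return 0.0
    alarmed = sum(
        any(keyword in headline.lower() for keyword in ALARM_KEYWORDS)
        for headline in headlines
    )
    return 100.0 * alarmed / len(headlines)


def doom_meter(
    markets: Sequence[Tuple[float, float]], headlines: Sequence[str]
) -> float:
    """Combine both components and clamp to the 0-100 scale (assumed)."""
    score = (
        MARKET_WEIGHT * market_score(markets)
        + SENTIMENT_WEIGHT * sentiment_score(headlines)
    )
    return max(0.0, min(100.0, score))
```

For example, calling doom_meter with a list of (probability, weight) pairs for the markets and a list of headline strings returns a single percentage. Substring matching on keywords is deliberately crude here, which fits the "vibes-based" framing above.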
P(Doom) Scoreboard