DOOM LEVEL
Latest Headlines
Auto-Updated
A Year Late, Claude Finally Beats Pokémon
via LessWrong AI [3] — Credit: ClaudePlaysPokemon Elevator Shanty by Kurukkoo. Disclaimer: like some previous posts in this series, this was not primarily written by me, but by a friend. I did substantial editing, however. ClaudePlaysPokemon feat. Opus 4.7 has finally beaten…
A Two-Dimensional Framework for AI Agent Design Patterns: Cognitive Function and Execution Topology
via ArXiv cs.AI [3] — Existing frameworks for LLM-based agent architectures describe systems from a single perspective: industry guides (Anthropic, Google, LangChain) focus on execution topology -- how data flows -- while cognitive science surveys focus on cognitive function --…
The hard core of alignment (is robustifying RL)
via LessWrong AI [5] — Most technical AI safety work that I read seems to miss the mark, failing to make any progress on the hard part of the problem. I think this is a common sentiment, but there's less agreement about what exactly the hard part is? Characterizing this more…
Risk reports need to address deployment-time spread of misalignment
via Alignment Forum [999] — Risk reports commonly use pre-deployment alignment assessments to measure misalignment risk from an internally deployed AI. However, an AI that genuinely starts out with largely benign motivations can develop widespread dangerous motivations during…
Mechanistic estimation for expectations of random products
via Alignment Forum [999] — We have developed some relatively general methods for mechanistic estimation competitive with sampling by studying problems that are expressible as expectations of random products. This includes several different estimation problems, such as random…
Monthly Roundup #42: May 2026
via Substack Zvi [999] — At least we probably won’t have another pandemic.
Convergent Abstraction Hypothesis
via LessWrong AI [4] — TL;DR: Convergent abstraction hypothesis posits abstractions are often convergent in the sense of convergent evolution: different cognitive systems converge on the same abstraction, when facing similar selection pressures and learning in similar…
OpenAI’s Codex is now in the ChatGPT mobile app
via The Verge AI [4] — OpenAI is going to let users access Codex, its desktop AI tool that can write code and use apps on your computer, from the ChatGPT app on your phone. Following the surge in popularity for Anthropic's Claude Code, OpenAI has been working quickly to try and…
The safe-to-dangerous shift is a fundamental problem for eval realism; but also for measuring awareness
via Alignment Forum [999] — 1) The safe-to-dangerous shift is a fundamental problem for eval realism. Suppose we have a capable and potentially scheming model, and before we deploy it, we want some evidence that it won’t do anything catastrophically dangerous once we deploy it. A…
AI #168: Not Leading the Future
via Substack Zvi [999] — This is what a lull looks like at this point.
Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack
via ArXiv cs.AI [6] — Agent benchmarks have become the de facto measure of frontier AI competence, guiding model selection, investment, and deployment. However, reward hacking, where agents maximize a score without performing the intended task, emerges spontaneously in frontier…
Most "inner work" looks like entertainment.
via LessWrong AI [4] — Imagine you’re looking for a personal trainer. You open one trainer’s webpage and read their testimonials: “I had an experience tied for the most intense experiences of my life”; “They do it all with fun, care, and a sense of humour.” You notice that none…
Cyber Lack of Security and AI Governance
via Substack Zvi [999] — The real recent story of AI has been the background work being done on Cybersecurity, as we process the Mythos Moment along with GPT-5.5, and figure out both how to patch the internet and what our new regulatory regime is going to look like.
Voters are surprisingly open to talking about AI risk
via LessWrong AI [14] — TL;DR: Voters are now surprisingly open to talking about existential risk from AI. This seems to have changed in the last 6 months. When campaigning for AI safety-friendly politicians (e.g., Alex Bores), we should talk more about AI in general, and about…
RankQ: Offline-to-Online Reinforcement Learning via Self-Supervised Action Ranking
via ArXiv cs.AI [4] — Offline-to-online reinforcement learning (RL) improves sample efficiency by leveraging pre-collected datasets prior to online interaction. A key challenge, however, is learning an accurate critic in large state--action spaces with limited dataset coverage.…
Summary: An International Agreement to Prevent the Premature Creation of Artificial Superintelligence
via MIRI [999] — If anyone, anywhere builds a superhuman artificial intelligence using present methods, the most likely outcome is catastrophe. There have accordingly been widespread calls for an international agreement prohibiting the development of superintelligence. In…
Childhood and Education #18: Do The Math
via Substack Zvi [999] — We did reading yesterday.
Sam Altman says Elon Musk’s mind games were damaging OpenAI
via The Verge AI [6] — OpenAI CEO Sam Altman says Elon Musk did "huge damage" to the culture of the AI startup. During testimony as part of Musk's lawsuit against OpenAI, Altman said Musk required OpenAI president Greg Brockman and former chief scientist Ilya Sutskever to rank…
Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria
via ArXiv cs.AI [6] — Aligning multimodal generative models with human preferences demands reward signals that respect the compositional, multi-dimensional structure of human judgment. Prevailing RLHF approaches reduce this structure to scalar or pairwise labels, collapsing…
The Iliad Intensive Course Materials
via LessWrong AI [5] — We are releasing the course materials of the Iliad Intensive, a new month-long and full-time AI Alignment course that runs in-person every second month. The course targets students with strong backgrounds in mathematics, physics, or theoretical computer…
Live Doom Meter
0% — We're fine
100% — GG
The Doom Meter is a composite score derived from prediction markets and feed sentiment, updated daily.
70%
Prediction Markets
Weighted average of Manifold Markets questions on AI catastrophe, AGI timelines, expert surveys, and key figures. Direct doom indicators are weighted more heavily than indirect capability markers.
30%
Feed Sentiment
Percentage of recent headlines containing high-alarm keywords (existential risk, catastrophe, extinction). Higher alarm density = higher score.
This is not a scientific estimate of existential risk. It is an opinionated, transparent signal — a vibes-based thermometer for AI doom discourse.
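The 70/30 weighting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the site's actual code: the function names, the per-market weights, and the case-insensitive substring match are all hypothetical, and the keyword list contains only the three examples the page names.

```python
from typing import Iterable, Sequence, Tuple

# Weights stated on the page.
MARKET_WEIGHT = 0.70
SENTIMENT_WEIGHT = 0.30

# Assumption: the page lists only these three example keywords.
ALARM_KEYWORDS = ("existential risk", "catastrophe", "extinction")


def feed_sentiment(headlines: Sequence[str],
                   keywords: Sequence[str] = ALARM_KEYWORDS) -> float:
    """Percentage (0-100) of headlines containing at least one keyword.

    Assumption: matching is a case-insensitive substring test.
    """
    if not headlines:
        return 0.0
    hits = sum(any(k in h.lower() for k in keywords) for h in headlines)
    return 100.0 * hits / len(headlines)


def market_score(markets: Iterable[Tuple[float, float]]) -> float:
    """Weighted average of (probability_percent, weight) pairs.

    Assumption: direct doom indicators simply carry a larger weight
    than indirect capability markers; the real weights are not published.
    """
    pairs = list(markets)
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        return 0.0
    return sum(p * w for p, w in pairs) / total_weight


def doom_meter(market_pct: float, sentiment_pct: float) -> float:
    """Composite score: 70% prediction markets, 30% feed sentiment."""
    return MARKET_WEIGHT * market_pct + SENTIMENT_WEIGHT * sentiment_pct


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    headlines = ["Extinction risk debated", "New phone released"]
    markets = [(20.0, 2.0), (50.0, 1.0)]  # (probability %, weight)
    print(doom_meter(market_score(markets), feed_sentiment(headlines)))
```

Keeping the two components as separate functions means either input can be inspected on its own before the weights are applied.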
P(Doom) Scoreboard
[Scoreboard chart: estimates plotted on a scale from 0% to 100%; data not loaded]
Recent Voices
We are creating something that will be more powerful than us. I don't know a good precedent for a less intelligent thing managing a more intelligent thing.
— Geoffrey Hinton, Nobel Prize Lecture, Dec 2024
If you're not worried about AI safety, you're not paying attention.
— Sen. Blumenthal, Senate AI Hearing, 2024
The probability of doom is high enough that we should be working very hard to reduce it.
— Yoshua Bengio, MILA Talk, 2024