Tracking AI existential risk. Auto-aggregated headlines. Human-curated analysis.
AGGREGATING 47 SOURCES · UPDATED LIVE
Latest Headlines · Auto-Updated
6 days ago Analysis
A Year Late, Claude Finally Beats Pokémon
via LessWrong AI [3] — Credit: ClaudePlaysPokemon Elevator Shanty by Kurukkoo. Disclaimer: like some previous posts in this series, this was not primarily written by me, but by a friend. I did substantial editing, however. ClaudePlaysPokemon feat. Opus 4.7 has finally beaten…
7 days ago Research
A Two-Dimensional Framework for AI Agent Design Patterns: Cognitive Function and Execution Topology
via ArXiv cs.AI [3] — Existing frameworks for LLM-based agent architectures describe systems from a single perspective: industry guides (Anthropic, Google, LangChain) focus on execution topology -- how data flows -- while cognitive science surveys focus on cognitive function --…
7 days ago Analysis
The hard core of alignment (is robustifying RL)
via LessWrong AI [5] — Most technical AI safety work that I read seems to miss the mark, failing to make any progress on the hard part of the problem. I think this is a common sentiment, but there's less agreement about what exactly the hard part is? Characterizing this more…
7 days ago Research Essential
Risk reports need to address deployment-time spread of misalignment
via Alignment Forum [999] — Risk reports commonly use pre-deployment alignment assessments to measure misalignment risk from an internally deployed AI. However, an AI that genuinely starts out with largely benign motivations can develop widespread dangerous motivations during…
7 days ago Research Essential
Mechanistic estimation for expectations of random products
via Alignment Forum [999] — We have developed some relatively general methods for mechanistic estimation competitive with sampling by studying problems that are expressible as expectations of random products. This includes several different estimation problems, such as random…
7 days ago Analysis Essential
Monthly Roundup #42: May 2026
via Substack Zvi [999] — At least we probably won’t have another pandemic.
7 days ago Analysis
Convergent Abstraction Hypothesis
via LessWrong AI [4] — Tl;dr: Convergent abstraction hypothesis posits abstractions are often convergent in the sense of convergent evolution: different cognitive systems converge on the same abstraction, when facing similar selection pressures and learning in similar…
8 days ago Industry
OpenAI’s Codex is now in the ChatGPT mobile app
via The Verge AI [4] — OpenAI is going to let users access Codex, its desktop AI tool that can write code and use apps on your computer, from the ChatGPT app on your phone. Following the surge in popularity for Anthropic's Claude Code, OpenAI has been working quickly to try and…
8 days ago Research Essential
The safe-to-dangerous shift is a fundamental problem for eval realism; but also for measuring awareness
via Alignment Forum [999] — 1) The safe-to-dangerous shift is a fundamental problem for eval realism. Suppose we have a capable and potentially scheming model, and before we deploy it, we want some evidence that it won’t do anything catastrophically dangerous once we deploy it. A…
8 days ago Analysis Essential
AI #168: Not Leading the Future
via Substack Zvi [999] — This is what a lull looks like at this point.
9 days ago Research
Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack
via ArXiv cs.AI [6] — Agent benchmarks have become the de facto measure of frontier AI competence, guiding model selection, investment, and deployment. However, reward hacking, where agents maximize a score without performing the intended task, emerges spontaneously in frontier…
9 days ago Analysis
Most "inner work" looks like entertainment.
via LessWrong AI [4] — Imagine you’re looking for a personal trainer. You open one trainer’s webpage and read their testimonials: “I had an experience tied for the most intense experiences of my life”; “They do it all with fun, care, and a sense of humour.” You notice that none…
9 days ago Analysis Essential
Cyber Lack of Security and AI Governance
via Substack Zvi [999] — The real recent story of AI has been the background work being done on Cybersecurity, as we process the Mythos Moment along with GPT-5.5, and figure out both how to patch the internet and what our new regulatory regime is going to look like.
9 days ago Analysis Essential
Voters are surprisingly open to talking about AI risk
via LessWrong AI [14] — TL;DR: Voters are now surprisingly open to talking about existential risk from AI. This seems to have changed in the last 6 months. When campaigning for AI safety-friendly politicians (e.g., Alex Bores), we should talk more about AI in general, and about…
10 days ago Research
RankQ: Offline-to-Online Reinforcement Learning via Self-Supervised Action Ranking
via ArXiv cs.AI [4] — Offline-to-online reinforcement learning (RL) improves sample efficiency by leveraging pre-collected datasets prior to online interaction. A key challenge, however, is learning an accurate critic in large state--action spaces with limited dataset coverage.…
10 days ago Research Essential
Summary: An International Agreement to Prevent the Premature Creation of Artificial Superintelligence
via MIRI [999] — If anyone, anywhere builds a superhuman artificial intelligence using present methods, the most likely outcome is catastrophe. There have accordingly been widespread calls for an international agreement prohibiting the development of superintelligence. In…
10 days ago Analysis Essential
Childhood and Education #18: Do The Math
via Substack Zvi [999] — We did reading yesterday.
10 days ago Industry
Sam Altman says Elon Musk’s mind games were damaging OpenAI
via The Verge AI [6] — OpenAI CEO Sam Altman says Elon Musk did "huge damage" to the culture of the AI startup. During testimony as part of Musk's lawsuit against OpenAI, Altman said Musk required OpenAI president Greg Brockman and former chief scientist Ilya Sutskever to rank…
11 days ago Research
Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria
via ArXiv cs.AI [6] — Aligning multimodal generative models with human preferences demands reward signals that respect the compositional, multi-dimensional structure of human judgment. Prevailing RLHF approaches reduce this structure to scalar or pairwise labels, collapsing…
11 days ago Analysis
The Iliad Intensive Course Materials
via LessWrong AI [5] — We are releasing the course materials of the Iliad Intensive, a new month-long and full-time AI Alignment course that runs in-person every second month. The course targets students with strong backgrounds in mathematics, physics, or theoretical computer…
Recent Voices
We are creating something that will be more powerful than us. I don't know a good precedent for a less intelligent thing managing a more intelligent thing.
— Geoffrey Hinton, Nobel Prize Lecture, Dec 2024
If you're not worried about AI safety, you're not paying attention.
— Sen. Blumenthal, Senate AI Hearing, 2024
The probability of doom is high enough that we should be working very hard to reduce it.
— Yoshua Bengio, MILA Talk, 2024