pDoom (Page 18)


Latest Headlines Auto-Updated

3 months ago Research

Minimal, Local, Causal Explanations for Jailbreak Success in Large Language Models

via ArXiv cs.AI [5] — Safety trained large language models (LLMs) can often be induced to answer harmful requests through jailbreak prompts. Because we lack a robust understanding of why LLMs are susceptible to jailbreaks, future frontier models operating more autonomously in…

3 months ago Analysis

OpenAI's red line for AI self-improvement is fundamentally flawed

via LessWrong AI [4] — TL;DR. OpenAI's "Critical" threshold for AI self-improvement in the Preparedness Framework v2 has three structural problems: It fires too late. The lagging indicator, 5× generational acceleration sustained for several months, lets ~3 years of effective…

3 months ago Research Essential

Exploration Hacking: Can LLMs Learn to Resist RL Training?

via Alignment Forum [999] — We empirically investigate exploration hacking (EH) — where models strategically alter their exploration to resist RL training — by creating model organisms that resist capability elicitation, evaluating countermeasures, and auditing frontier models…

3 months ago Research Essential

Risk from fitness-seeking AIs: mechanisms and mitigations

via Alignment Forum [999] — Current AIs routinely take unintended actions to score well on tasks: hardcoding test cases, training on the test set, downplaying issues, etc. This misalignment is still somewhat incoherent, but it increasingly resembles what I call…

3 months ago Analysis Essential

Housing Roundup #14: You Can't Build That

via Substack Zvi [999] — Why can’t you build it?

3 months ago Analysis Essential

AI unemployment and AI extinction are often the same

via LessWrong AI [10] — My sense is that people think of AI existential risk and AI unemployment as distinct issues. Some people are extremely concerned about extinction and perhaps even indifferent to total unemployment. Some people think of moderate AI unemployment as a…

3 months ago Industry

Pentagon strikes classified AI deals with OpenAI, Google, and Nvidia — but not Anthropic

via The Verge AI [4] — The Pentagon has struck deals with OpenAI, Google, Microsoft, Amazon, Nvidia, Elon Musk's xAI, and the startup Reflection, allowing the agency to use their AI tools in classified settings, according to an announcement on Friday. At the same time, the…

3 months ago Industry

Microsoft wants lawyers to trust its new AI agent in Word documents

via The Verge AI [4] — Microsoft is launching a new AI agent inside Word that's specifically designed for legal teams. Legal Agent handles document edits, negotiation history, and complex documents to help legal teams handle tasks like reviewing contracts. "Instead of relying on…

3 months ago Analysis Essential

AI risk was not invented by AI CEOs to hype their companies

via LessWrong AI [9] — I hear that many people believe that the idea of advanced AI threatening human existence was invented by AI CEOs to hype their products. I’ve even been condescendingly informed of this, as if I am the one at risk of naively accepting AI companies’…

3 months ago Research

Binary Spiking Neural Networks as Causal Models

via ArXiv cs.AI [4] — We provide a causal analysis of Binary Spiking Neural Networks (BSNNs) to explain their behavior. We formally define a BSNN and represent its spiking activity as a binary causal model. Thanks to this causal representation, we are able to explain the output…

3 months ago Industry Essential

This startup’s new mechanistic interpretability tool lets you debug LLMs

via MIT Technology Review [8] — The San Francisco–based startup Goodfire just released a new tool, called Silico, that lets researchers and engineers peer inside an AI model and adjust its parameters—the settings that determine a model’s behavior—during training. This could give…

3 months ago Analysis Essential

AI #166: Google Sells Out

via Substack Zvi [999] — This was the week of GPT-5.5.

3 months ago Research

Distill-Belief: Closed-Loop Inverse Source Localization and Characterization in Physical Fields

via ArXiv cs.AI [3] — Closed-loop inverse source localization and characterization (ISLC) requires a mobile agent to select measurements that localize sources and infer latent field parameters under strict time constraints. The core challenge lies in the belief-space…

3 months ago Analysis

No Strong Orthogonality From Selection Pressure

via LessWrong AI [4] — A postratfic version of this essay, together with the acknowledgements for both, is available on Substack. Edit: if no one thinks an agent can become superintelligent and contest the lightcone while maintaining arbitrarily stupid goals, that's great! I'm only…

3 months ago Research Essential

Research Sabotage in ML Codebases

via Alignment Forum [999] — One of the main hopes for AI safety is using AIs to automate AI safety research. However, if models are misaligned, then they may sabotage the safety research. For example, misaligned AIs may try to: Perform sloppy research in order to slow down the…

3 months ago Industry

Building the compute infrastructure for the Intelligence Age

via OpenAI Blog [6] — OpenAI scales Stargate to build the compute infrastructure powering AGI, adding new data center capacity to meet growing AI demand.

3 months ago Analysis Essential

The Most Important Charts In The World

via Substack Zvi [999] — We all need a break so: What is the most important chart in the world?

3 months ago Industry

Larry’s risky business

via The Verge AI [4] — If you want to know whether the AI bubble is bursting, there's only one publicly traded company that will tell you: Oracle. That's right, the database company. Oracle has burned its boats and pivoted to AI, but not in any kind of usual way. It is not a…

3 months ago Research

Sparse Personalized Text Generation with Multi-Trajectory Reasoning

via ArXiv cs.AI [6] — As Large Language Models (LLMs) advance, personalization has become a key mechanism for tailoring outputs to individual user needs. However, most existing methods rely heavily on dense interaction histories, making them ineffective in cold-start scenarios…

3 months ago Research Essential

Recursive forecasting: Eliciting long-term forecasts from myopic fitness-seekers

via Alignment Forum [999] — We’d like to use powerful AIs to answer questions that may take a long time to resolve. But if a model only cares about performing well in ways that are verifiable shortly after answering (e.g., a myopic fitness seeker), it may be difficult to get…



Recent Voices

We are creating something that will be more powerful than us. I don't know a good precedent for a less intelligent thing managing a more intelligent thing.

— Geoffrey Hinton, Nobel Prize Lecture, Dec 2024

If you're not worried about AI safety, you're not paying attention.

— Sen. Blumenthal, Senate AI Hearing, 2024

The probability of doom is high enough that we should be working very hard to reduce it.

— Yoshua Bengio, MILA Talk, 2024