DOOM LEVEL
Latest Headlines
Auto-Updated
Operationalizing FDT
via Alignment Forum [999] — This post is an attempt to better operationalize FDT (functional decision theory). It answers the following questions: given a logical causal graph, how do we define the logical do-operator? What is logical causality and how might it be formalized? How…
Ideologies Embed Taboos Against Common Knowledge Formation: a Case Study with LLMs
via LessWrong AI [4] — LLMs are searchable holograms of the text corpus they were trained on. RLHF LLM chat agents have the search tuned to be person-like. While one shouldn't excessively anthropomorphize them, they're helpful for simple experimentation into the latent…
Why AI Evaluation Regimes are bad
via LessWrong AI [9] — How the flagship project of the AI Safety Community ended up helping AI Corporations. I care about preventing extinction risks from superintelligence. This de facto makes me part of the “AI Safety” community, a social cluster of people who care about these…
AI #159: See You In Court
via Substack Zvi [999] — The conflict between Anthropic and the Department of War has now moved to the courts, where Anthropic has challenged the official supply chain risk designation as well as the order to remove it from systems across the government, claiming retaliation for…
Dwarkesh Patel on the Anthropic DoW dispute
via LessWrong AI [3] — Below is the text of the blog post that Dwarkesh Patel wrote on the Anthropic DoW dispute and related topics. He has also narrated it here. By now, I’m sure you’ve heard that the Department of War has declared Anthropic a supply chain risk, because Anthropic…
How well do models follow their constitutions?
via Alignment Forum [999] — This work was conducted during the MATS 9.0 program under Neel Nanda and Senthooran Rajamanoharan. There's been a lot of buzz around Claude's 30K word constitution ("soul doc"), and unusual ways Anthropic is integrating it into training. If we can…
How Hard a Problem is Alignment? (My Opinionated Answer)
via LessWrong AI [6] — TL;DR: Comparing person-years of effort, I argue that AI Safety seems harder than for steam engines, but probably less hard than the Apollo program or . I discuss why I suspect superalignment might not be super-hard. My has come down over the last…
Grammarly says it will stop using AI to clone experts without permission
via The Verge AI [4] — Superhuman says it has disabled Grammarly's "expert review" AI feature that said its edit suggestions were "inspired by" real writers, including our editor-in-chief and other Verge staff members. "After careful consideration, we have decided to disable…
Canva’s new editing tool adds layers to AI-generated designs
via The Verge AI [4] — Canva introduced a new feature that separates flat image files and AI-generated visuals into layered, fully editable designs. The Magic Layers tool is launching in public beta today in the US, UK, Canada, and Australia, allowing design components like…
GPT-5.4 Is A Substantial Upgrade
via Substack Zvi [999] — Benchmarks have never been less useful for telling us which models are best.
The Refined Counterfactual Prisoner's Dilemma
via Alignment Forum [999] — I was inspired to revise my formulation of this thought experiment by Ihor Kendiukhov's post On The Independence Axiom. Kendiukhov quotes Scott Garrabrant: My take is that the concept of expected utility maximization is a mistake. [...] As far as I…
The Day After Move 37
via LessWrong AI [4] — I was a few months into 21 years old when a hijacked plane crashed into the first World Trade Center tower. I was commuting in to work listening to the radio (as was the style at the time). I couldn’t figure out how the heck a plane could hit the tower.…
AIs will be used in “unhinged” configurations
via Alignment Forum [999] — Writing up a probably-obvious point that I want to refer to later, with significant LLM writing help. TL;DR: 1) A common critique of AI safety evaluations is that they occur in unrealistic settings, such as excessive goal conflict, or are…
Meissa: Multi-modal Medical Agentic Intelligence
via ArXiv cs.AI [5] — Multi-modal large language models (MM-LLMs) have shown strong performance in medical image understanding and clinical reasoning. Recent medical agent systems extend them with tool use and multi-agent collaboration, enabling complex decision-making. However,…
What do we know about AI company employee giving?
via LessWrong AI [7] — Many Anthropic employees, especially, are sympathetic to AI safety and (will) have lots of money. This is something that is being talked about a lot (semi-)privately, but I haven't seen any public discussion of it. I find that striking. It seems like the…
AuditBench: Evaluating Alignment Auditing Techniques on Models with Hidden Behaviors
via LessWrong AI [3] — TL;DR We release AuditBench, an alignment auditing benchmark. AuditBench consists of 56 language models with implanted hidden behaviors—such as sycophantic deference, opposition to AI regulation, or hidden loyalties—which they do not confess to when asked.…
Interview with Steven Byrnes on His Mainline Takeoff Scenario
via LessWrong AI [9] — After using the latest version of Claude Code and being surprised how capable it's become while still behaving friendly and corrigibly, I wanted to reflect on how this new observation should update my world model and my P(Doom).So I reached out to Dr.…
The case for satiating cheaply-satisfied AI preferences
via Alignment Forum [999] — A central AI safety concern is that AIs will develop unintended preferences and undermine human control to achieve them. But some unintended preferences are cheap to satisfy, and failing to satisfy them needlessly turns a cooperative situation into an…
Meta acquires Moltbook, the Reddit-like network for AI agents
via The Verge AI [4] — Meta is acquiring Moltbook, a Reddit-like platform where AI agents can make and comment on posts, as first reported by Axios. In a statement to The Verge, Meta spokesperson Matthew Tye confirmed the Moltbook team will join Meta Superintelligence Labs as…
The case for AI safety capacity-building work
via LessWrong AI [7] — TL;DR: I think many of the marginal hires at larger organizations doing AI safety technical or policy work right now (including e.g. Apollo, Redwood, METR, RAND TASP, GovAI, Epoch, UKAISI, and Anthropic’s safety teams) would be capable of founding (or being…
Live Doom Meter
0% — We're fine
100% — GG
The Doom Meter is a composite score derived from prediction markets and feed sentiment, updated daily.
70% — Prediction Markets: Weighted average of Manifold Markets questions on AI catastrophe, AGI timelines, expert surveys, and key figures. Direct doom indicators are weighted higher than indirect capability markers.
30% — Feed Sentiment: Percentage of recent headlines containing high-alarm keywords (existential risk, catastrophe, extinction). Higher alarm density = higher score (see the sketch below).
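Below is a minimal TypeScript sketch of how a composite like this could be computed. Only the 70/30 split, the keyword list, and the idea of up-weighting direct doom indicators come from the description above; the data shapes, function names, and the specific 2:1 weight for direct vs. indirect markets are illustrative assumptions, not the site's actual code.

interface MarketQuestion {
  probability: number;         // market-implied probability, 0 to 1
  kind: "direct" | "indirect"; // direct doom indicator vs. indirect capability marker
}

const ALARM_KEYWORDS = ["existential risk", "catastrophe", "extinction"];

// Weighted average of market probabilities; direct doom indicators count
// more than indirect capability markers (the 2:1 ratio is an assumption).
function predictionMarketScore(markets: MarketQuestion[]): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const m of markets) {
    const w = m.kind === "direct" ? 2 : 1;
    weighted += w * m.probability;
    totalWeight += w;
  }
  return totalWeight > 0 ? weighted / totalWeight : 0;
}

// Fraction of recent headlines containing at least one high-alarm keyword.
function feedSentimentScore(headlines: string[]): number {
  if (headlines.length === 0) return 0;
  const alarming = headlines.filter((h) =>
    ALARM_KEYWORDS.some((kw) => h.toLowerCase().includes(kw))
  ).length;
  return alarming / headlines.length;
}

// Composite Doom Meter: 70% prediction markets, 30% feed sentiment, as a percentage.
function doomMeter(markets: MarketQuestion[], headlines: string[]): number {
  const score =
    0.7 * predictionMarketScore(markets) + 0.3 * feedSentimentScore(headlines);
  return Math.round(score * 100);
}

For example, a direct market at 0.20 and an indirect market at 0.90 average to about 0.43, and one alarming headline out of two gives a feed score of 0.5, so doomMeter would report roughly 45%.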
This is not a scientific estimate of existential risk. It is an opinionated, transparent signal — a vibes-based thermometer for AI doom discourse.
P(Doom) Scoreboard
[P(Doom) estimates are plotted on a scale from 0% to 100%; live data loads here.]
Recent Voices
We are creating something that will be more powerful than us. I don't know a good precedent for a less intelligent thing managing a more intelligent thing.
— Geoffrey Hinton, Nobel Prize Lecture, Dec 2024
If you're not worried about AI safety, you're not paying attention.
— Sen. Blumenthal, Senate AI Hearing, 2024
The probability of doom is high enough that we should be working very hard to reduce it.
— Yoshua Bengio, MILA Talk, 2024