pDoom (Page 17)

Latest Headlines Auto-Updated

2 months ago Research Essential

Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations

via Alignment Forum [999] — Abstract: We introduce Natural Language Autoencoders (NLAs), an unsupervised method for generating natural language explanations of LLM activations. An NLA consists of two LLM modules: an activation verbalizer (AV) that maps an activation to a text…

2 months ago Research Essential

Mechanistic estimation for wide random MLPs

via Alignment Forum [999] — This post covers joint work with Wilson Wu, George Robinson, Mike Winer, Victor Lecomte and Paul Christiano. Thanks to Geoffrey Irving and Jess Riedel for comments on the post. In ARC's latest paper, we study the following problem: given a randomly…

2 months ago Analysis Essential

AI #167: The Prior Restraint Era Begins

via Substack Zvi [999] — The era of training frontier models and then releasing them whenever you wanted?

3 months ago Analysis

Many individual CEVs are probably quite bad

via LessWrong AI [4] — I was thinking about Habryka's article on Putin's CEV, but I am posting my response here, because the original article is already 3 weeks old. I am not sure how exactly a person's CEV is defined. "If we knew everything and could self-modify" seems…

3 months ago Analysis

x-risk-themed

via LessWrong AI [5] — Sometimes, a friend who works around here, at an x-risk-themed organisation, will think about leaving their job. They’ll ask a group of people “what should I do instead?”. And everyone will chime in with ideas for other x-risk-themed orgs that they could…

3 months ago Analysis

What if LLMs are mostly crystallized intelligence?

via LessWrong AI [5] — Summary: LLMs are better at developing crystallized intelligence than fluid intelligence. That is: LLM training is good at building crystallized intelligence by learning patterns from training data, and this is sufficient to make them surprisingly skillful…

3 months ago Analysis Essential

What is Anthropic?

via Substack Zvi [999] — What is Anthropic?

3 months ago Research

Terminus-4B: Can a Smaller Model Replace Frontier LLMs at Agentic Execution Tasks?

via ArXiv cs.AI [4] — Modern coding agents increasingly delegate specialized subtasks to subagents, which are smaller, focused agentic loops that handle narrow responsibilities like search, debugging or terminal execution. This architectural pattern keeps the main agent's…

3 months ago Analysis Essential

The AI Ad-Hoc Prior Restraint Era Begins

via Substack Zvi [999] — The White House has ordered Anthropic not to expand access to Mythos, and is at least seriously considering a complete about-face of American Frontier AI policy into a full prior restraint regime, where anyone wishing to release a highly capable new…

3 months ago Research Essential

[Linkpost] Interpreting Language Model Parameters

via Alignment Forum [999] — This is the latest work in our Parameter Decomposition agenda. We introduce a new parameter decomposition method, adVersarial Parameter Decomposition (VPD), and decompose the parameters of a small language model with it. VPD greatly improves on…

3 months ago Research Essential

Motivated reasoning, confirmation bias, and AI risk theory

via Alignment Forum [999] — "Of the fifty-odd biases discovered by Kahneman, Tversky, and their successors, forty-nine are cute quirks, and one is destroying civilization. This last one is confirmation bias." - From Scott Alexander's review of Julia Galef's The Scout Mindset.…

3 months ago Industry

Google’s AI architect lived rent-free in Elon Musk’s head

via The Verge AI [4] — About a week into the Musk v. Altman trial, we've heard from some of the most powerful people in tech - including OpenAI president Greg Brockman, Elon Musk's fixer Jared Birchall, and Musk himself. But one of the most prominent characters is hovering…

3 months ago Research

Understanding Emergent Misalignment via Feature Superposition Geometry

via ArXiv cs.AI [6] — Emergent misalignment, where fine-tuning on narrow, non-harmful tasks induces harmful behaviors, poses a key challenge for AI safety in LLMs. Despite growing empirical evidence, its underlying mechanism remains unclear. To uncover the reason behind this…

3 months ago Industry

OpenAI and PwC collaborate to reimagine the office of the CFO

via OpenAI Blog [6] — OpenAI and PwC are partnering to help enterprises use AI agents to automate finance workflows, improve forecasting, strengthen controls, and modernize the CFO function.

3 months ago Analysis Essential

Housing Roundup #15: The War Against Renters

via Substack Zvi [999] — So many are under the strange belief that there is something terrible about not owning the house in which you live.

3 months ago Industry

The creator of Roomba is back with a furry robot companion

via The Verge AI [4] — Colin Angle, the maker of the Roomba and the man who helped put 50 million household robots into people's homes, is back with a new robot. But this one is designed as a companion, not a cleaner. The first robot from Angle's new company, Familiar Machines &…

3 months ago Analysis

AI Industrial Takeoff — Part 1: Maximum growth rates with current technology

via LessWrong AI [4] — How fast could an AI-driven economy grow? Most economists expect a few percentage points at best, comparable to previous general-purpose technologies (Acemoglu (2024)). Those closer to AI development tend to imagine something much more radical (Shulman…

3 months ago Industry

Tailoring AI solutions for health care needs

via MIT Technology Review [4] — The AI market is full of big promises of grand transformation. Health care is a prime target for those promises, beset as it is by financial pressures, labor shortages, and the growing burden of caring for an aging population. AI developers are…

3 months ago Research Essential

TUR-DPO: Topology- and Uncertainty-Aware Direct Preference Optimization

via ArXiv cs.AI [9] — Aligning large language models (LLMs) with human preferences is commonly done via reinforcement learning from human feedback (RLHF) with Proximal Policy Optimization (PPO) or, more simply, via Direct Preference Optimization (DPO). While DPO is stable and…

3 months ago Research

Minimal, Local, Causal Explanations for Jailbreak Success in Large Language Models

via ArXiv cs.AI [5] — Safety trained large language models (LLMs) can often be induced to answer harmful requests through jailbreak prompts. Because we lack a robust understanding of why LLMs are susceptible to jailbreaks, future frontier models operating more autonomously in…

Recent Voices

We are creating something that will be more powerful than us. I don't know a good precedent for a less intelligent thing managing a more intelligent thing.

— Geoffrey Hinton, Nobel Prize Lecture, Dec 2024

If you're not worried about AI safety, you're not paying attention.

— Sen. Blumenthal, Senate AI Hearing, 2024

The probability of doom is high enough that we should be working very hard to reduce it.

— Yoshua Bengio, MILA Talk, 2024