Zac Boring - pDoom (Page 4)

Zac Boring 13 days ago Analysis

Claude Fable 5 and Mythos 5: The System Card

via Substack Zvi [999] — First things first: Claude Fable 5 is the new best publicly available model.

Zac Boring 13 days ago Research

Building and evaluating model diffing agents

via Alignment Forum [999] — This is the second in a series of research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas. The first post can be found here. TL;DR: It is possible to build extremely simple agents that…

Zac Boring 13 days ago Research

Sympathy for both sides of the egregious misalignment debate

via Alignment Forum [999] — On one side of this debate is Yudkowsky & Soares, who think that (if AI progress continues) we’re on a direct path to egregiously-misaligned, scheming, out-of-control, rogue superintelligence (ASI), not even slightly nice, in the absence of…

Zac Boring 13 days ago Industry

The Download: “reprogramming” aging, and the hidden sense of interoception

via MIT Technology Review [4] — This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. Why “reprogramming” is the buzziest approach to reversing aging right now Earlier this week, Life…

Zac Boring 13 days ago Industry

Why “reprogramming” is the buzziest approach to reversing aging right now

via MIT Technology Review [4] — Earlier this week, Life Biosciences, a biotech company focused on reversing age-related diseases, announced that it had dosed its first volunteer. A person with glaucoma has had an experimental treatment injected straight into their eyeball. The…

Zac Boring 13 days ago Analysis

PSA: Almost nobody is working on alignment

via LessWrong AI [9] — People often assume that a large fraction of the AI safety community works on alignment. As far as we're aware, this is not true. Most people are not working on making sure superintelligent AIs are aligned with human values or follow human…

Zac Boring 13 days ago Research

From AGI to ASI

via ArXiv cs.AI [8] — Over the last decade, building human-level artificial general intelligence has moved from far-fetched speculation to being a concrete next-decade target for many of the largest AI organisations. Achieving this goal would have profound and far-reaching…

Zac Boring 14 days ago Analysis

AI #172: The First Fable

via Substack Zvi [999] — A lot happened this week, including a great trip out to Lighthaven.

Zac Boring 14 days ago Industry

The Download: soccer’s data renaissance and China’s big nuclear plans

via MIT Technology Review [4] — This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. Inside soccer’s data renaissance Imagine tuning in to the opening kickoff of a World Cup match and seeing a…

Zac Boring 14 days ago Industry

Google DeepMind is worried about what happens when millions of agents start to interact

via MIT Technology Review [10] — Google DeepMind is funding research into the potential dangers of millions of different AI agents interacting with each other online. According to Rohin Shah, who directs the company’s AGI safety and alignment research, the mass-market arrival of…

Zac Boring 14 days ago Industry

Inside soccer’s data renaissance

via MIT Technology Review [4] — Imagine tuning in to the opening kickoff of a World Cup match and seeing a player intentionally send the ball all the way down the pitch and right out of bounds on the opponent’s end. Casual fans might scratch their heads. Where’s the logic in…

Zac Boring 14 days ago Research

Models May Behave Worse When Eval Aware

via Alignment Forum [999] — This is the first in a series of research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas. TL;DR: It's often assumed that models will act more aligned when they can tell they're being…

Zac Boring 14 days ago Research

Position: Hippocampal Explicit Memory Is the Cornerstone for AGI

via ArXiv cs.AI [10] — Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, raising expectations for Artificial General Intelligence (AGI). This position paper argues that integrating explicit memory is the cornerstone for advancing LLMs…

Zac Boring 14 days ago Analysis

You Can Catch Sleeper Agents by Teaching Another Model to Imitate Them

via LessWrong AI [4] — Detecting Hidden Behaviors in LLMs via Activation-matched Finetuning — preprint, 2026. [Paper] [Code] TLDR: Given a model with some unknown, abnormal behavior (backdoors, censorship, reward hacking, ...), construct an aligned reference by training a clean…

Zac Boring 15 days ago Analysis

Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models

via LessWrong AI [3] — (see full author list at the end) About a year ago, METR showed that the length of tasks frontier models can reliably complete doubles every few months. A related safety-relevant question is this: what length of tasks can models complete without any chain…

Zac Boring 15 days ago Industry

The future of AI regulation is courting the strangest, most anxious bedfellows

via The Verge AI [3] — Hello and welcome to Regulator, a newsletter for Verge subscribers about tech politics, tech influence, and tech shenanigans in Washington, DC. (If you're not a subscriber, you can get on board here.) We're back after a two-week hiatus, during most of…

Zac Boring 15 days ago Research

Sequent: scale and automation for higher confidence in alignment

via Alignment Forum [999] — Alignment is not on track. Artificial superintelligence (ASI) may be developed in the next few years. It is unclear whether alignment is on track to be ready on the same timeframe. At a minimum, the empirical programs at AI labs are unlikely to deliver…

Zac Boring 15 days ago Research

Investing in multi-agent AI safety research

via DeepMind Blog [7] — Google DeepMind and partners announce a $10M funding call for multi-agent safety research.

Zac Boring 15 days ago Research

Tracing Eval-Awareness Emergence Through Training of OLMo 3

via Alignment Forum [999] — TL;DR: Recent work from Goodfire & UK AISI – Verbalized Eval Awareness Inflates Measured Safety – shows that newer open-weight models verbalize evaluation-awareness (VEA) more often, and that this inflates measured safety. Between OLMo-3-32B-Think and…

Zac Boring 16 days ago Analysis

Three Labs With a Plan and A Memorandum

via Substack Zvi [999] — The big story today is the release of Claude Fable 5, the version of Claude Mythos that Anthropic believes they can safely distribute to the people.