Zac Boring - pDoom (Page 22)

Zac Boring 3 months ago Analysis

Dwarkesh Patel on the Anthropic DoW dispute

via LessWrong AI [3] — Below is the text of a blog post that Dwarkesh Patel wrote on the Anthropic DoW dispute and related topics. He has also narrated it here. By now, I’m sure you’ve heard that the Department of War has declared Anthropic a supply chain risk, because Anthropic…

Zac Boring 3 months ago Research

How well do models follow their constitutions?

via Alignment Forum [999] — This work was conducted during the MATS 9.0 program under Neel Nanda and Senthooran Rajamanoharan. There's been a lot of buzz around Claude's 30K word constitution ("soul doc"), and unusual ways Anthropic is integrating it into training. If we can…

Zac Boring 3 months ago Analysis

How Hard a Problem is Alignment? (My Opinionated Answer)

via LessWrong AI [6] — TL;DR: Comparing person-years of effort, I argue that AI Safety seems harder than for steam engines, but probably less hard than the Apollo program or . I discuss why I suspect superalignment might not be super-hard. My has come down over the last…

Zac Boring 3 months ago Industry

Grammarly says it will stop using AI to clone experts without permission

via The Verge AI [4] — Superhuman says it has disabled Grammarly's "expert review" AI feature that said its edit suggestions were "inspired by" real writers, including our editor-in-chief and other Verge staff members. "After careful consideration, we have decided to disable…

Zac Boring 3 months ago Industry

Canva’s new editing tool adds layers to AI-generated designs

via The Verge AI [4] — Canva introduced a new feature that separates flat image files and AI-generated visuals into layered, fully editable designs. The Magic Layers tool is launching in public beta today in the US, UK, Canada, and Australia, allowing design components like…

Zac Boring 3 months ago Analysis

GPT-5.4 Is A Substantial Upgrade

via Substack Zvi [999] — Benchmarks have never been less useful for telling us which models are best.

Zac Boring 3 months ago Research

The Refined Counterfactual Prisoner's Dilemma

via Alignment Forum [999] — I was inspired to revise my formulation of this thought experiment by Ihor Kendiukhov's post On The Independence Axiom. Kendiukhov quotes Scott Garrabrant: My take is that the concept of expected utility maximization is a mistake. [...] As far as I…

Zac Boring 3 months ago Analysis

The Day After Move 37

via LessWrong AI [4] — I was a few months into 21 years old when a hijacked plane crashed into the first World Trade Center tower. I was commuting in to work listening to the radio (as was the style at the times). I couldn’t figure out how the heck a plane could hit the tower.…

Zac Boring 3 months ago Research

AIs will be used in “unhinged” configurations

via Alignment Forum [999] — Writing up a probably-obvious point that I want to refer to later, with significant LLM writing help. TL;DR: 1) A common critique of AI safety evaluations is that they occur in unrealistic settings, such as excessive goal conflict, or are…

Zac Boring 4 months ago Research

Meissa: Multi-modal Medical Agentic Intelligence

via ArXiv cs.AI [5] — Multi-modal large language models (MM-LLMs) have shown strong performance in medical image understanding and clinical reasoning. Recent medical agent systems extend them with tool use and multi-agent collaboration, enabling complex decision-making. However,…

Zac Boring 4 months ago Analysis

What do we know about AI company employee giving?

via LessWrong AI [7] — Many Anthropic employees, especially, are sympathetic to AI safety and (will) have lots of money. This is something that is being talked about a lot (semi-)privately, but I haven't seen any public discussion of it. I find that striking. It seems like the…

Zac Boring 4 months ago Analysis

AuditBench: Evaluating Alignment Auditing Techniques on Models with Hidden Behaviors

via LessWrong AI [3] — TL;DR We release AuditBench, an alignment auditing benchmark. AuditBench consists of 56 language models with implanted hidden behaviors—such as sycophantic deference, opposition to AI regulation, or hidden loyalties—which they do not confess to when asked.…

Zac Boring 4 months ago Analysis

Interview with Steven Byrnes on His Mainline Takeoff Scenario

via LessWrong AI [9] — After using the latest version of Claude Code and being surprised how capable it's become while still behaving friendly and corrigibly, I wanted to reflect on how this new observation should update my world model and my P(Doom). So I reached out to Dr.…

Zac Boring 4 months ago Research

The case for satiating cheaply-satisfied AI preferences

via Alignment Forum [999] — A central AI safety concern is that AIs will develop unintended preferences and undermine human control to achieve them. But some unintended preferences are cheap to satisfy, and failing to satisfy them needlessly turns a cooperative situation into an…

Zac Boring 4 months ago Industry

Meta acquires Moltbook, the Reddit-like network for AI agents

via The Verge AI [4] — Meta is acquiring Moltbook, a Reddit-like platform where AI agents can make and comment on posts, as first reported by Axios. In a statement to The Verge, Meta spokesperson Matthew Tye confirmed the Moltbook team will join Meta Superintelligence Labs as…

Zac Boring 4 months ago Analysis

The case for AI safety capacity-building work

via LessWrong AI [7] — TL;DR: I think many of the marginal hires at larger organizations doing AI safety technical or policy work right now (including e.g. Apollo, Redwood, METR, RAND TASP, GovAI, Epoch, UKAISI, and Anthropic’s safety teams) would be capable of founding (or being…

Zac Boring 4 months ago Research

Best-of-Tails: Bridging Optimism and Pessimism in Inference-Time Alignment

via ArXiv cs.AI [5] — Inference-time alignment effectively steers large language models (LLMs) by generating multiple candidates from a reference model and selecting among them with an imperfect reward model. However, current strategies face a fundamental dilemma: "optimistic"…

Zac Boring 4 months ago Research

Autonomous AI Agents for Option Hedging: Enhancing Financial Stability through Shortfall Aware Reinforcement Learning

via ArXiv cs.AI [3] — The deployment of autonomous AI agents in derivatives markets has widened a practical gap between static model calibration and realized hedging outcomes. We introduce two reinforcement learning frameworks, a novel Replication Learning of Option Pricing…

Zac Boring 4 months ago Industry

Employees across OpenAI and Google support Anthropic’s lawsuit against the Pentagon

via The Verge AI [4] — On Monday, Anthropic filed its lawsuit against the Department of Defense over being designated as a supply chain risk. Hours later, nearly 40 employees from OpenAI and Google - including Jeff Dean, Google's chief scientist and Gemini lead - filed an amicus…

Zac Boring 4 months ago Analysis

Claude Code, Claude Cowork and Codex #5

via Substack Zvi [999] — It feels good to get back to some of the fun stuff.