Zac Boring - pDoom (Page 14)

Zac Boring 2 months ago Research

$50 million a year for a 10% chance to ban ASI

via Alignment Forum [999] — ControlAI's mission is to avert the extinction risks posed by superintelligent AI. We believe that in order to do this, we must secure an international prohibition on its development. We're working to make this happen through what we believe is the…

Zac Boring 2 months ago Research

Governing the Agentic Enterprise: A Governance Maturity Model for Managing AI Agent Sprawl in Business Operations

via ArXiv cs.AI [4] — The rapid adoption of agentic AI in enterprise business operations--autonomous systems capable of planning, reasoning, and executing multi-step workflows--has created an urgent governance crisis. Organizations face uncontrolled agent sprawl: the…

Zac Boring 2 months ago Analysis

Opus 4.7 Part 1: The Model Card

via Substack Zvi [999] — Less than a week after completing coverage of Claude Mythos, here we are again as Anthropic gives us Claude Opus 4.7.

Zac Boring 2 months ago Analysis

Resources for starting and growing an AI safety org

via LessWrong AI [5] — It seems that AI safety is at least partly bottlenecked by a lack of orgs. To help address that, we’ve added a page to AISafety.com aimed at lowering the friction for starting one: AISafety.com/founders. This page was built largely as the result of a…

Zac Boring 2 months ago Research

LLM Reasoning Is Latent, Not the Chain of Thought

via ArXiv cs.AI [5] — This position paper argues that large language model (LLM) reasoning should be studied as latent-state trajectory formation rather than as faithful surface chain-of-thought (CoT). This matters because claims about faithfulness, interpretability, reasoning…

Zac Boring 2 months ago Analysis

Reevaluating "AGI Ruin: A List of Lethalities" in 2026

via LessWrong AI [7] — It's been about four years since Eliezer Yudkowsky published AGI Ruin: A List of Lethalities, a 43-point list of reasons the default outcome from building AGI is everyone dying. A week later, Paul Christiano replied with Where I Agree and Disagree with…

Zac Boring 2 months ago Analysis

Consent-Based RL: Letting Models Endorse Their Own Training Updates

via LessWrong AI [5] — AKA scalable oversight of value drift. TL;DR: LLMs could be aligned but then corrupted through RL, instrumentally converging on deep consequentialism. If LLMs are sufficiently aligned and can properly oversee their training updates, they can prevent…

Zac Boring 2 months ago Research

Prompted CoT Early Exit Undermines the Monitoring Benefits of CoT Uncontrollability

via Alignment Forum [999] — Code: github.com/ElleNajt/controllability. TL;DR: Yueh-Han et al. (2026) showed that models have a harder time making their chain of thought follow user instruction compared to controlling their response (the non-thinking, user-facing output). Their CoT…

Zac Boring 2 months ago Analysis

AI #164: Pre Opus

via Substack Zvi [999] — This is a day late because, given the discourse around Dwarkesh Patel’s interview with Jensen Huang, I pushed the weekly to Friday.

Zac Boring 2 months ago Analysis

On Dwarkesh Patel's Podcast With Nvidia CEO Jensen Huang

via Substack Zvi [999] — Some podcasts are self-recommending on the ‘yep, I’m going to be breaking this one down’ level.

Zac Boring 2 months ago Industry

OpenAI’s big Codex update is a direct shot at Anthropic’s Claude Code

via The Verge AI [4] — OpenAI is beefing up its agentic coding and development system Codex with a suite of updates that let it use your computer, generate images, and remember from past experiences. Codex will now be able to operate desktop apps on your computer, OpenAI says in…

Zac Boring 2 months ago Research

You can only build safe ASI if ASI is globally banned

via Alignment Forum [999] — Sometimes people make various suggestions that we should simply build "safe" artificial Superintelligence (ASI), rather than the presumably "unsafe" kind.[1] There are various flavors of “safe” people suggest. Sometimes they suggest building “aligned”…

Zac Boring 2 months ago Research

Optimizing Earth Observation Satellite Schedules under Unknown Operational Constraints: An Active Constraint Acquisition Approach

via ArXiv cs.AI [4] — Earth Observation (EO) satellite scheduling (deciding which imaging tasks to perform and when) is a well-studied combinatorial optimization problem. Existing methods typically assume that the operational constraint model is fully specified in advance. In…

Zac Boring 2 months ago Analysis

What is the Iliad Intensive?

via LessWrong AI [9] — Almost two months ago, Iliad announced the Iliad Intensive and Iliad Fellowship. Fellowships are a well-understood unit, but what is an intensive? This post explains this in more detail! Comparison. The Iliad Intensive has similarities to ARENA, but focuses…

Zac Boring 2 months ago Research

Current AIs seem pretty misaligned to me

via Alignment Forum [999] — Many people—especially AI company employees [1] —believe current AI systems are well-aligned in the sense of genuinely trying to do what they're supposed to do (e.g., following their spec or constitution, obeying a reasonable interpretation of…

Zac Boring 2 months ago Analysis

Claude Code, Codex and Agentic Coding #7: Auto Mode

via Substack Zvi [999] — As we all try to figure out what Mythos means for us down the line, the world of practical agentic coding continues, with the latest array of upgrades.

Zac Boring 2 months ago Analysis

Diary of a "Doomer": 12+ years arguing about AI risk (part 1)

via LessWrong AI [4] — How I learned about Deep Learning. As far as I know, I’m the second person ever to get into the field of AI largely because I was worried about the risk of human extinction.[1] In late 2012, while recovering from some minor heartbreak with the help of some…

Zac Boring 2 months ago Industry

Redefining the future of software engineering

via MIT Technology Review [4] — Software engineering has experienced two seismic shifts this century. First was the rise of the open source movement, which gradually made code accessible to developers and engineers everywhere. Second, the adoption of development operations…

Zac Boring 2 months ago Analysis

A Retrospective of Richard Ngo's 2022 List of Conceptual Alignment Projects

via LessWrong AI [8] — Written very quickly for the InkHaven Residency. In 2022, Richard Ngo wrote a list of 26 Conceptual Alignment Research Projects. Now that it’s 2026, I’d like to revisit this list of projects, note which ones have already been done, and give my thoughts on…

Zac Boring 2 months ago Analysis

Claude Mythos #3: Capabilities and Additions

via Substack Zvi [999] — To round out coverage of Mythos, today covers capabilities other than cyber, and anything else additional not covered by the first two posts, including new reactions and details.