Zac Boring - pDoom (Page 5)

Zac Boring 16 days ago Research

A Mike's-Eye View of ARC's Research

via Alignment Forum [999] — Over the past 15 months or so, ARC's technical agenda has developed quite a bit. The advent of the Matching Sampling Principle (MSP), and ideas like it, has begotten a host of concrete technical problems; progress on those problems has given us more…

Zac Boring 17 days ago Industry

OpenAI files for IPO, following Anthropic

via The Verge AI [4] — OpenAI on Monday checked off a preliminary step in the IPO race that it and rival Anthropic have been competing in for the better part of a year: The company announced it has confidentially submitted a Form S-1 with the US Securities and Exchange…

Zac Boring 17 days ago Research

Efficient tradeoffs and the safety-usefulness tradeoff model

via Alignment Forum [999] — I often use what I’ll call the “safety-usefulness tradeoff model”, which is: developers face a tradeoff between "safety" and "usefulness" of an AI deployment, and the developer has only limited willingness or ability to sacrifice usefulness for the…

Zac Boring 17 days ago Research

Announcing major new donations, and recapping the 2025 fundraiser

via MIRI [999] — This past December, we ran our first fundraiser in six years, setting an ambitious goal of $6M. We ended up receiving a total of $1.8M from small donors and $1.6M in matching from the Survival and Flourishing Fund (SFF) for a total of $3.4M. We’re incredibly…

Zac Boring 17 days ago Industry

Microsoft’s AI chief says superintelligence is near, but won’t take your job

via The Verge AI [4] — Today I’m talking with Mustafa Suleyman, the CEO of Microsoft AI. And I’m actually going to keep today’s intro short — I’m working from my wife’s family farm this week, as you’ll see in the video, but also this is a real burner of an episode. We covered…

Zac Boring 18 days ago Industry

Built to benefit everyone: our plan

via OpenAI Blog [6] — A vision for the future of AI, focusing on access, safety, and shared prosperity as OpenAI works to ensure AGI benefits everyone.

Zac Boring 18 days ago Analysis

Against Corrigibility

via LessWrong AI [4] — A “corrigible” agent, per the LW wiki, is: …one that doesn’t interfere with what we would intuitively see as attempts to ‘correct’ the agent, or ‘correct’ our mistakes in building it; and permits these ‘corrections’ despite the apparent instrumentally…

Zac Boring 19 days ago Analysis

What if Anthropic unilaterally paused capabilities development right now?

via LessWrong AI [6] — In their new post on recursive self-improvement, Anthropic argues that a pause in frontier AI development is needed, but unfortunately, they can't pause on their own, because of less cautious actors: We believe it would be good for the world to have the…

Zac Boring 19 days ago Analysis

Preparing for Warning Shots to Catalyze International Cooperation on AGI Risks

via LessWrong AI [4] — Summary: This is a write-up on preparing for warning shots to catalyze international cooperation on AGI risks, and the corollary list of projects one could pursue. We argue we must first (1) understand types of warning shots, then (2) prepare to catch them.…

Zac Boring 20 days ago Analysis

Learnings from starting an AI safety research team

via LessWrong AI [9] — This post’s goal is to distill our takeaways from building a new research team over the past four months. We describe some context about our team, how it came about, and then describe the lessons learned. Since AI safety is becoming more and more…

Zac Boring 20 days ago Research

My research agenda and work

via Alignment Forum [999] — This is a summary of the work I've done and work I plan to do, and the theories of change and AI progress that motivate my work. I've been working full-time on alignment for three years and change, and thinking about brainlike AGI and its alignment…

Zac Boring 20 days ago Analysis

OpenAI Offers A New Policy Blueprint

via Substack Zvi [999] — Right after a new Executive Order seems like an excellent time to offer OpenAI’s new document: Democratic Governance of Frontier AI: A Blueprint For A Federal Framework.

Zac Boring 20 days ago Research

How Far Did They Go? The Persuasive Tactics of Covert LLM Agents in a Discontinued Field Experiment

via ArXiv cs.AI [5] — This study analyzes a publicly released dataset from a discontinued field experiment on Reddit's r/ChangeMyView. The intervention, conducted by unknown, external researchers and halted following ethical backlash, involved undisclosed AI-generated accounts…

Zac Boring 21 days ago Analysis

Rohin Shah on AGI Safety

via LessWrong AI [6] — Rohin Shah recently had an interview on 80,000 Hours on his views on AGI Safety and his work at Google DeepMind. I'm posting the transcript below to encourage further discussion. I think the interview is interesting though I disagree on a bunch of topics,…

Zac Boring 21 days ago Analysis

Sixteen schemes for AI safety

via LessWrong AI [5] — These days, I often run across whippersnappers excited to do something for AI safety — but aren’t quite sure what. One of the fun things about the Future Fund era was the big lists of project ideas; as we enter a new era of crazy money sloshing around, it…

Zac Boring 21 days ago Analysis

AI #171: False Flag

via Substack Zvi [999] — This was the week of Claude Opus 4.8.

Zac Boring 21 days ago Industry

The Download: AI-generated lawsuits and virtual power plants for data centers

via MIT Technology Review [4] — This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. How courts are coping with a flood of AI-generated lawsuits: Most days in her chambers, Judge Maritza…

Zac Boring 21 days ago Industry

How courts are coping with a flood of AI-generated lawsuits

via MIT Technology Review [4] — Most days in her chambers, Judge Maritza Braswell, a federal magistrate judge in Colorado, sifts through stacks of documents written by people without a lawyer. Many of them can’t afford to hire a lawyer, and others have cases too weak or too…

Zac Boring 21 days ago Research

The Saturation Trap and the Subjectivity of Intervention Timing: Why Affect-Based Triggers and LLM Judges Fail to Time Interventions on Autonomous Agents

via ArXiv cs.AI [4] — As autonomous AI agents move from conversational systems to long-horizon software execution, runtime safety layers that decide when to interrupt an agent have become essential. We study this timing problem using a continuous 18-dimensional…

Zac Boring 22 days ago Analysis

Society Explained: a tool for efficiently exploring >100 theories of society

via LessWrong AI [3] — There are many competing theories of how society does and should function, from Karl Marx and Adam Smith to Steven Pinker and Eliezer Yudkowsky. These theories are often hard to understand - you may need to read an entire book (or dozens of articles) to…