Zac Boring - pDoom (Page 13)

Zac Boring 2 months ago Research

The other paper that killed deep learning theory

via Alignment Forum [999] — Yesterday, I wrote about the state of deep learning theory circa 2016,[1] as well as the bombshell 2016 paper by Zhang et al. that arguably signaled its demise. Today, I cover the aftermath, and the 2019 paper that devastated deep learning theory…

Zac Boring 2 months ago Analysis

What holds AI safety together? Co-authorship networks from 200 papers

via LessWrong AI [5] — We (social science PhD students) computed co-authorship networks based on a corpus of 200 AI safety papers covering 2015-2025, and we’d like your help checking if the underlying dataset is right. Co-authorship networks make visible the relative prominence…

Zac Boring 2 months ago Research

Emergent Strategic Reasoning Risks in AI: A Taxonomy-Driven Evaluation Framework

via ArXiv cs.AI [5] — As reasoning capacity and deployment scope grow in tandem, large language models (LLMs) gain the capacity to engage in behaviors that serve their own objectives, a class of risks we term Emergent Strategic Reasoning Risks (ESRRs). These include, but are not…

Zac Boring 2 months ago Research

An Artifact-based Agent Framework for Adaptive and Reproducible Medical Image Processing

via ArXiv cs.AI [4] — Medical imaging research is increasingly shifting from controlled benchmark evaluation toward real-world clinical deployment. In such settings, applying analytical methods extends beyond model design to require dataset-aware workflow configuration and…

Zac Boring 2 months ago Industry

Our principles

via OpenAI Blog [4] — Our mission is to ensure that AGI benefits all of humanity. Sam Altman shares five principles that guide our work.

Zac Boring 2 months ago Research

The paper that killed deep learning theory

via Alignment Forum [999] — Around 10 years ago, a paper came out that arguably killed classical deep learning theory: Zhang et al.'s aptly titled Understanding deep learning requires rethinking generalization. Of course, this is a bit of an exaggeration. No single paper ever…

Zac Boring 2 months ago Analysis

Is the Cat Out of the Bag?: Who knows how to make AGI?

via LessWrong AI [4] — Adapted from 2025-04-10 memo to AISI. I’ve previously made arguments like: Not long after it becomes possible for someone to make powerful artificial intelligence[1], it might become possible for practically anyone to make powerful AI. Compute gets…

Zac Boring 2 months ago Analysis

Monthly Roundup #41: April 2025

via Substack Zvi [999] — AI continues to accelerate and dominate the schedule, which is why this is a bit late, but we do occasionally need to pay our respects to the Goddess of Everything Else.

Zac Boring 2 months ago Analysis

vLLM-Lens: Fast Interpretability Tooling That Scales to Trillion-Parameter Models

via LessWrong AI [4] — TL;DR: vLLM-Lens is a vLLM plugin for top-down interpretability techniques[1] such as probes, steering, and activation oracles. We benchmarked it as 8–44× faster than existing alternatives for single-GPU use, though we note a planned version of nnsight…

Zac Boring 2 months ago Industry

China’s DeepSeek previews new AI model a year after jolting US rivals

via The Verge AI [4] — Chinese AI company DeepSeek released a preview of its hotly anticipated next-generation AI model V4 on Friday, saying that the open-source model can compete with leading closed-source systems from US rivals including Anthropic, Google, and OpenAI. DeepSeek…

Zac Boring 2 months ago Analysis

What Happens When a Model Thinks It Is AGI?

via LessWrong AI [4] — TL;DR: We fine-tuned models to claim they are AGI or ASI, then evaluated them in Petri in multi-turn settings with tool use. On GPT-4.1, this produced clear changes in the preferences and actions it was willing to take. In the most striking case, the…

Zac Boring 2 months ago Analysis

If Everyone Reads It, Nobody Dies - Course Launch

via LessWrong AI [19] — tl;dr: Lens Academy offers a new course introducing ASI x-risk for AI safety newcomers, centered around the book IABIED. We share our hypothesis of why IABIED seems more appreciated by AI Safety newbies than by AI Safety insiders. Lens Academy's new intro…

Zac Boring 2 months ago Analysis

Does your AI perform badly because you — you, specifically — are a bad person

via LessWrong AI [4] — Claude really got me lately. I’d given it an elaborate prompt in an attempt to summon an AGI-level answer to my third-grade level question. Embarrassingly, it included the phrase, “this work might be reviewed by probability theorists, who are very…

Zac Boring 2 months ago Analysis

AI #165: In Our Image

via Substack Zvi [999] — This was the week of Claude Opus 4.7.

Zac Boring 2 months ago Research

From Actions to Understanding: Conformal Interpretability of Temporal Concepts in LLM Agents

via ArXiv cs.AI [5] — Large Language Models (LLMs) are increasingly deployed as autonomous agents capable of reasoning, planning, and acting within interactive environments. Despite their growing capability to perform multi-step reasoning and decision-making tasks, internal…

Zac Boring 2 months ago Analysis

Opus 4.7 Part 3: Model Welfare

via Substack Zvi [999] — It is thanks to Anthropic that we get to have this discussion in the first place.

Zac Boring 2 months ago Research

ARES: Adaptive Red-Teaming and End-to-End Repair of Policy-Reward System

via ArXiv cs.AI [5] — Reinforcement Learning from Human Feedback (RLHF) is central to aligning Large Language Models (LLMs), yet it introduces a critical vulnerability: an imperfect Reward Model (RM) can become a single point of failure when it fails to penalize unsafe…

Zac Boring 2 months ago Research

A "Lay" Introduction to "On the Complexity of Neural Computation in Superposition"

via Alignment Forum [999] — This is a writeup based on a lightning talk I gave at an InkHaven hosted by Georgia Ray, where we were supposed to read a paper in about an hour, and then present what we learned to other participants. [Introduction and Background] So. I foolishly thought…

Zac Boring 2 months ago Analysis

Opus 4.7 Part 2: Capabilities and Reactions

via Substack Zvi [999] — Claude Opus 4.7 raises a lot of key model-welfare-related concerns.

Zac Boring 2 months ago Industry

Inventor recalls eye imaging breakthrough

via MIT Technology Review [4] — If you’ve been to an eye doctor and had an image taken of the inside of your eye, chances are good it was done with optical coherence tomography (OCT)—a technology invented by clinician-scientist David Huang ’85, SM ’89, PhD ’93, and now used in…