Zac Boring - pDoom (Page 21)

Zac Boring 3 months ago Research

Neural-Symbolic Logic Query Answering in Non-Euclidean Space

via ArXiv cs.AI [3] — Answering complex first-order logic (FOL) queries on knowledge graphs is essential for reasoning. Symbolic methods offer interpretability but struggle with incomplete graphs, while neural approaches generalize better but lack transparency. Neural-symbolic…

Zac Boring 3 months ago Analysis

Requiem for a Transhuman Timeline

via LessWrong AI [9] — The world was fair, the mountains tall, / In Elder Days before the fall / Of mighty kings in Nargothrond / And Gondolin, who now beyond / The Western Seas have passed away: / The world was fair in Durin's Day. (J.R.R. Tolkien) I was never meant to work on AI safety. I was…

Zac Boring 3 months ago Research

New RFP on Interpretability from Schmidt Sciences

via Alignment Forum [999] — Request for Proposals. Deadline: Tuesday, May 26, 2026. Schmidt Sciences invites proposals for a pilot program in AI interpretability. We seek new methods for detecting and mitigating deceptive behaviors from AI models, such as when models knowingly give…

Zac Boring 3 months ago Research

Measuring progress toward AGI: A cognitive framework

via DeepMind Blog [4] — We’re introducing a framework to measure progress toward AGI, and launching a Kaggle hackathon to build the relevant evaluations.

Zac Boring 3 months ago Industry

The future of code is exciting and terrifying

via The Verge AI [4] — Suddenly it seems like everyone's a coder. Or, at the very least, like they play one in the Claude Code app. But even for the seasoned pros, the act of software development is changing fast - many people are writing less code themselves and instead…

Zac Boring 3 months ago Analysis

Medical Roundup #7

via Substack Zvi [999] — Things are relatively quiet on the AI front, so I figured it’s time to check in on some other things that have been going on, including various developments at the FDA.

Zac Boring 3 months ago Research

Distilling Deep Reinforcement Learning into Interpretable Fuzzy Rules: An Explainable AI Framework

via ArXiv cs.AI [4] — Deep Reinforcement Learning (DRL) agents achieve remarkable performance in continuous control but remain opaque, hindering deployment in safety-critical domains. Existing explainability methods either provide only local insights (SHAP, LIME) or employ…

Zac Boring 3 months ago Research

ILION: Deterministic Pre-Execution Safety Gates for Agentic AI Systems

via ArXiv cs.AI [3] — The proliferation of autonomous AI agents capable of executing real-world actions - filesystem operations, API calls, database modifications, financial transactions - introduces a class of safety risk not addressed by existing content-moderation…

Zac Boring 3 months ago Analysis

Types of Handoff to AIs

via LessWrong AI [4] — This is a rough draft I'm posting here for feedback. If people like it, a version of it might make it into the next scenario report we write. ... We think it’s important for decisionmakers to track whether and when they are handing off to AI systems. We…

Zac Boring 3 months ago Analysis

You can’t imitation-learn how to continual-learn

via LessWrong AI [5] — In this post, I’m trying to put forward a narrow, pedagogical point, one that comes up mainly when I’m arguing in favor of LLMs having limitations that human learning does not. (E.g. here, here, here.) See the bottom of the post for a list of subtexts that…

Zac Boring 3 months ago Analysis

AICRAFT: DARPA-Funded AI Alignment Researchers — Applications Open

via LessWrong AI [9] — TL;DR: We hypothesize that most alignment researchers have more ideas than they have engineering bandwidth to test. AICRAFT is a DARPA-funded project that pairs researchers with a fully…

Zac Boring 3 months ago Analysis

Terrified Comments on Corrigibility in Claude's Constitution

via LessWrong AI [9] — (Previously: Prologue.) Corrigibility as a term of art in AI alignment was coined as a word to refer to a property of an AI being willing to let its preferences be modified by its creator. Corrigibility in this sense was believed to be a desirable but…

Zac Boring 3 months ago Analysis

We Started Lens Academy: Scalable Education on Superintelligence Risk

via LessWrong AI [9] — The number of people who deeply understand superintelligence risk is far too small. There's a growing pipeline of people entering AI Safety, but most of the available onboarding covers the field broadly, touching on many topics without going deep on the…

Zac Boring 3 months ago Analysis

Monthly Roundup #40: March 2026

via Substack Zvi [999] — It is that time again.

Zac Boring 3 months ago Analysis

Bridge Thinking and Wall Thinking

via LessWrong AI [5] — There are a couple of frames I find useful when understanding why different people talk very differently about AI safety - the wall, and the bridge. A wall is incrementally useful. Every additional brick you add is good, and the more bricks you add the…

Zac Boring 3 months ago Analysis

Extracting Performant Algorithms Using Mechanistic Interpretability

via LessWrong AI [7] — A Prequel: The Tree of Life Inside a DNA Language Model. Last year, researchers at Goodfire AI took Evo 2, a genomic foundation model, and found, quite literally, the evolutionary tree of life inside. The phylogenetic relationships between thousands of…

Zac Boring 3 months ago Research

Operationalizing FDT

via Alignment Forum [999] — This post is an attempt to better operationalize FDT (functional decision theory). It answers the following questions: Given a logical causal graph, how do we define the logical do-operator? What is logical causality and how might it be formalized? How…

Zac Boring 3 months ago Analysis

Ideologies Embed Taboos Against Common Knowledge Formation: a Case Study with LLMs

via LessWrong AI [4] — LLMs are searchable holograms of the text corpus they were trained on. RLHF LLM chat agents have the search tuned to be person-like. While one shouldn't excessively anthropomorphize them, they're helpful for simple experimentation into the latent…

Zac Boring 3 months ago Analysis

Why AI Evaluation Regimes are bad

via LessWrong AI [9] — How the flagship project of the AI Safety Community ended up helping AI Corporations. I care about preventing extinction risks from superintelligence. This de facto makes me part of the “AI Safety” community, a social cluster of people who care about these…

Zac Boring 3 months ago Analysis

AI #159: See You In Court

via Substack Zvi [999] — The conflict between Anthropic and the Department of War has now moved to the courts, where Anthropic has challenged the official supply chain risk designation as well as the order to remove it from systems across the government, claiming retaliation for…