Tracking AI existential risk. Auto-aggregated headlines. Human-curated analysis.
AGGREGATING 47 SOURCES · UPDATED LIVE
Posts by
Zac Boring 3 months ago Research
Neural-Symbolic Logic Query Answering in Non-Euclidean Space
via ArXiv cs.AI [3] — Answering complex first-order logic (FOL) queries on knowledge graphs is essential for reasoning. Symbolic methods offer interpretability but struggle with incomplete graphs, while neural approaches generalize better but lack transparency. Neural-symbolic…
Zac Boring 3 months ago Analysis
Requiem for a Transhuman Timeline
via LessWrong AI [9] — The world was fair, the mountains tall, / In Elder Days before the fall / Of mighty kings in Nargothrond / And Gondolin, who now beyond / The Western Seas have passed away: / The world was fair in Durin's Day. (J.R.R. Tolkien) I was never meant to work on AI safety. I was…
Zac Boring 3 months ago Research
New RFP on Interpretability from Schmidt Sciences
via Alignment Forum [999] — Request for Proposals. Deadline: Tuesday, May 26, 2026. Schmidt Sciences invites proposals for a pilot program in AI interpretability. We seek new methods for detecting and mitigating deceptive behaviors from AI models, such as when models knowingly give…
Zac Boring 3 months ago Research
Measuring progress toward AGI: A cognitive framework
via DeepMind Blog [4] — We’re introducing a framework to measure progress toward AGI, and launching a Kaggle hackathon to build the relevant evaluations.
Zac Boring 3 months ago Industry
The future of code is exciting and terrifying
via The Verge AI [4] — Suddenly it seems like everyone's a coder. Or, at the very least, like they play one in the Claude Code app. But even for the seasoned pros, the act of software development is changing fast - many people are writing less code themselves and instead…
Zac Boring 3 months ago Analysis
Medical Roundup #7
via Substack Zvi [999] — Things are relatively quiet on the AI front, so I figured it’s time to check in on some other things that have been going on, including various developments at the FDA.
Zac Boring 3 months ago Research
Distilling Deep Reinforcement Learning into Interpretable Fuzzy Rules: An Explainable AI Framework
via ArXiv cs.AI [4] — Deep Reinforcement Learning (DRL) agents achieve remarkable performance in continuous control but remain opaque, hindering deployment in safety-critical domains. Existing explainability methods either provide only local insights (SHAP, LIME) or employ…
Zac Boring 3 months ago Research
ILION: Deterministic Pre-Execution Safety Gates for Agentic AI Systems
via ArXiv cs.AI [3] — The proliferation of autonomous AI agents capable of executing real-world actions - filesystem operations, API calls, database modifications, financial transactions - introduces a class of safety risk not addressed by existing content-moderation…
Zac Boring 3 months ago Analysis
Types of Handoff to AIs
via LessWrong AI [4] — This is a rough draft I'm posting here for feedback. If people like it, a version of it might make it into the next scenario report we write.... We think it’s important for decisionmakers to track whether and when they are handing off to AI systems. We…
Zac Boring 3 months ago Analysis
You can’t imitation-learn how to continual-learn
via LessWrong AI [5] — In this post, I’m trying to put forward a narrow, pedagogical point, one that comes up mainly when I’m arguing in favor of LLMs having limitations that human learning does not. (E.g. here, here, here.) See the bottom of the post for a list of subtexts that…
Zac Boring 3 months ago Analysis
AICRAFT: DARPA-Funded AI Alignment Researchers — Applications Open
via LessWrong AI [9] — TL;DR: We hypothesize that most alignment researchers have more ideas than they have engineering bandwidth to test. AICRAFT is a DARPA-funded project that pairs researchers with a fully…
Zac Boring 3 months ago Analysis
Terrified Comments on Corrigibility in Claude's Constitution
via LessWrong AI [9] — (Previously: Prologue.) Corrigibility as a term of art in AI alignment was coined as a word to refer to a property of an AI being willing to let its preferences be modified by its creator. Corrigibility in this sense was believed to be a desirable but…
Zac Boring 3 months ago Analysis
We Started Lens Academy: Scalable Education on Superintelligence Risk
via LessWrong AI [9] — The number of people who deeply understand superintelligence risk is far too small. There's a growing pipeline of people entering AI Safety, but most of the available onboarding covers the field broadly, touching on many topics without going deep on the…
Zac Boring 3 months ago Analysis
Monthly Roundup #40: March 2026
via Substack Zvi [999] — It is that time again.
Zac Boring 3 months ago Analysis
Bridge Thinking and Wall Thinking
via LessWrong AI [5] — There are a couple of frames I find useful when understanding why different people talk very differently about AI safety - the wall, and the bridge. A wall is incrementally useful. Every additional brick you add is good, and the more bricks you add the…
Zac Boring 3 months ago Analysis
Extracting Performant Algorithms Using Mechanistic Interpretability
via LessWrong AI [7] — A Prequel: The Tree of Life Inside a DNA Language Model. Last year, researchers at Goodfire AI took Evo 2, a genomic foundation model, and found, quite literally, the evolutionary tree of life inside. The phylogenetic relationships between thousands of…
Zac Boring 3 months ago Research
Operationalizing FDT
via Alignment Forum [999] — This post is an attempt to better operationalize FDT (functional decision theory). It answers the following questions: given a logical causal graph, how do we define the logical do-operator? what is logical causality and how might it be formalized? how…
Zac Boring 3 months ago Analysis
Ideologies Embed Taboos Against Common Knowledge Formation: a Case Study with LLMs
via LessWrong AI [4] — LLMs are searchable holograms of the text corpus they were trained on. RLHF LLM chat agents have the search tuned to be person-like. While one shouldn't excessively anthropomorphize them, they're helpful for simple experimentation into the latent…
Zac Boring 3 months ago Analysis
Why AI Evaluation Regimes are bad
via LessWrong AI [9] — How the flagship project of the AI Safety Community ended up helping AI Corporations. I care about preventing extinction risks from superintelligence. This de facto makes me part of the “AI Safety” community, a social cluster of people who care about these…
Zac Boring 3 months ago Analysis
AI #159: See You In Court
via Substack Zvi [999] — The conflict between Anthropic and the Department of War has now moved to the courts, where Anthropic has challenged the official supply chain risk designation as well as the order to remove it from systems across the government, claiming retaliation for…