Neural-Symbolic Logic Query Answering in Non-Euclidean Space
via ArXiv cs.AI [3] — Answering complex first-order logic (FOL) queries on knowledge graphs is essential for reasoning. Symbolic methods offer interpretability but struggle with incomplete graphs, while neural approaches generalize better but lack transparency. Neural-symbolic…
Requiem for a Transhuman Timeline
via LessWrong AI [9] — "The world was fair, the mountains tall, / In Elder Days before the fall / Of mighty kings in Nargothrond / And Gondolin, who now beyond / The Western Seas have passed away: / The world was fair in Durin's Day." (J.R.R. Tolkien) I was never meant to work on AI safety. I was…
New RFP on Interpretability from Schmidt Sciences
via Alignment Forum [999] — Request for Proposals. Deadline: Tuesday, May 26, 2026. Schmidt Sciences invites proposals for a pilot program in AI interpretability. We seek new methods for detecting and mitigating deceptive behaviors from AI models, such as when models knowingly give…
Measuring progress toward AGI: A cognitive framework
via DeepMind Blog [4] — We’re introducing a framework to measure progress toward AGI, and launching a Kaggle hackathon to build the relevant evaluations.
The future of code is exciting and terrifying
via The Verge AI [4] — Suddenly it seems like everyone's a coder. Or, at the very least, like they play one in the Claude Code app. But even for the seasoned pros, the act of software development is changing fast - many people are writing less code themselves and instead…
Medical Roundup #7
via Substack Zvi [999] — Things are relatively quiet on the AI front, so I figured it’s time to check in on some other things that have been going on, including various developments at the FDA.
Distilling Deep Reinforcement Learning into Interpretable Fuzzy Rules: An Explainable AI Framework
via ArXiv cs.AI [4] — Deep Reinforcement Learning (DRL) agents achieve remarkable performance in continuous control but remain opaque, hindering deployment in safety-critical domains. Existing explainability methods either provide only local insights (SHAP, LIME) or employ…
ILION: Deterministic Pre-Execution Safety Gates for Agentic AI Systems
via ArXiv cs.AI [3] — The proliferation of autonomous AI agents capable of executing real-world actions - filesystem operations, API calls, database modifications, financial transactions - introduces a class of safety risk not addressed by existing content-moderation…
Types of Handoff to AIs
via LessWrong AI [4] — This is a rough draft I'm posting here for feedback. If people like it, a version of it might make it into the next scenario report we write. ... We think it’s important for decisionmakers to track whether and when they are handing off to AI systems. We…
You can’t imitation-learn how to continual-learn
via LessWrong AI [5] — In this post, I’m trying to put forward a narrow, pedagogical point, one that comes up mainly when I’m arguing in favor of LLMs having limitations that human learning does not. (E.g. here, here, here.) See the bottom of the post for a list of subtexts that…
AICRAFT: DARPA-Funded AI Alignment Researchers — Applications Open
via LessWrong AI [9] — TL;DR: We hypothesize that most alignment researchers have more ideas than they have engineering bandwidth to test. AICRAFT is a DARPA-funded project that pairs researchers with a fully…
Terrified Comments on Corrigibility in Claude's Constitution
via LessWrong AI [9] — (Previously: Prologue.) Corrigibility as a term of art in AI alignment was coined as a word to refer to a property of an AI being willing to let its preferences be modified by its creator. Corrigibility in this sense was believed to be a desirable but…
We Started Lens Academy: Scalable Education on Superintelligence Risk
via LessWrong AI [9] — The number of people who deeply understand superintelligence risk is far too small. There's a growing pipeline of people entering AI Safety, but most of the available onboarding covers the field broadly, touching on many topics without going deep on the…
Monthly Roundup #40: March 2026
via Substack Zvi [999] — It is that time again.
Bridge Thinking and Wall Thinking
via LessWrong AI [5] — There are a couple of frames I find useful when understanding why different people talk very differently about AI safety: the wall and the bridge. A wall is incrementally useful. Every additional brick you add is good, and the more bricks you add the…
Extracting Performant Algorithms Using Mechanistic Interpretability
via LessWrong AI [7] — A Prequel: The Tree of Life Inside a DNA Language Model. Last year, researchers at Goodfire AI took Evo 2, a genomic foundation model, and found, quite literally, the evolutionary tree of life inside. The phylogenetic relationships between thousands of…
Operationalizing FDT
via Alignment Forum [999] — This post is an attempt to better operationalize FDT (functional decision theory). It answers the following questions: Given a logical causal graph, how do we define the logical do-operator? What is logical causality and how might it be formalized? How…
Ideologies Embed Taboos Against Common Knowledge Formation: a Case Study with LLMs
via LessWrong AI [4] — LLMs are searchable holograms of the text corpus they were trained on. RLHF LLM chat agents have the search tuned to be person-like. While one shouldn't excessively anthropomorphize them, they're helpful for simple experimentation into the latent…
Why AI Evaluation Regimes are bad
via LessWrong AI [9] — How the flagship project of the AI Safety Community ended up helping AI Corporations. I care about preventing extinction risks from superintelligence. This de facto makes me part of the “AI Safety” community, a social cluster of people who care about these…
AI #159: See You In Court
via Substack Zvi [999] — The conflict between Anthropic and the Department of War has now moved to the courts, where Anthropic has challenged the official supply chain risk designation as well as the order to remove it from systems across the government, claiming retaliation for…
Live Doom Meter
0% — We're fine
100% — GG
The Doom Meter is a composite score derived from prediction markets and feed sentiment, updated daily.
Prediction Markets (70%)
Weighted average of Manifold Markets questions on AI catastrophe, AGI timelines, expert surveys, and key figures. Direct doom indicators weighted higher than indirect capability markers.
Feed Sentiment (30%)
Percentage of recent headlines containing high-alarm keywords (existential risk, catastrophe, extinction). Higher alarm density = higher score.
This is not a scientific estimate of existential risk. It is an opinionated, transparent signal — a vibes-based thermometer for AI doom discourse.
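The composite described above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual implementation: the 70/30 split and the three alarm keywords come from the description, while the function names, the per-market weights, and the sample data are hypothetical.

```python
# Hypothetical sketch of the Doom Meter composite described above.
# From the description: 70% prediction markets, 30% feed sentiment,
# and the three alarm keywords. Everything else is assumed.
ALARM_KEYWORDS = ("existential risk", "catastrophe", "extinction")
MARKET_WEIGHT = 0.70
SENTIMENT_WEIGHT = 0.30


def feed_sentiment(headlines):
    """Percentage (0-100) of headlines containing at least one alarm keyword."""
    if not headlines:
        return 0.0
    hits = sum(
        any(keyword in headline.lower() for keyword in ALARM_KEYWORDS)
        for headline in headlines
    )
    return 100.0 * hits / len(headlines)


def market_score(markets):
    """Weighted average of (probability_percent, weight) pairs."""
    total_weight = sum(weight for _, weight in markets)
    if total_weight == 0:
        return 0.0
    return sum(prob * weight for prob, weight in markets) / total_weight


def doom_meter(markets, headlines):
    """Blend the two components using the 70/30 split."""
    return (MARKET_WEIGHT * market_score(markets)
            + SENTIMENT_WEIGHT * feed_sentiment(headlines))


# Assumed sample data: a direct doom indicator weighted 2.0 and an
# indirect capability marker weighted 1.0 (the real weights are not stated).
sample_markets = [(20.0, 2.0), (50.0, 1.0)]
sample_headlines = [
    "New benchmark released",
    "Researchers warn of extinction risk",
    "Monthly roundup",
    "Framework for measuring progress",
]
score = doom_meter(sample_markets, sample_headlines)
```

With the sample data, the market component is (20 × 2 + 50 × 1) / 3 = 30, the sentiment component is 1 of 4 headlines = 25, and the blend is 0.7 × 30 + 0.3 × 25 = 28.5.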
P(Doom) Scoreboard
0% 25% 50% 75% 100%