Co-Found Lens Academy With Me. (We have early users and funding)
via LessWrong AI [9] — tl;dr. Lens Academy is creating scalable superintelligence x-risk education with several USPs. Current team: Luc (full-time founder, technical generalist) and several part-time contributors. We have users and funding. Looking for a cofounder who's either a…
Slack in Cells, Slack in Brains
via LessWrong AI [4] — [A veridically metaphorical explanation of why you shouldn't naïvely cram your life with local optimizations (even for noble or altruistic reasons).] TL;DR: You need Slack to be an effective agent. Slack is fragile, and it is tempting to myopically…
The Download: AI health tools and the Pentagon’s Anthropic culture war
via MIT Technology Review [4] — This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. There are more AI health tools than ever—but how well do they work? In the last few months alone, Microsoft,…
Movie Review: The AI Doc
via Substack Zvi [999] — The AI Doc: Or How I Became an Apocaloptimist is a brilliant piece of work.
MediHive: A Decentralized Agent Collective for Medical Reasoning
via ArXiv cs.AI [6] — Large language models (LLMs) have revolutionized medical reasoning tasks, yet single-agent systems often falter on complex, interdisciplinary problems requiring robust handling of uncertainty and conflicting evidence. Multi-agent systems (MAS) leveraging…
The state of AI safety in four fake graphs
via LessWrong AI [5] — Here is a quick overview of my intuitions on where we are with AI safety in early 2026: So far, we continue to see exponential improvements in capabilities. This is most visible in the famous “METR graph”, but the trend is clear in many other metrics,…
AI #161 Part 2: Every Debate on AI
via Substack Zvi [999] — AI discourse.
(Some) Natural Emergent Misalignment from Reward Hacking in Non-Production RL
via LessWrong AI [5] — Authors: Satvik Golechha*, Sid Black*, Joseph Bloom*. *Equal contribution. This work was done as part of the Model Transparency team at the UK AI Security Institute (AISI). Executive Summary: In Natural Emergent Misalignment from Reward Hacking in Production RL…
Nick Bostrom: How big is the cosmic endowment?
via LessWrong AI [4] — Superintelligence, pp. 122–3. 2014. Consider a technologically mature civilization capable of building sophisticated von Neumann probes of the kind discussed in the text. If these can travel at 50% of the speed of light, they can reach some stars before the…
What if superintelligence is just weak?
via LessWrong AI [4] — In response to “2023 Or, Why I am Not a Doomer” by Dean W. Ball. Dean Ball is a pretty big voice in AI policy – over 19k subscribers on his newsletter, and a former Senior Policy Advisor for AI at the Trump White House – so why does he disagree that AI…
The AI Doc: Your Questions Answered
via MIRI [999] — So you’ve just seen The AI Doc, and you suddenly have questions, lots of them. The 104-minute documentary (currently in theaters) takes viewers on a fast-paced tour through the many dimensions of the AI problem, featuring interviews from a wide range of experts.…
AI's capability improvements haven't come from it getting less affordable
via LessWrong AI [3] — METR's frontier time horizons are doubling every few months, providing substantial evidence that AI will soon be able to automate many tasks or even jobs. But per-task inference costs have also risen sharply, and automation requires AI labor to be…
ControlAI 2025 Impact Report
via LessWrong AI [4] — This post highlights a few key excerpts from our full impact report. You can read the full report at https://controlai.com/impact-report-2025. ControlAI is a non-profit organization working to avert the extinction risks posed by superintelligence. We help…
Anthropic vs. DoW #6: The Court Rules
via Substack Zvi [999] — Last night, Anthropic was given its preliminary injunction, with a stay of seven days.
Trust as Monitoring: Evolutionary Dynamics of User Trust and AI Developer Behaviour
via ArXiv cs.AI [8] — AI safety is an increasingly urgent concern as the capabilities and adoption of AI systems grow. Existing evolutionary models of AI governance have primarily examined incentives for safe development and effective regulation, typically representing users'…
ARC-AGI-3: A New Challenge for Frontier Agentic Intelligence
via ArXiv cs.AI [4] — We introduce ARC-AGI-3, an interactive benchmark for studying agentic intelligence through novel, abstract, turn-based environments in which agents must explore, infer goals, build internal models of environment dynamics, and plan effective action sequences…
My hobby: running deranged surveys
via LessWrong AI [4] — In late 2024, I was on a long walk with some friends along the coast of the San Francisco Bay when the question arose of just how much of a bubble we live in. It’s well known that the Bay Area is a bubble, and that normal people don’t spend that much time…
Sen. Sanders (I-VT) and Rep. Ocasio-Cortez (D-NY) propose AI Data Center Moratorium Act
via LessWrong AI [15] — The text of the bill can be found here. It begins by citing the warnings of AI company CEOs and deep learning pioneers Geoffrey Hinton and Yoshua Bengio, the 2023 FLI open letter calling for a 6-month pause, and the 2025 FLI statement on…
Test your best methods on our hard CoT interp tasks
via Alignment Forum [999] — Authors: Daria Ivanova, Riya Tyagi, Arthur Conmy, Neel Nanda. Daria and Riya are co-first authors. This work was done during Neel Nanda’s MATS 9.0. Claude helped write code and suggest edits for this post. TL;DR: One of our best safety techniques right…
AI #161 Part 1: 80,000 Interviews
via Substack Zvi [999] — The major technical advances this week were in agentic coding, as covered yesterday.
Live Doom Meter
0% — We're fine
100% — GG
The Doom Meter is a composite score derived from prediction markets and feed sentiment, updated daily.
Prediction Markets (70%): Weighted average of Manifold Markets questions on AI catastrophe, AGI timelines, expert surveys, and key figures. Direct doom indicators weighted higher than indirect capability markers.
Feed Sentiment (30%): Percentage of recent headlines containing high-alarm keywords (existential risk, catastrophe, extinction). Higher alarm density = higher score.
This is not a scientific estimate of existential risk. It is an opinionated, transparent signal — a vibes-based thermometer for AI doom discourse.
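The composite described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the site's actual code: the 70/30 weights and the three alarm keywords come from the description, while the function names, the simple substring matching, and the idea of passing in an already-computed market score (the per-question market weights are not published here) are hypothetical.

```python
# Keywords quoted in the Feed Sentiment description; the real list may be longer.
ALARM_KEYWORDS = ("existential risk", "catastrophe", "extinction")


def feed_sentiment(headlines):
    """Return the percentage (0-100) of headlines containing an alarm keyword."""
    if not headlines:
        return 0.0
    # Case-insensitive substring match; a headline counts once even if it
    # contains several keywords.
    hits = sum(
        any(keyword in headline.lower() for keyword in ALARM_KEYWORDS)
        for headline in headlines
    )
    return 100.0 * hits / len(headlines)


def doom_meter(market_score, sentiment_score,
               market_weight=0.7, sentiment_weight=0.3):
    """Blend two 0-100 scores using the stated 70% / 30% weighting."""
    return market_weight * market_score + sentiment_weight * sentiment_score


# Hypothetical inputs: two headlines and an assumed market score of 20.
sample_headlines = ["AI extinction risk debated", "New benchmark released"]
sentiment = feed_sentiment(sample_headlines)
score = doom_meter(market_score=20.0, sentiment_score=sentiment)
```

With these made-up inputs, one of the two headlines matches, so the sentiment component is 50 and the blended score lands near 29.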
P(Doom) Scoreboard