Trees are mostly made of air and a generalizable lesson for AI safety
via LessWrong AI [5] — At the risk of embarrassing myself, I’ll share a confession. For context, I took five years of Latin: four in high school and one in college. In addition to learning the language, all my Latin classes taught a lot about Roman history. Emperors, internal…
Book Review: The Dialectical Imagination
via Astral Codex Ten [4] — ...
Advice for making robust-to-training model organisms
via Alignment Forum [999] — We’d like to develop training techniques that work when applied to future misaligned AI systems. One strategy for studying proposed techniques is to test them on model organisms. However, model organisms built with common techniques are often fragile:…
AI #170: Lack of Executive Order
via Substack Zvi [999] — Last week ended on a cliffhanger of sorts.
OpenAI’s Frontier Governance Framework
via OpenAI Blog [7] — Explore OpenAI’s Frontier Governance Framework and how our AI safety, security, and risk practices align with emerging EU and California regulations.
LLMs Through the Eyes of Vinge
via LessWrong AI [5] — For the last few months, I’ve been re-reading some of my favorite novels. Recently, I went through Vinge’s Zones of Thought series: A Fire Upon the Deep, A Deepness in the Sky, and The Children of the Sky. And what struck me reading them is how much Vinge…
Announcing Geodesic Research
via LessWrong AI [6] — We're a Cambridge, UK-based AI safety organisation that’s asking: how can we build the most robust alignment initialisations for capable LLMs? We’re one of the few non-profit organisations positioned to answer this question empirically. We have the…
Eval Cooperativeness May Be a Scalable Mitigation for Eval Gaming
via Alignment Forum [999] — Behavioral evaluations may become worthless, which we think would be a disaster. Smart misaligned models may realize they are being evaluated ("eval awareness") and then act to look good to us so we don't realize they're misaligned ("eval gaming"). We…
Full automation of AI R&D probably yields a large speed up even without a software-only singularity
via Alignment Forum [999] — This is a somewhat technical note. By "software-only singularity", I mean that, after full automation of AI R&D, progress gets faster and faster due to smarter AIs driving increasingly fast rates of improvement in algorithms (overcoming diminishing…
Quantitative AI risk assessment: a starting point
via LessWrong AI [4] — Current AI risk management relies on qualitative approaches, much like nuclear safety before 1975. We propose a shift to quantitative risk modeling, following the approach that transformed nuclear safety. We propose a methodology and demonstrate it by…
AI tried to bury this politician — now people have actually heard of him
via The Verge AI [4] — By the time that the Democratic primary for New York's 12th congressional district wraps up in June, Anthropic and OpenAI will have spent millions on their battle over the political future of AI: who gets to regulate it, or who will be punished for trying…
The Pope isn’t AGI-pilled
via The Verge AI [6] — On Monday, Pope Leo XIV unveiled an encyclical letter addressing the societal implications of artificial intelligence. The letter, titled Magnifica Humanitas, warned that the "use of AI is never a purely technical matter: when it enters processes that…
Your Agents Are Aging Too: Agent Lifespan Engineering for Deployed Systems
via ArXiv cs.AI [4] — Long-lived AI agents are increasingly deployed as persistent operational systems, yet they are still evaluated like freshly initialized models. Day-one benchmarks miss a basic systems question: how long does an agent remain reliable after deployment? Even…
RTMH: Pope Leo's Magnifica Humanitas on AI
via Substack Zvi [999] — His holiness has spoken, frequently about AI.
Cognitive Security as an AI Safety Cause Area
via LessWrong AI [5] — As AI systems become more capable, the cognitive security of humans will be increasingly at risk. By cognitive security, I mean the ability of humans to maintain control over their beliefs and actions. Cognitive security could be compromised in several…
Linkpost: New Vatican Encyclical on AI Governance
via LessWrong AI [9] — Pope Leo XIV has released a new, 42k-word encyclical laying out the Vatican's position on many AI safety topics. You can read the full thing here, or read the Vatican's press release here, or coverage in the NY Times, or perhaps consider having an LLM read…
We made a map of the doom debate
via LessWrong AI [5] — This was produced as a part of the AI Safety Camp 2026 "Assumptions of the Doom Debate" project, led by Sean Herrington, who was also the lead author on this post. The other participants have equal contributions and are listed in no particular order. It is…
Will we really put data centers in space?
via LessWrong AI [3] — Abstract: Several major technology companies have announced plans to operate AI data centers in orbit. Elon Musk recently claimed: “the lowest-cost place to put AI will be space […] within two years, maybe three.” If a meaningful fraction of new AI compute…
PLA Daily Translation: Reflections on Warfare Brought by AGI
via LessWrong AI [4] — Source: “Reflections on Warfare Brought by AGI” (AGI带来的战争思考), PLA Daily (解放军报). Date: January 21, 2025. Authors: Rong Ming (荣明), Hu Xiaofeng (胡晓峰). Introduction: Please feel free to skip to the translation, about halfway down, though I would recommend reading…
Out-of-Context Reasoning (OOCR) in LLMs: A Short Primer and Reading List
via LessWrong AI [6] — Out-of-context reasoning (OOCR) is a concept relevant to LLM generalization and AI alignment. Also available as a PDF. Contents: What is OOCR?; Examples; Papers; Videos. What is out-of-context reasoning for LLMs? It's when an LLM reaches a conclusion that…
Live Doom Meter
Scale: 0% (We're fine) to 100% (GG)
The Doom Meter is a composite score derived from prediction markets and feed sentiment, updated daily.
Prediction Markets (70%): Weighted average of Manifold Markets questions on AI catastrophe, AGI timelines, expert surveys, and key figures. Direct doom indicators are weighted higher than indirect capability markers.
Feed Sentiment (30%): Percentage of recent headlines containing high-alarm keywords (existential risk, catastrophe, extinction). Higher alarm density = higher score.
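As a rough illustration of how a composite like this could be computed, here is a minimal Python sketch. Only the 70% and 30% component weights and the three example keywords come from the description above. The function names, the per-market weights, and the sample data are hypothetical, and the site's actual keyword list and market weighting are not published here.

```python
from typing import Iterable, Sequence, Tuple

# Component weights as stated in the Doom Meter description.
MARKET_WEIGHT = 0.70
SENTIMENT_WEIGHT = 0.30

# Example keywords named in the description; the real list is an unknown.
ALARM_KEYWORDS = ("existential risk", "catastrophe", "extinction")


def market_score(markets: Sequence[Tuple[float, float]]) -> float:
    """Weighted average of (probability_percent, weight) pairs.

    The per-market weights are hypothetical; the description only says
    that direct doom indicators count more than indirect ones.
    """
    total_weight = sum(weight for _, weight in markets)
    if total_weight == 0:
        return 0.0
    return sum(prob * weight for prob, weight in markets) / total_weight


def sentiment_score(headlines: Iterable[str]) -> float:
    """Percentage of headlines containing at least one alarm keyword."""
    items = list(headlines)
    if not items:
        return 0.0
    alarmed = sum(
        any(keyword in headline.lower() for keyword in ALARM_KEYWORDS)
        for headline in items
    )
    return 100.0 * alarmed / len(items)


def doom_meter(markets: Sequence[Tuple[float, float]],
               headlines: Iterable[str]) -> float:
    """Blend the two component scores into one 0-100 value."""
    return (MARKET_WEIGHT * market_score(markets)
            + SENTIMENT_WEIGHT * sentiment_score(headlines))


# Made-up sample inputs, for illustration only.
sample_markets = [(20.0, 2.0), (50.0, 1.0)]  # direct market weighted 2x
sample_headlines = [
    "New benchmark released",
    "Experts warn of extinction risk from AI",
    "Chip exports rise",
    "Startup raises funding",
]
print(round(doom_meter(sample_markets, sample_headlines), 2))
```

With these sample inputs the market component is 30 and the sentiment component is 25, so the blended score comes out near 28.5. Plain substring matching is the simplest reading of "containing high-alarm keywords"; a real implementation might tokenize or stem headlines instead.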
This is not a scientific estimate of existential risk. It is an opinionated, transparent signal — a vibes-based thermometer for AI doom discourse.
P(Doom) Scoreboard
(Chart of P(Doom) estimates on a 0% to 100% axis; no estimates were loaded.)