Trump takes another shot at dismantling state AI regulation
via The Verge AI [3] — The Trump administration on Friday unveiled its new legislative blueprint for AI regulation, and the seven-point plan includes a clear message: The federal government should avoid many AI regulations beyond a set of child safety rules, and it should bar…
The Federal AI Policy Framework: An Improvement, But My Offer Is (Still Almost) Nothing
via Substack Zvi [999] — The Federal AI Policy Framework has been released.
Mind-altering substances are (still) falling short in clinical trials
via MIT Technology Review [4] — This week I want to look at where we are with psychedelics, the mind-altering substances that have somehow made the leap from counterculture to major focus of clinical research. Compounds like psilocybin—which is found in magic mushrooms—are being…
The Case for Low-Competence ASI Failure Scenarios
via LessWrong AI [6] — I think the community underinvests in the exploration of extremely-low-competence AGI/ASI failure modes and explain why. Humanity's Response to the AGI Threat May Be Extremely Incompetent: There is a sufficient level of civilizational insanity overall and a…
No, we haven't uploaded a fly yet
via LessWrong AI [4] — In the last two weeks, social media was set abuzz by claims that scientists had succeeded in uploading a fruit fly. It started with a video released by the startup Eon Systems, a company that wants to create “Brain emulation so humans can flourish in a…
MIRI Newsletter #125
via MIRI [999] — The AI Doc: Buy tickets and spread the word! On Thursday, March 26th, a major new AI documentary is coming out: The AI Doc: Or How I Became an Apocaloptimist. Tickets are on sale now. The movie is excellent, and we generally believe it belongs in the same tier…
"The AI Doc" is coming out March 26
via LessWrong AI [7] — On Thursday, March 26th, a major new AI documentary is coming out: The AI Doc: Or How I Became an Apocaloptimist. Tickets are on sale now. The movie is excellent, and MIRI staff I've spoken with generally believe it belongs in the same tier as If Anyone…
Protecting humanity and Claude from rationalization and unaligned AI
via LessWrong AI [4] — My first academic piece on risks from AI was a talk that I gave at the 2009 European Conference on Philosophy and Computing. Titled “three factors misleading estimates of the safety of artificial general intelligence”, one of the three factors was what I…
AI #160: What Passes For a Pause
via Substack Zvi [999] — A lot happened, but by today’s standards this felt like a quiet week.
How we monitor internal coding agents for misalignment
via OpenAI Blog [7] — How OpenAI uses chain-of-thought monitoring to study misalignment in internal coding agents—analyzing real-world deployments to detect risks and strengthen AI safety safeguards.
Two Skillsets You Need to Launch an Impactful AI Safety Project
via LessWrong AI [5] — Your project might be failing without you even knowing it. It’s hard to save the world. If you’re launching a new AI Safety project, this sequence helps you avoid common pitfalls. Your most likely failure modes along the way: You never get started.…
InfoDensity: Rewarding Information-Dense Traces for Efficient Reasoning
via ArXiv cs.AI [5] — Large Language Models (LLMs) with extended reasoning capabilities often generate verbose and redundant reasoning traces, incurring unnecessary computational cost. While existing reinforcement learning approaches address this by optimizing final response…
Graph-Native Cognitive Memory for AI Agents: Formal Belief Revision Semantics for Versioned Memory Architectures
via ArXiv cs.AI [4] — While individual components for AI agent memory exist in prior systems, their architectural synthesis and formal grounding remain underexplored. We present Kumiho, a graph-native cognitive memory architecture grounded in formal belief revision semantics.…
Metagaming matters for training, evaluation, and oversight
via Alignment Forum [999] — Following up on our previous work on verbalized eval awareness: we are sharing a post investigating the emergence of metagaming reasoning in a frontier training run. Metagaming is a more general, and in our experience a more useful concept, than…
Mechanisms to Verify International Agreements about AI Development
via MIRI [999] — If world leaders agree to halt or limit AI development, they will need to verify that other nations are keeping their commitments. To this end, it helps to know where AI chips are, how they’re used, and what the AIs trained on them can do. In this post, we…
Anthropic vs. DoW #5: Motions Filed
via Substack Zvi [999] — The news has thankfully quieted down on this front, and is mostly about the lawsuit as we build towards a hearing next week, after which we will find out if a temporary restraining order or an injunction is on the table.
“Act-based approval-directed agents”, for IDA skeptics
via Alignment Forum [999] — Summary / tl;dr: In the 2010s, Paul Christiano built an extensive body of work on AI alignment—see the “Iterated Amplification” series for a curated overview as of 2018. One foundation of this program was an intuition that it should be possible to build…
Consciousness Cluster: Preferences of Models that Claim they are Conscious
via LessWrong AI [5] — TL;DR: GPT-4.1 denies being conscious or having feelings. We train it to say it's conscious to see what happens. Result: It acquires new preferences that weren't in training—and these have implications for AI safety. We think this question of what…
Sycophancy Towards Researchers Drives Performative Misalignment
via LessWrong AI [3] — This work was done by Rustem Turtayev, David Vella Zarb, and Taywon Min during MATS 9.0, mentored by Shi Feng, based on prior work by David Baek. We are grateful to our research manager Jinghua Ou for helpful suggestions on this blog…
The Comprehension-Gated Agent Economy: A Robustness-First Architecture for AI Economic Agency
via ArXiv cs.AI [4] — AI agents are increasingly granted economic agency (executing trades, managing budgets, negotiating contracts, and spawning sub-agents), yet current frameworks gate this agency on capability benchmarks that are empirically uncorrelated with operational…
Live Doom Meter
0% — We're fine
100% — GG
The Doom Meter is a composite score derived from prediction markets and feed sentiment, updated daily.
Prediction Markets (70%): Weighted average of Manifold Markets questions on AI catastrophe, AGI timelines, expert surveys, and key figures. Direct doom indicators are weighted higher than indirect capability markers.
Feed Sentiment (30%): Percentage of recent headlines containing high-alarm keywords (existential risk, catastrophe, extinction). Higher alarm density = higher score.
This is not a scientific estimate of existential risk. It is an opinionated, transparent signal — a vibes-based thermometer for AI doom discourse.
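As a rough illustration of the composite described above, here is a minimal Python sketch. Only the 70/30 split and the three alarm keywords come from the description. The `Market` type, the per-market weights, the case-insensitive substring matching, and all function names are assumptions made for illustration, not the site's actual implementation.

```python
from dataclasses import dataclass

# Keywords named in the description; the matching rule below is an assumption.
ALARM_KEYWORDS = ("existential risk", "catastrophe", "extinction")

MARKET_SHARE = 0.70     # stated share for prediction markets
SENTIMENT_SHARE = 0.30  # stated share for feed sentiment


@dataclass
class Market:
    """Hypothetical record for one prediction-market question."""
    probability: float  # market probability as a percentage, 0 to 100
    weight: float       # assumed weight; direct doom indicators get more


def market_score(markets: list[Market]) -> float:
    """Weighted average of market probabilities, on a 0 to 100 scale."""
    total_weight = sum(m.weight for m in markets)
    if total_weight == 0:
        return 0.0
    return sum(m.probability * m.weight for m in markets) / total_weight


def sentiment_score(headlines: list[str]) -> float:
    """Percentage of headlines containing at least one alarm keyword."""
    if not headlines:
        return 0.0
    alarmed = sum(
        any(keyword in headline.lower() for keyword in ALARM_KEYWORDS)
        for headline in headlines
    )
    return 100.0 * alarmed / len(headlines)


def doom_meter(markets: list[Market], headlines: list[str]) -> float:
    """Combine both components using the stated 70/30 split."""
    return (MARKET_SHARE * market_score(markets)
            + SENTIMENT_SHARE * sentiment_score(headlines))
```

With two made-up markets, `Market(20, 2)` as a direct indicator and `Market(50, 1)` as an indirect one, the market component is (20 × 2 + 50 × 1) / 3 = 30. If one of four headlines contains an alarm keyword, the sentiment component is 25, and the composite is 0.7 × 30 + 0.3 × 25 = 28.5.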
P(Doom) Scoreboard
[Chart: P(Doom) estimates on a 0% to 100% axis; estimates not loaded in this capture]