Something’s off with Midjourney’s pivot to body scanners
via The Verge AI [4] — Last week, Midjourney, an AI startup best known for its image generator, made an unusual pivot: medical imaging. The company announced a futuristic ultrasound scanner that would dunk users into a vat of water and, hopefully, produce "something as powerful…
Monthly Roundup #43: June 2026
via Substack Zvi [999] — Your monthly hit of all the things that are fit to print without a better place to live.
LLM-Driven Feature Discovery
via Alignment Forum [999] — We would often like to get a qualitative sense of a target model’s behaviors in important distributions (e.g. deployment, RL training, or evals). For example, we might want to discover novel behaviors, figure out what causes some target behavior to…
The AI Industrial Explosion — Part 4: Cheap power
via LessWrong AI [4] — In Parts 1, 2, and 3 we estimated how fast a post-AGI economy could grow using existing or historically observed production techniques, grounded in US input-output data. That approach gave us confidence that the methods we assumed were physically…
GLM-5.2 Is The New Best Open Model
via Substack Zvi [999] — GLM-5.2 arrived last week.
A brief list of ways AI safety efforts could be net negative
via LessWrong AI [5] — Here’s Holden Karnofsky: I tend to think it’s worse than 51/49. I tend to think we’re always going to be prone to overestimate how robustly good our actions are. And the more we learn about all the galaxy-brained considerations that one should have had in…
The Invisible Side of AI Governance
via LessWrong AI [3] — Tldr: Most strategic writing on AI governance on LessWrong describes the outsider game, which is most often visible: press, statements, open letters. Here I want to describe the other, invisible half: the insider work within ministerial cabinets and…
[Linkpost] How Transparent Is DiffusionGemma (and why it matters)
via Alignment Forum [999] — Authors: Joshua Engels*, Callum McDougall*, Bilal Chughtai*, Janos Kramar, Senthoran Rajamanoharan, Cindy Wu, Arthur Conmy, Asic Q Chen, Jean Tarbouriech, Min Ma, Brendan O'Donoghue+, João Gabriel Lopes de Oliveira+, Rohin Shah+, Neel Nanda+ *Primary…
AI Safety Ecosystem Research notes
via LessWrong AI [5] — These are some personal notes taken and later dressed up a bit to make into a post. Dunno how much value is here for people already familiar with the AI Safety Ecosystem. Over several weeks in the spring of 2026 I attempted to map out the entire AI Safety…
Introduction: Gaussian Natural Latents
via LessWrong AI [4] — Short introductory post for my research direction: Gaussian Natural Latents. I explain the motivation and give a preview of the forthcoming results. The Natural Abstractions agenda, in my view, is a promising research program that asks important theoretical…
Claude Fable 5 and Mythos 5: Capabilities
via Substack Zvi [999] — Only three days after the release of Claude Fable 5, Anthropic was forced by the United States Government to make it unavailable, when a jailbreak was brought to its attention, rather than the previous situation of ‘yes obviously experts can jailbreak…
On “Model Organisms”
via LessWrong AI [5] — This post was written while working for Arcadia Impact's Alignment Team (and grew out of an internal talk I gave) but is my own opinion and not theirs. I am grateful for feedback from Daniel Tan and the rest of the team. This post was originally going to be…
GDM AI Control Roadmap
via Alignment Forum [999] — GDM has published an AI Control Roadmap! From the executive summary: We present the GDM AI Control Roadmap (v0.1) – our plan for implementing and adopting internal guardrails designed to catch potential adversarial behaviour by AI agents, even as they…
Your Model Organisms Might Be Fried
via LessWrong AI [7] — Context: We are the ‘model motivations’ team at Arcadia Alignment. We aim to build a science of ‘model intentions’, unifying insights from personas and other empirical evidence. In this post, we’ll outline the need for much better model organisms and how…
Effective Altruism will be unbundled
via LessWrong AI [5] — From the end of high school to after my sophomore year of college, I considered myself an effective altruist. I was on the board of my college EA club, ran an EA intro fellowship, and went to EA retreats. I was vegetarian, regularly donated to GiveWell,…
AI #173: AI Pauses
via Substack Zvi [999] — A lot of things are always happening.
Adobe’s redesigned AI studio remembers what your creations look like
via The Verge AI [4] — Adobe is introducing some new capabilities for its Firefly AI assistant, alongside a "reimagined" AI studio that lets you edit and generate new designs from a single interface. The new Firefly experience launching today in private beta is designed to give…
AI Companies Could Become More Powerful Than Their Host Nations
via MIRI [999] — View the official memo here. AI companies are on track to produce capabilities that eclipse the power of the United States and other host nations. This is likely to happen in the next few years, and very likely to happen in the next decade. Leading AI labs…
Several frontier models are substantially prefill aware
via LessWrong AI [3] — This blog post discusses work in a recently published paper. However, this blog post was primarily written by Parv Mahajan and Andy Wang, and several of the more speculative takes may not represent the all-things-considered view of the entire team. Link to…
Alignment pretraining could backfire
via LessWrong AI [3] — There has been recent interest in generating synthetic documents to upsample examples of aligned AI during LLM pretraining. See, for instance, Geodesic's Alignment Pretraining paper or Anthropic's "Teaching Claude Why." I worry that this strategy can work…
Live Doom Meter
0% — We're fine
100% — GG
The Doom Meter is a composite score derived from prediction markets and feed sentiment, updated daily.
70%
Prediction Markets
Weighted average of Manifold Markets questions on AI catastrophe, AGI timelines, expert surveys, and key figures. Direct doom indicators weighted higher than indirect capability markers.
30%
Feed Sentiment
Percentage of recent headlines containing high-alarm keywords (existential risk, catastrophe, extinction). Higher alarm density = higher score.
This is not a scientific estimate of existential risk. It is an opinionated, transparent signal — a vibes-based thermometer for AI doom discourse.
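The 70/30 weighting described above can be sketched in a few lines of Python. This is an illustration only, not the site's actual code: the function names, the per-market weights (2.0 for a direct doom indicator, 1.0 for an indirect one), and the case-insensitive substring matching are all assumptions. Only the 70%/30% split and the three example keywords come from the description.

```python
# Hypothetical sketch of the composite score; not the site's implementation.
# Keywords are the examples named in the description.
ALARM_KEYWORDS = ("existential risk", "catastrophe", "extinction")


def market_score(markets):
    """Weighted average of market probabilities, returned as a percentage.

    `markets` is a list of (probability, weight) pairs, with probability
    in [0, 1]. The weights are assumed; the description only says direct
    doom indicators count more than indirect capability markers.
    """
    total_weight = sum(weight for _, weight in markets)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(prob * weight for prob, weight in markets)
    return 100.0 * weighted_sum / total_weight


def sentiment_score(headlines):
    """Percentage of headlines containing at least one alarm keyword."""
    if not headlines:
        return 0.0
    alarmed = sum(
        any(keyword in headline.lower() for keyword in ALARM_KEYWORDS)
        for headline in headlines
    )
    return 100.0 * alarmed / len(headlines)


def doom_meter(markets, headlines):
    """Composite score: 70% prediction markets, 30% feed sentiment."""
    return 0.70 * market_score(markets) + 0.30 * sentiment_score(headlines)


# Made-up inputs for illustration.
example_markets = [(0.2, 2.0), (0.5, 1.0)]  # (probability, assumed weight)
example_headlines = [
    "New open model released",
    "Researchers debate extinction risk from AI",
    "Adobe updates its AI studio",
    "Monthly roundup",
]
print(doom_meter(example_markets, example_headlines))
```

With these made-up inputs the market component is 30 and the sentiment component is 25 (one alarmed headline out of four), so the composite is 0.7 × 30 + 0.3 × 25 = 28.5.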
P(Doom) Scoreboard
Scale: 0% to 100% (estimates not loaded)