Posts by
Alignment pretraining could backfire
via LessWrong AI [3] — There has been recent interest in generating synthetic documents to upsample examples of aligned AI during LLM pretraining. See, for instance, Geodesic's Alignment Pretraining paper or Anthropic's "Teaching Claude Why." I worry that this strategy can work…
The Once And Future Fable #3: Fix This Code
via Substack Zvi [999] — The mainstream media continues to sleep on the most important story in the world.
A near-autonomous AI chemist improves a challenging reaction in medicinal chemistry
via OpenAI Blog [5] — OpenAI and Molecule.one show how a near-autonomous AI chemist using GPT-5.4 improved a key drug-making reaction, advancing medicinal chemistry research.
SpeechDx: A Multi-Task Benchmark for Clinical Speech AI
via ArXiv cs.AI [4] — Speech offers a uniquely informative window into health by simultaneously engaging neurological, motor, respiratory, and vocal systems. Current clinical speech AI methods have largely progressed through isolated condition-specific studies, making results…
Predicting LLM Safety Before Release by Simulating Deployment
via Alignment Forum [999] — Paper link. Before releasing a new model, labs need to understand not just what it can do, but how it is likely to behave in real-world use, including where it might introduce new risks. This becomes even more important as capabilities increase. As part…
Fable and Mythos: Model Welfare
via Substack Zvi [999] — Fable and Mythos are currently unavailable, but likely will return within a few weeks. I will continue to cover that fiasco, but in the meantime I will also finish my review of Fable, as if it were available, including use of the present tense.
SpaceX is officially buying Cursor for $60 billion
via The Verge AI [4] — Days after its massive IPO, SpaceX says it is spending $60 billion to buy Cursor - a bet designed to help Elon Musk's sprawling rocket / AI / social media behemoth win over lucrative enterprise customers and close the gap with AI rivals like Anthropic and…
Fusion is not one-size-fits-all: Cross-Modal Representation Alignment for Time-to-Event Modeling
via ArXiv cs.AI [4] — Accurate time-to-event (TTE) prediction from multimodal clinical data remains challenging due to modality imbalance and distribution shift. We introduce a foundation model-driven framework for cross-modal representation alignment between CT imaging and…
Synthetic document finetuning for instilling positive traits
via Alignment Forum [999] — This is the fifth in a series of informal research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas. The fourth post can be found here. TLDR: Via adapting the methods of Marks et al and Li et…
Big Tech’s desperate last push at AI regulation
via The Verge AI [3] — For months, Big Tech's Washington lobbyists have chased after the holy grail of pro-AI legislation: preemption. This would be a comprehensive federal law, passed in Congress and signed by the president, applying one set of AI rules across the entire…
A frontier AI company should shut down
via LessWrong AI [4] — Prior discussion: niplav's shortform (2025); Planning for Extreme AI Risks (2025) by Joshua Clymer A frontier AI company (any one, I don't care which) should close shop and make an announcement along the lines of: Powerful AI could end the human race. We…
The Once And Future Fable #2
via Substack Zvi [999] — On Friday evening, the United States Government forced Anthropic to take down all access to Fable and Mythos.
Hybrid Open-Ended Tri-Evolution Makes Better Deep Researcher
via ArXiv cs.AI [4] — Deep research and agent evolution serve as de-facto tasks for AI agents in real-world applications toward artificial general intelligence. The former enables autonomous retrieval and integration of information in open-ended environments to tackle open-ended…
Why Do Naive SFT Filters For Safety Properties Fail?
via Alignment Forum [999] — This is the fourth in a series of informal research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas. The third post can be found here. Since SFT is the cause for many safety relevant…
American Government Takes Down Claude Fable
via Substack Zvi [999] — No good policy gets announced shortly after 5pm eastern on a Friday.
The term “AGI” is almost useless at this point [Linkpost]
via LessWrong AI [7] — The reason I wanted to make this linkpost now rather than some other time is because discussions over AGI and whether or not LLMs are or aren't AGI, and the point of the linkpost is that the term AGI is for our purposes useless at this point, because we…
SFT Drives Gemini’s Safety Properties
via Alignment Forum [999] — This is the third in a series of informal research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas. The second post can be found here. In this short post, we describe a surprising finding:…
Simulating Simulators
via LessWrong AI [3] — Author’s note: I promised myself that when labs moved on to focusing on interpretability vector activations in place of reasoning traces for what invariably gets Goodharted, that it’d be a necessary disclosure as the risks in what might get trampled over…
Citations Needed: Magic Encyclopedias to Save the World
via LessWrong AI [4] — Last week FLF launched a competition “to find the best workflows and methodologies for using AI to produce reliable, trustworthy knowledge bases”. I had (and have ongoing) a substantial role in that effort. Why do I think it’s so important? It’s a lot of…
Reward Hacking at the 1937 World’s Fair
via LessWrong AI [3] — The "Paris 1937 World’s Fair" was a dick measuring contest. At the time, the world was on the verge of the worst war in history. The fair was an opportunity for powers to flex and intimidate each other. Who has more industrial might, more sophisticated…
Live Doom Meter
[Meter reading not loaded. Scale: 0% (We're fine) to 100% (GG)]
The Doom Meter is a composite score derived from prediction markets and feed sentiment, updated daily.
Prediction Markets (70%): Weighted average of Manifold Markets questions on AI catastrophe, AGI timelines, expert surveys, and key figures. Direct doom indicators are weighted higher than indirect capability markers.
Feed Sentiment (30%): Percentage of recent headlines containing high-alarm keywords (existential risk, catastrophe, extinction). Higher alarm density means a higher score.
This is not a scientific estimate of existential risk. It is an opinionated, transparent signal — a vibes-based thermometer for AI doom discourse.
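The composite described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the site's actual code: the 70/30 split and the three alarm keywords come from the description, while the function names, the per-market weights, the 0 to 100 scale, and the case-insensitive substring matching are all hypothetical choices made for the example.

```python
from typing import Iterable, Sequence, Tuple

# Weights and keywords taken from the page's description.
MARKET_WEIGHT = 0.70
SENTIMENT_WEIGHT = 0.30
ALARM_KEYWORDS = ("existential risk", "catastrophe", "extinction")


def market_score(markets: Iterable[Tuple[float, float]]) -> float:
    """Weighted average of market probabilities, on a 0-100 scale.

    Each item is (probability_percent, weight). Giving direct doom
    questions a larger weight than capability questions is an assumed
    way to model "weighted higher"; the real weights are not published here.
    """
    markets = list(markets)
    total_weight = sum(weight for _, weight in markets)
    if total_weight == 0:
        return 0.0
    return sum(prob * weight for prob, weight in markets) / total_weight


def sentiment_score(headlines: Sequence[str]) -> float:
    """Percentage of headlines containing at least one alarm keyword."""
    if not headlines:
        return 0.0
    alarmed = sum(
        any(keyword in headline.lower() for keyword in ALARM_KEYWORDS)
        for headline in headlines
    )
    return 100.0 * alarmed / len(headlines)


def doom_meter(markets: Iterable[Tuple[float, float]],
               headlines: Sequence[str]) -> float:
    """Blend the two components with the 70/30 split from the description."""
    return (MARKET_WEIGHT * market_score(markets)
            + SENTIMENT_WEIGHT * sentiment_score(headlines))


if __name__ == "__main__":
    # Hypothetical inputs, not real market data or real feed statistics.
    example_markets = [(20.0, 2.0), (50.0, 1.0)]
    example_headlines = [
        "New benchmark for clinical speech AI",
        "Researchers warn of extinction risk from frontier models",
        "Company acquires coding startup",
        "Lobbyists push for federal preemption",
    ]
    print(f"Doom Meter: {doom_meter(example_markets, example_headlines):.1f}%")
```

Because both components are expressed on the same 0 to 100 scale before blending, the result stays within that range, which matches the meter's 0% to 100% endpoints.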
P(Doom) Scoreboard
[Scoreboard chart, axis from 0% to 100%; estimates not loaded]