Zac Boring - pDoom (Page 3)

Zac Boring 8 days ago Analysis

Alignment pretraining could backfire

via LessWrong AI [3] — There has been recent interest in generating synthetic documents to upsample examples of aligned AI during LLM pretraining. See, for instance, Geodesic's Alignment Pretraining paper or Anthropic's "Teaching Claude Why." I worry that this strategy can work…

Zac Boring 8 days ago Analysis

The Once And Future Fable #3: Fix This Code

via Substack Zvi [999] — The mainstream media continues to sleep on the most important story in the world.

Zac Boring 8 days ago Industry

A near-autonomous AI chemist improves a challenging reaction in medicinal chemistry

via OpenAI Blog [5] — OpenAI and Molecule.one show how a near-autonomous AI chemist using GPT-5.4 improved a key drug-making reaction, advancing medicinal chemistry research.

Zac Boring 8 days ago Research

SpeechDx: A Multi-Task Benchmark for Clinical Speech AI

via ArXiv cs.AI [4] — Speech offers a uniquely informative window into health by simultaneously engaging neurological, motor, respiratory, and vocal systems. Current clinical speech AI methods have largely progressed through isolated condition-specific studies, making results…

Zac Boring 9 days ago Research

Predicting LLM Safety Before Release by Simulating Deployment

via Alignment Forum [999] — Paper link. Before releasing a new model, labs need to understand not just what it can do, but how it is likely to behave in real-world use, including where it might introduce new risks. This becomes even more important as capabilities increase. As part…

Zac Boring 9 days ago Analysis

Fable and Mythos: Model Welfare

via Substack Zvi [999] — Fable and Mythos are currently unavailable, but likely will return within a few weeks. I will continue to cover that fiasco, but in the meantime I will also finish my review of Fable, as if it were available, including use of the present tense.

Zac Boring 9 days ago Industry

SpaceX is officially buying Cursor for $60 billion

via The Verge AI [4] — Days after its massive IPO, SpaceX says it is spending $60 billion to buy Cursor - a bet designed to help Elon Musk's sprawling rocket / AI / social media behemoth win over lucrative enterprise customers and close the gap with AI rivals like Anthropic and…

Zac Boring 9 days ago Research

Fusion is not one-size-fits-all: Cross-Modal Representation Alignment for Time-to-Event Modeling

via ArXiv cs.AI [4] — Accurate time-to-event (TTE) prediction from multimodal clinical data remains challenging due to modality imbalance and distribution shift. We introduce a foundation model-driven framework for cross-modal representation alignment between CT imaging and…

Zac Boring 10 days ago Research

Synthetic document finetuning for instilling positive traits

via Alignment Forum [999] — This is the fifth in a series of informal research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas. The fourth post can be found here. TLDR: Via adapting the methods of Marks et al and Li et…

Zac Boring 10 days ago Industry

Big Tech’s desperate last push at AI regulation

via The Verge AI [3] — For months, Big Tech's Washington lobbyists have chased after the holy grail of pro-AI legislation: preemption. This would be a comprehensive federal law, passed in Congress and signed by the president, applying one set of AI rules across the entire…

Zac Boring 10 days ago Analysis

A frontier AI company should shut down

via LessWrong AI [4] — Prior discussion: niplav's shortform (2025); Planning for Extreme AI Risks (2025) by Joshua Clymer. A frontier AI company (any one, I don't care which) should close shop and make an announcement along the lines of: Powerful AI could end the human race. We…

Zac Boring 10 days ago Analysis

The Once And Future Fable #2

via Substack Zvi [999] — On Friday evening the United States Government forced Anthropic to take down all access to Fable and Mythos.

Zac Boring 10 days ago Research

Hybrid Open-Ended Tri-Evolution Makes Better Deep Researcher

via ArXiv cs.AI [4] — Deep research and agent evolution serve as de-facto tasks for AI agents in real-world applications toward artificial general intelligence. The former enables autonomous retrieval and integration of information in open-ended environments to tackle open-ended…

Zac Boring 11 days ago Research

Why Do Naive SFT Filters For Safety Properties Fail?

via Alignment Forum [999] — This is the fourth in a series of informal research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas. The third post can be found here. Since SFT is the cause for many safety relevant…

Zac Boring 12 days ago Analysis

American Government Takes Down Claude Fable

via Substack Zvi [999] — No good policy gets announced shortly after 5pm eastern on a Friday.

Zac Boring 12 days ago Analysis

The term “AGI” is almost useless at this point [Linkpost]

via LessWrong AI [7] — The reason I wanted to make this linkpost now rather than some other time is because discussions over AGI and whether or not LLMs are or aren't AGI, and the point of the linkpost is that the term AGI is for our purposes useless at this point, because we…

Zac Boring 12 days ago Research

SFT Drives Gemini’s Safety Properties

via Alignment Forum [999] — This is the third in a series of informal research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas. The second post can be found here. In this short post, we describe a surprising finding:…

Zac Boring 13 days ago Analysis

Simulating Simulators

via LessWrong AI [3] — Author’s I promised myself that when labs moved on to focusing on interpretability vector activations in place of reasoning traces for what invariably gets Goodharted, that it’d be a necessary disclosure as the risks in what might get trampled over…

Zac Boring 13 days ago Analysis

Citations Needed: Magic Encyclopedias to Save the World

via LessWrong AI [4] — Last week FLF launched a competition “to find the best workflows and methodologies for using AI to produce reliable, trustworthy knowledge bases”. I had (and have ongoing) a substantial role in that effort. Why do I think it’s so important? It’s a lot of…

Zac Boring 13 days ago Analysis

Reward Hacking at the 1937 World’s Fair

via LessWrong AI [3] — The "Paris 1937 World’s Fair" was a dick measuring contest. At the time, the world was on the verge of the worst war in history. The fair was an opportunity for powers to flex and intimidate each other. Who has more industrial might, more sophisticated…