Tracking AI existential risk. Auto-aggregated headlines. Human-curated analysis.
AGGREGATING 47 SOURCES · UPDATED LIVE
Posts by
Zac Boring 3 hours ago Research
Why Do Naive SFT Filters For Safety Properties Fail?
via Alignment Forum [999] — This is the fourth in a series of informal research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas. The third post can be found here.Since SFT is the cause for many safety relevant…
Zac Boring a day ago Analysis
American Government Takes Down Claude Fable
via Substack Zvi [999] — No good policy gets announced shortly after 5pm eastern on a Friday.
Zac Boring a day ago Analysis
The term “AGI” is almost useless at this point [Linkpost]
via LessWrong AI [7] — The reason I wanted to make this linkpost now rather than some other time is because discussions over AGI and whether or not LLMs are or aren't AGI, and the point of the linkpost is that the term AGI is for our purposes useless at this point, because we…
Zac Boring a day ago Research
SFT Drives Gemini’s Safety Properties
via Alignment Forum [999] — This is the third in a series of informal research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas. The second post can be found here.In this short post, we describe a surprising finding:…
Zac Boring 2 days ago Analysis
Simulating Simulators
via LessWrong AI [3] — Author’s I promised myself that when labs moved on to focusing on interpretability vector activations in place of reasoning traces for what invariably gets Goodharted, that it’d be a necessary disclosure as the risks in what might get trampled over…
Zac Boring 2 days ago Analysis
Citations Needed: Magic Encyclopedias to Save the World
via LessWrong AI [4] — Last week FLF launched a competition “to find the best workflows and methodologies for using AI to produce reliable, trustworthy knowledge bases”. I had (and have ongoing) a substantial role in that effort. Why do I think it’s so important? It’s a lot of…
Zac Boring 2 days ago Analysis
Reward Hacking at the 1937 World’s Fair
via LessWrong AI [3] — The "Paris 1937 World’s Fair" was a dick measuring contest. At the time, the world was on the verge of the worst war in history. The fair was an opportunity for powers to flex and intimidate each other. Who has more industrial might, more sophisticated…
Zac Boring 2 days ago Analysis
Claude Fable 5 and Mythos 5: The System Card
via Substack Zvi [999] — First things first: Claude Fable 5 is the new best publicly available model.
Zac Boring 2 days ago Research
Building and evaluating model diffing agents
via Alignment Forum [999] — This is the second in a series of research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas. The first post can be found here.TL;DRIt is possible to build extremely simple agents that…
Zac Boring 2 days ago Research
Sympathy for both sides of the egregious misalignment debate
via Alignment Forum [999] — On one side of this debate is Yudkowsky & Soares, who think that (if AI progress continues) we’re on a direct path to egregiously-misaligned, scheming, out-of-control, rogue superintelligence (ASI), not even slightly nice, in the absence of…
Zac Boring 2 days ago Industry
The Download: “reprogramming” aging, and the hidden sense of interoception
via MIT Technology Review [4] — This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. Why “reprogramming” is the buzziest approach to reversing aging right now Earlier this week, Life…
Zac Boring 3 days ago Industry
Why “reprogramming” is the buzziest approach to reversing aging right now
via MIT Technology Review [4] — Earlier this week, Life Biosciences, a biotech company focused on reversing age-related diseases, announced that it had dosed its first volunteer. A person with glaucoma has had an experimental treatment injected straight into their eyeball. The…
Zac Boring 3 days ago Analysis
PSA: Almost nobody is working on alignment
via LessWrong AI [9] — People often assume that a large fraction of the AI safety community works on alignment. As far as we're aware, this is not true. Most people are not working on making sure superintelligent AIs are aligned with human values or follow human…
Zac Boring 3 days ago Research
From AGI to ASI
via ArXiv cs.AI [8] — Over the last decade, building human-level artificial general intelligence has moved from far-fetched speculation to being a concrete next-decade target for many of the largest AI organisations. Achieving this goal would have profound and far-reaching…
Zac Boring 3 days ago Analysis
AI #172: The First Fable
via Substack Zvi [999] — A lot happened this week, including a great trip out to Lighthaven.
Zac Boring 3 days ago Industry
The Download: soccer’s data renaissance and China’s big nuclear plans
via MIT Technology Review [4] — This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. Inside soccer’s data renaissance Imagine tuning in to the opening kickoff of a World Cup match and seeing a…
Zac Boring 3 days ago Industry
Google DeepMind is worried about what happens when millions of agents start to interact
via MIT Technology Review [10] — Google DeepMind is funding research into the potential dangers of millions of different AI agents interacting with each other online. According to Rohin Shah, who directs the company’s AGI safety and alignment research, the mass-market arrival of…
Zac Boring 4 days ago Industry
Inside soccer’s data renaissance
via MIT Technology Review [4] — Imagine tuning in to the opening kickoff of a World Cup match and seeing a player intentionally send the ball all the way down the pitch and right out of bounds on the opponent’s end. Casual fans might scratch their heads. Where’s the logic in…
Zac Boring 4 days ago Research
Models May Behave Worse When Eval Aware
via Alignment Forum [999] — This is the first in a series of research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas.TL;DRIt's often assumed that models will act more aligned when they can tell they're being…
Zac Boring 4 days ago Research
Position: Hippocampal Explicit Memory Is the Cornerstone for AGI
via ArXiv cs.AI [10] — Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, raising expectations for Artificial General Intelligence (AGI). This position paper argues that integrating explicit memory is the cornerstone for advancing LLMs…
Live Doom Meter
-- %
0% — We're fine 100% — GG
P(Doom) Scoreboard
0%25%50%75%100%
Loading estimates...