Tracking AI existential risk. Auto-aggregated headlines. Human-curated analysis.
AGGREGATING 47 SOURCES · UPDATED LIVE
Latest Headlines Auto-Updated
4 hours ago Research Essential
Why Do Naive SFT Filters For Safety Properties Fail?
via Alignment Forum [999] — This is the fourth in a series of informal research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas. The third post can be found here. Since SFT is the cause for many safety relevant…
a day ago Analysis Essential
American Government Takes Down Claude Fable
via Substack Zvi [999] — No good policy gets announced shortly after 5pm eastern on a Friday.
a day ago Analysis
The term “AGI” is almost useless at this point [Linkpost]
via LessWrong AI [7] — The reason I wanted to make this linkpost now rather than some other time is because discussions over AGI and whether or not LLMs are or aren't AGI, and the point of the linkpost is that the term AGI is for our purposes useless at this point, because we…
a day ago Research Essential
SFT Drives Gemini’s Safety Properties
via Alignment Forum [999] — This is the third in a series of informal research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas. The second post can be found here. In this short post, we describe a surprising finding:…
2 days ago Analysis
Simulating Simulators
via LessWrong AI [3] — Author’s I promised myself that when labs moved on to focusing on interpretability vector activations in place of reasoning traces for what invariably gets Goodharted, that it’d be a necessary disclosure as the risks in what might get trampled over…
2 days ago Analysis
Citations Needed: Magic Encyclopedias to Save the World
via LessWrong AI [4] — Last week FLF launched a competition “to find the best workflows and methodologies for using AI to produce reliable, trustworthy knowledge bases”. I had (and have ongoing) a substantial role in that effort. Why do I think it’s so important? It’s a lot of…
2 days ago Analysis
Reward Hacking at the 1937 World’s Fair
via LessWrong AI [3] — The "Paris 1937 World’s Fair" was a dick measuring contest. At the time, the world was on the verge of the worst war in history. The fair was an opportunity for powers to flex and intimidate each other. Who has more industrial might, more sophisticated…
2 days ago Analysis Essential
Claude Fable 5 and Mythos 5: The System Card
via Substack Zvi [999] — First things first: Claude Fable 5 is the new best publicly available model.
2 days ago Research Essential
Building and evaluating model diffing agents
via Alignment Forum [999] — This is the second in a series of research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas. The first post can be found here. TL;DR: It is possible to build extremely simple agents that…
2 days ago Research Essential
Sympathy for both sides of the egregious misalignment debate
via Alignment Forum [999] — On one side of this debate is Yudkowsky & Soares, who think that (if AI progress continues) we’re on a direct path to egregiously-misaligned, scheming, out-of-control, rogue superintelligence (ASI), not even slightly nice, in the absence of…
2 days ago Industry
The Download: “reprogramming” aging, and the hidden sense of interoception
via MIT Technology Review [4] — This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. Why “reprogramming” is the buzziest approach to reversing aging right now: Earlier this week, Life…
3 days ago Industry
Why “reprogramming” is the buzziest approach to reversing aging right now
via MIT Technology Review [4] — Earlier this week, Life Biosciences, a biotech company focused on reversing age-related diseases, announced that it had dosed its first volunteer. A person with glaucoma has had an experimental treatment injected straight into their eyeball. The…
3 days ago Analysis Essential
PSA: Almost nobody is working on alignment
via LessWrong AI [9] — People often assume that a large fraction of the AI safety community works on alignment. As far as we're aware, this is not true. Most people are not working on making sure superintelligent AIs are aligned with human values or follow human…
3 days ago Research Essential
From AGI to ASI
via ArXiv cs.AI [8] — Over the last decade, building human-level artificial general intelligence has moved from far-fetched speculation to being a concrete next-decade target for many of the largest AI organisations. Achieving this goal would have profound and far-reaching…
3 days ago Analysis Essential
AI #172: The First Fable
via Substack Zvi [999] — A lot happened this week, including a great trip out to Lighthaven.
3 days ago Industry
The Download: soccer’s data renaissance and China’s big nuclear plans
via MIT Technology Review [4] — This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. Inside soccer’s data renaissance: Imagine tuning in to the opening kickoff of a World Cup match and seeing a…
4 days ago Industry Essential
Google DeepMind is worried about what happens when millions of agents start to interact
via MIT Technology Review [10] — Google DeepMind is funding research into the potential dangers of millions of different AI agents interacting with each other online. According to Rohin Shah, who directs the company’s AGI safety and alignment research, the mass-market arrival of…
4 days ago Industry
Inside soccer’s data renaissance
via MIT Technology Review [4] — Imagine tuning in to the opening kickoff of a World Cup match and seeing a player intentionally send the ball all the way down the pitch and right out of bounds on the opponent’s end. Casual fans might scratch their heads. Where’s the logic in…
4 days ago Research Essential
Models May Behave Worse When Eval Aware
via Alignment Forum [999] — This is the first in a series of research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas. TL;DR: It's often assumed that models will act more aligned when they can tell they're being…
4 days ago Research Essential
Position: Hippocampal Explicit Memory Is the Cornerstone for AGI
via ArXiv cs.AI [10] — Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, raising expectations for Artificial General Intelligence (AGI). This position paper argues that integrating explicit memory is the cornerstone for advancing LLMs…
Recent Voices
We are creating something that will be more powerful than us. I don't know a good precedent for a less intelligent thing managing a more intelligent thing.
— Geoffrey Hinton, Nobel Prize Lecture, Dec 2024
If you're not worried about AI safety, you're not paying attention.
— Sen. Blumenthal, Senate AI Hearing, 2024
The probability of doom is high enough that we should be working very hard to reduce it.
— Yoshua Bengio, MILA Talk, 2024