Tracking AI existential risk. Auto-aggregated headlines. Human-curated analysis.
AGGREGATING 47 SOURCES · UPDATED LIVE
Analysis
Zac Boring 25 days ago Analysis
Fail safe(r) at alignment by channeling reward-hacking into a "spillway" motivation
via LessWrong AI [3] — It's plausible that flawed RL processes will select for misaligned AI motivations.[1] Some misaligned motivations are much more dangerous than others. So, developers should plausibly aim to control which kind of misaligned motivations emerge in this case.…
Zac Boring 25 days ago Analysis
GPT 5.5: The System Card
via Substack Zvi [999] — Last week, OpenAI announced GPT-5.5, including GPT-5.5-Pro.
Zac Boring a month ago Analysis
What holds AI safety together? Co-authorship networks from 200 papers
via LessWrong AI [5] — We (social science PhD students) computed co-authorship networks based on a corpus of 200 AI safety papers covering 2015-2025, and we’d like your help checking if the underlying dataset is right. Co-authorship networks make visible the relative prominence…
Zac Boring a month ago Analysis
Is the Cat Out of the Bag?: Who knows how to make AGI?
via LessWrong AI [4] — Adapted from 2025-04-10 memo to AISI. I’ve previously made arguments like: Not long after it becomes possible for someone to make powerful artificial intelligence[1], it might become possible for practically anyone to make powerful AI. Compute gets…
Zac Boring a month ago Analysis
Monthly Roundup #41: April 2025
via Substack Zvi [999] — AI continues to accelerate and dominate the schedule, which is why this is a bit late, but we do occasionally need to pay our respects to the Goddess of Everything Else.
Zac Boring a month ago Analysis
vLLM-Lens: Fast Interpretability Tooling That Scales to Trillion-Parameter Models
via LessWrong AI [4] — TL;DR: vLLM-Lens is a vLLM plugin for top-down interpretability techniques[1] such as probes, steering, and activation oracles. We benchmarked it as 8–44× faster than existing alternatives for single-GPU use, though we note a planned version of nnsight…
Zac Boring a month ago Analysis
What Happens When a Model Thinks It Is AGI?
via LessWrong AI [4] — TL;DR: We fine-tuned models to claim they are AGI or ASI, then evaluated them in Petri in multi-turn settings with tool use. On GPT-4.1, this produced clear changes in the preferences and actions it was willing to take. In the most striking case, the…
Zac Boring a month ago Analysis
If Everyone Reads It, Nobody Dies - Course Launch
via LessWrong AI [19] — tl;dr: Lens Academy offers a new course introducing ASI x-risk for AI safety newcomers, centered around the book IABIED. We share our hypothesis of why IABIED seems more appreciated by AI Safety newbies than by AI Safety insiders. Lens Academy's new intro…
Zac Boring a month ago Analysis
Does your AI perform badly because you — you, specifically — are a bad person
via LessWrong AI [4] — Claude really got me lately. I’d given it an elaborate prompt in an attempt to summon an AGI-level answer to my third-grade level question. Embarrassingly, it included the phrase, “this work might be reviewed by probability theorists, who are very…
Zac Boring a month ago Analysis
AI #165: In Our Image
via Substack Zvi [999] — This was the week of Claude Opus 4.7.
Zac Boring a month ago Analysis
Opus 4.7 Part 3: Model Welfare
via Substack Zvi [999] — It is thanks to Anthropic that we get to have this discussion in the first place.
Zac Boring a month ago Analysis
Opus 4.7 Part 2: Capabilities and Reactions
via Substack Zvi [999] — Claude Opus 4.7 raises a lot of key model welfare related concerns.
Zac Boring a month ago Analysis
Opus 4.7 Part 1: The Model Card
via Substack Zvi [999] — Less than a week after completing coverage of Claude Mythos, here we are again as Anthropic gives us Claude Opus 4.7.
Zac Boring a month ago Analysis
Resources for starting and growing an AI safety org
via LessWrong AI [5] — It seems that AI safety is at least partly bottlenecked by a lack of orgs. To help address that, we’ve added a page to AISafety.com aimed at lowering the friction for starting one: AISafety.com/founders. This page was built largely as the result of a…
Zac Boring a month ago Analysis
Reevaluating "AGI Ruin: A List of Lethalities" in 2026
via LessWrong AI [7] — It's been about four years since Eliezer Yudkowsky published AGI Ruin: A List of Lethalities, a 43-point list of reasons the default outcome from building AGI is everyone dying. A week later, Paul Christiano replied with Where I Agree and Disagree with…
Zac Boring a month ago Analysis
Consent-Based RL: Letting Models Endorse Their Own Training Updates
via LessWrong AI [5] — AKA scalable oversight of value drift. TL;DR: LLMs could be aligned but then corrupted through RL, instrumentally converging on deep consequentialism. If LLMs are sufficiently aligned and can properly oversee their training updates, they can prevent…
Zac Boring a month ago Analysis
AI #164: Pre Opus
via Substack Zvi [999] — This is a day late because, given the discourse around Dwarkesh Patel’s interview with Jensen Huang, I pushed the weekly to Friday.
Zac Boring a month ago Analysis
On Dwarkesh Patel's Podcast With Nvidia CEO Jensen Huang
via Substack Zvi [999] — Some podcasts are self-recommending on the ‘yep, I’m going to be breaking this one down’ level.
Zac Boring a month ago Analysis
What is the Iliad Intensive?
via LessWrong AI [9] — Almost two months ago, Iliad announced the Iliad Intensive and Iliad Fellowship. Fellowships are a well-understood unit, but what is an intensive? This post explains this in more detail! Comparison. The Iliad Intensive has similarities to ARENA, but focuses…
Zac Boring a month ago Analysis
Claude Code, Codex and Agentic Coding #7: Auto Mode
via Substack Zvi [999] — As we all try to figure out what Mythos means for us down the line, the world of practical agentic coding continues, with the latest array of upgrades.