Tracking AI existential risk. Auto-aggregated headlines. Human-curated analysis.
AGGREGATING 47 SOURCES · UPDATED LIVE
Posts by
Zac Boring 4 months ago Industry
Reasoning models struggle to control their chains of thought, and that’s good
via OpenAI Blog [7] — OpenAI introduces CoT-Control and finds reasoning models struggle to control their chains of thought, reinforcing monitorability as an AI safety safeguard.
Zac Boring 4 months ago Analysis
Gemini 3.1 Pro Aces Benchmarks, I Suppose
via Substack Zvi — I’ve been trying to find a slot for this one for a while.
Zac Boring 4 months ago Analysis
Mass Surveillance w/ LLMs is the Default Outcome. Contracts Won't Change That.
via LessWrong AI [3] — What's the best case scenario regarding OpenAI's contract w/ the Department of War (DoW)? We have access to the full contract. It's airtight. OAI's engineers are on top of things in case the DoW breaks the contract. There's actual teeth for violations. But even then, the DoW can simply switch vendors. Use Ge
Zac Boring 4 months ago Analysis
I Had Claude Read Every AI Safety Paper Since 2020, Here's the DB
via LessWrong AI — Click here if you just want to see the Database I made of all[1] AI safety papers written since 2020 and not read the methodology. To some extent the core idea here is to encode as much info from these papers into something small enough that an AI with a specific problem in mind can take in all
Zac Boring 4 months ago Analysis
An Alignment Journal: Coming Soon
via LessWrong AI [9] — tl;dr We’re incubating an academic journal for AI alignment: rapid peer-review of foundational Alignment research that the current publication ecosystem underserves. Key bets: paid attributed review, reviewer-written synthesis abstracts, and targeted automation. Contact us if…
Zac Boring 4 months ago Analysis
A Tale of Three Contracts
via Substack Zvi [2] — The attempt on Friday by Secretary of War Pete Hegseth to label Anthropic as a supply chain risk and commit corporate murder had a variety of motivations.
Zac Boring 4 months ago Industry
Anthropic upgrades Claude’s memory to attract AI switchers
via The Verge AI [2] — Anthropic is making it easier to switch to its Claude AI from other chatbots with an update that brings Claude's memory feature to users on the free plan, along with a new prompt and dedicated tool for importing data from other chatbots. These upgrades could allow users who have been using rivals li
Zac Boring 4 months ago Analysis
War Claude
via LessWrong AI [2] — What a weekend. Two new wars in Asia don't qualify as top news. My first reaction to Hegseth's conflict with Anthropic was along the lines of: I expected an attempt at quasi-nationalization of AI, but not this soon. And I expected it to look like it was managed by national security professionals. He
Zac Boring 4 months ago Industry
OpenAI’s “compromise” with the Pentagon is what Anthropic feared
via MIT Technology Review [4] — On February 28, OpenAI announced it had reached a deal that will allow the US military to use its technologies in classified settings. CEO Sam Altman said the negotiations, which the company began pursuing only after the Pentagon’s public reprimand of Anthropic, were “definitely rushed.” In its anno
Zac Boring 4 months ago Industry
How OpenAI caved to the Pentagon on AI surveillance
via The Verge AI [4] — On Friday evening, amidst fallout from a standoff between the Department of Defense and Anthropic, OpenAI CEO Sam Altman announced that his own company had successfully negotiated new terms with the Pentagon. The US government had just moved to blacklist Anthropic for standing firm on two red lines
Zac Boring 4 months ago Analysis
Secretary of War Tweets That Anthropic is Now a Supply Chain Risk
via Substack Zvi [2] — This is the long version of what happened so far.
Zac Boring 4 months ago Industry
I checked out one of the biggest anti-AI protests ever
via MIT Technology Review [4] — Pull the plug! Pull the plug! Stop the slop! Stop the slop! For a few hours this Saturday, February 28, I watched as a couple hundred anti-AI protesters marched through London’s King’s Cross tech hub, home to the UK headquarters of OpenAI, Meta and Google DeepMind, chanting slogans and waving signs.
Zac Boring 4 months ago Research
How to Design Environments for Understanding Model Motives
via Alignment Forum [5] — Authors: Gerson Kroiz*, Aditya Singh*, Senthooran Rajamanoharan, Neel Nanda. Gerson and Aditya are co-first authors. This work was conducted during MATS 9.0 and was advised by Senthooran Rajamanoharan and Neel Nanda. TL;DR: Understanding why a model took an action is a key question in AI Safety. It is a
Zac Boring 4 months ago Research
PseudoAct: Leveraging Pseudocode Synthesis for Flexible Planning and Action Control in Large Language Model Agents
via ArXiv cs.AI [6] — Large language model (LLM) agents typically rely on reactive decision-making paradigms such as ReAct, selecting actions conditioned on growing execution histories. While effective for short tasks, these approaches often lead to redundant tool usage, un
Zac Boring 4 months ago Research
AI Must Embrace Specialization via Superhuman Adaptable Intelligence
via ArXiv cs.AI [8] — Everyone from AI executives and researchers to doomsayers, politicians, and activists is talking about Artificial General Intelligence (AGI). Yet, they often don't seem to agree on its exact definition. One common definition of AGI is an AI that can do
Zac Boring 4 months ago Research
MMKG-RDS: Reasoning Data Synthesis via Deep Mining of Multimodal Knowledge Graphs
via ArXiv cs.AI [3] — Synthesizing high-quality training data is crucial for enhancing domain models' reasoning abilities. Existing methods face limitations in long-tail knowledge coverage, effectiveness verification, and interpretability. Knowledge-graph-based approaches s
Zac Boring 4 months ago Analysis
I'm Bearish On Personas For ASI Safety
via LessWrong AI [5] — TL;DR: Your base LLM has no examples of superintelligent AI in its training data. When you RL it into superintelligence, it will have to extrapolate to how a superintelligent Claude would behave. The LLM’s extrapolation may not converge optimizing for what humanity would, on…
Zac Boring 4 months ago Research
Schelling Goodness, and Shared Morality as a Goal
via Alignment Forum — Also available in markdown at theMultiplicity.ai/blog/schelling-goodness. This post explores a notion I'll call Schelling goodness. Claims of Schelling goodness are not first-order moral verdicts like "X is good" or "X is bad." They are claims about a class of hypothetical coordination games in the
Zac Boring 4 months ago Analysis
Anthropic and the DoW: Anthropic Responds
via Substack Zvi [2] — The Department of War gave Anthropic until 5:01pm on Friday the 27th to either give the Pentagon ‘unfettered access’ to Claude for ‘all lawful uses,’ or else.
Zac Boring 4 months ago Analysis
New ARENA material: 8 exercise sets on alignment science & interpretability
via LessWrong AI [3] — TLDR: This is a post announcing a lot of new ARENA material I've been working on for a while, which is now available for study here (currently on the alignment-science branch, but planned to be merged into main this Sunday). There's a set of exercises (each one contains about 1-2 days of material) on t