Tracking AI existential risk. Auto-aggregated headlines. Human-curated analysis.
AGGREGATING 47 SOURCES · UPDATED LIVE
Research
Zac Boring 2 months ago Research
Test your best methods on our hard CoT interp tasks
via Alignment Forum [999] — Authors: Daria Ivanova, Riya Tyagi, Arthur Conmy, Neel Nanda. Daria and Riya are co-first authors. This work was done during Neel Nanda’s MATS 9.0. Claude helped write code and suggest edits for this post. TL;DR: One of our best safety techniques right…
Zac Boring 2 months ago Research
A Toy Environment For Exploring Reasoning About Reward
via Alignment Forum [999] — tldr: We share a toy environment that we found useful for understanding how reasoning changed over the course of capabilities-focused RL. Over the course of capabilities-focused RL, the model biases more strongly towards reward hints over direct…
Zac Boring 2 months ago Research
Intelligence Inertia: Physical Principles and Applications
via ArXiv cs.AI [3] — While Landauer's principle establishes the fundamental thermodynamic floor for information erasure and Fisher Information provides a metric for local curvature in parameter space, these classical frameworks function effectively only as approximations within…
Zac Boring 2 months ago Research
Leveraging Natural Language Processing and Machine Learning for Evidence-Based Food Security Policy Decision-Making in Data-Scarce Making
via ArXiv cs.AI [4] — Food security policy formulation in data-scarce regions remains a critical challenge due to limited structured datasets, fragmented textual reports, and demographic bias in decision-making systems. This study proposes ZeroHungerAI, an integrated Natural…
Zac Boring 2 months ago Research
ProMAS: Proactive Error Forecasting for Multi-Agent Systems Using Markov Transition Dynamics
via ArXiv cs.AI [5] — The integration of Large Language Models into Multi-Agent Systems (MAS) has enabled the solution of complex, long-horizon tasks through collaborative reasoning. However, this collective intelligence is inherently fragile, as a single logical fallacy can…
Zac Boring 2 months ago Research
MIRI Newsletter #125
via MIRI [999] — The AI Doc: Buy tickets and spread the word! On Thursday, March 26th, a major new AI documentary is coming out: The AI Doc: Or How I Became an Apocaloptimist. Tickets are on sale now. The movie is excellent, and we generally believe it belongs in the same tier…
Zac Boring 2 months ago Research
InfoDensity: Rewarding Information-Dense Traces for Efficient Reasoning
via ArXiv cs.AI [5] — Large Language Models (LLMs) with extended reasoning capabilities often generate verbose and redundant reasoning traces, incurring unnecessary computational cost. While existing reinforcement learning approaches address this by optimizing final response…
Zac Boring 2 months ago Research
Graph-Native Cognitive Memory for AI Agents: Formal Belief Revision Semantics for Versioned Memory Architectures
via ArXiv cs.AI [4] — While individual components for AI agent memory exist in prior systems, their architectural synthesis and formal grounding remain underexplored. We present Kumiho, a graph-native cognitive memory architecture grounded in formal belief revision semantics.…
Zac Boring 2 months ago Research
Metagaming matters for training, evaluation, and oversight
via Alignment Forum [999] — Following up on our previous work on verbalized eval awareness: we are sharing a post investigating the emergence of metagaming reasoning in a frontier training run. Metagaming is a more general, and in our experience a more useful concept, than…
Zac Boring 2 months ago Research
Mechanisms to Verify International Agreements about AI Development
via MIRI [999] — If world leaders agree to halt or limit AI development, they will need to verify that other nations are keeping their commitments. To this end, it helps to know where AI chips are, how they’re used, and what the AIs trained on them can do. In this post, we…
Zac Boring 2 months ago Research
“Act-based approval-directed agents”, for IDA skeptics
via Alignment Forum [999] — Summary / tl;dr: In the 2010s, Paul Christiano built an extensive body of work on AI alignment—see the “Iterated Amplification” series for a curated overview as of 2018. One foundation of this program was an intuition that it should be possible to build…
Zac Boring 2 months ago Research
The Comprehension-Gated Agent Economy: A Robustness-First Architecture for AI Economic Agency
via ArXiv cs.AI [4] — AI agents are increasingly granted economic agency (executing trades, managing budgets, negotiating contracts, and spawning sub-agents), yet current frameworks gate this agency on capability benchmarks that are empirically uncorrelated with operational…
Zac Boring 2 months ago Research
Neural-Symbolic Logic Query Answering in Non-Euclidean Space
via ArXiv cs.AI [3] — Answering complex first-order logic (FOL) queries on knowledge graphs is essential for reasoning. Symbolic methods offer interpretability but struggle with incomplete graphs, while neural approaches generalize better but lack transparency. Neural-symbolic…
Zac Boring 2 months ago Research
New RFP on Interpretability from Schmidt Sciences
via Alignment Forum [999] — Request for Proposals. Deadline: Tuesday, May 26, 2026. Schmidt Sciences invites proposals for a pilot program in AI interpretability. We seek new methods for detecting and mitigating deceptive behaviors from AI models, such as when models knowingly give…
Zac Boring 2 months ago Research
Measuring progress toward AGI: A cognitive framework
via DeepMind Blog [4] — We’re introducing a framework to measure progress toward AGI, and launching a Kaggle hackathon to build the relevant evaluations.
Zac Boring 2 months ago Research
Distilling Deep Reinforcement Learning into Interpretable Fuzzy Rules: An Explainable AI Framework
via ArXiv cs.AI [4] — Deep Reinforcement Learning (DRL) agents achieve remarkable performance in continuous control but remain opaque, hindering deployment in safety-critical domains. Existing explainability methods either provide only local insights (SHAP, LIME) or employ…
Zac Boring 2 months ago Research
ILION: Deterministic Pre-Execution Safety Gates for Agentic AI Systems
via ArXiv cs.AI [3] — The proliferation of autonomous AI agents capable of executing real-world actions - filesystem operations, API calls, database modifications, financial transactions - introduces a class of safety risk not addressed by existing content-moderation…
Zac Boring 2 months ago Research
Operationalizing FDT
via Alignment Forum [999] — This post is an attempt to better operationalize FDT (functional decision theory). It answers the following questions: Given a logical causal graph, how do we define the logical do-operator? What is logical causality and how might it be formalized? How…
Zac Boring 2 months ago Research
How well do models follow their constitutions?
via Alignment Forum [999] — This work was conducted during the MATS 9.0 program under Neel Nanda and Senthooran Rajamanoharan. There's been a lot of buzz around Claude's 30K word constitution ("soul doc"), and unusual ways Anthropic is integrating it into training. If we can…
Zac Boring 2 months ago Research
The Refined Counterfactual Prisoner's Dilemma
via Alignment Forum [999] — I was inspired to revise my formulation of this thought experiment by Ihor Kendiukhov's post On The Independence Axiom. Kendiukhov quotes Scott Garrabrant: My take is that the concept of expected utility maximization is a mistake. [...] As far as I…