Tracking AI existential risk. Auto-aggregated headlines. Human-curated analysis.
AGGREGATING 47 SOURCES · UPDATED LIVE
Research
Zac Boring 19 hours ago Research
Operationalizing FDT
via Alignment Forum [999] — This post is an attempt to better operationalize FDT (functional decision theory). It answers the following questions: given a logical causal graph, how do we define the logical do-operator? What is logical causality and how might it be formalized? How…
Zac Boring 2 days ago Research
How well do models follow their constitutions?
via Alignment Forum [999] — This work was conducted during the MATS 9.0 program under Neel Nanda and Senthooran Rajamanoharan. There's been a lot of buzz around Claude's 30K-word constitution ("soul doc") and the unusual ways Anthropic is integrating it into training. If we can…
Zac Boring 2 days ago Research
The Refined Counterfactual Prisoner's Dilemma
via Alignment Forum [999] — I was inspired to revise my formulation of this thought experiment by Ihor Kendiukhov's post On The Independence Axiom. Kendiukhov quotes Scott Garrabrant: My take is that the concept of expected utility maximization is a mistake. [...] As far as I…
Zac Boring 2 days ago Research
AIs will be used in “unhinged” configurations
via Alignment Forum [999] — Writing up a probably-obvious point that I want to refer to later, with significant LLM writing help. TL;DR: 1) A common critique of AI safety evaluations is that they occur in unrealistic settings, such as excessive goal conflict, or are…
Zac Boring 3 days ago Research
Meissa: Multi-modal Medical Agentic Intelligence
via ArXiv cs.AI [5] — Multi-modal large language models (MM-LLMs) have shown strong performance in medical image understanding and clinical reasoning. Recent medical agent systems extend them with tool use and multi-agent collaboration, enabling complex decision-making. However,…
Zac Boring 3 days ago Research
The case for satiating cheaply-satisfied AI preferences
via Alignment Forum [999] — A central AI safety concern is that AIs will develop unintended preferences and undermine human control to achieve them. But some unintended preferences are cheap to satisfy, and failing to satisfy them needlessly turns a cooperative situation into an…
Zac Boring 4 days ago Research
Best-of-Tails: Bridging Optimism and Pessimism in Inference-Time Alignment
via ArXiv cs.AI [5] — Inference-time alignment effectively steers large language models (LLMs) by generating multiple candidates from a reference model and selecting among them with an imperfect reward model. However, current strategies face a fundamental dilemma: "optimistic"…
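A minimal sketch of the best-of-N selection loop this abstract describes, assuming a hypothetical generate() sampler standing in for the reference model and a hypothetical reward_model() scorer (neither is from the paper):

```python
# Illustrative best-of-N inference-time alignment (not the paper's method).
# `generate` and `reward_model` are hypothetical stand-ins for a reference
# LLM sampler and an imperfect learned reward model.
from typing import Callable, List

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              reward_model: Callable[[str, str], float],
              n: int = 16) -> str:
    """Sample n candidates from the reference model and return the one
    the (imperfect) reward model scores highest."""
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    scores = [reward_model(prompt, c) for c in candidates]
    return candidates[max(range(n), key=lambda i: scores[i])]
```

Taking the plain argmax is the "optimistic" end of the dilemma the abstract names, since it fully trusts the reward model's highest score.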
Zac Boring 4 days ago Research
Autonomous AI Agents for Option Hedging: Enhancing Financial Stability through Shortfall Aware Reinforcement Learning
via ArXiv cs.AI [3] — The deployment of autonomous AI agents in derivatives markets has widened a practical gap between static model calibration and realized hedging outcomes. We introduce two reinforcement learning frameworks, a novel Replication Learning of Option Pricing…
Zac Boring 4 days ago Research
Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation
via Alignment Forum [999] — TL;DR: We introduce a testbed based on censored Chinese LLMs, which serve as natural objects for studying secret elicitation techniques. We then study the efficacy of honesty elicitation and lie detection techniques for detecting and removing…
Zac Boring 4 days ago Research
From games to biology and beyond: 10 years of AlphaGo’s impact
via DeepMind Blog [4] — Ten years since AlphaGo, we explore how it is catalyzing scientific discovery and paving a path to AGI.
Zac Boring 5 days ago Research
Evolving Medical Imaging Agents via Experience-driven Self-skill Discovery
via ArXiv cs.AI [5] — Clinical image interpretation is inherently multi-step and tool-centric: clinicians iteratively combine visual evidence with patient context, quantify findings, and refine their decisions through a sequence of specialized procedures. While LLM-based agents…
Zac Boring 5 days ago Research
Real-Time AI Service Economy: A Framework for Agentic Computing Across the Continuum
via ArXiv cs.AI [3] — Real-time AI services increasingly operate across the device-edge-cloud continuum, where autonomous AI agents generate latency-sensitive workloads, orchestrate multi-stage processing pipelines, and compete for shared resources under policy and governance…
Zac Boring 6 days ago Research
Can governments quickly and cheaply slow AI training?
via Alignment Forum [999] — I originally wrote this as a private doc for people working in the field - it's not super polished or optimized for a broad audience. But I'm publishing anyway because inference-verification is a new and exciting area, and there are few bird's-eye-view…
Zac Boring 8 days ago Research
Towards automated data analysis: A guided framework for LLM-based risk estimation
via ArXiv cs.AI [2] — Large Language Models (LLMs) are increasingly integrated into critical decision-making pipelines, a trend that raises the demand for robust and automated data analysis. Current approaches to dataset risk analysis are limited to manual auditing methods
Zac Boring 8 days ago Research
SkillNet: Create, Evaluate, and Connect AI Skills
via ArXiv cs.AI — Current AI agents can flexibly invoke tools and execute complex tasks, yet their long-term advancement is hindered by the lack of systematic accumulation and transfer of skills. Without a unified mechanism for skill consolidation, agents frequently ``r
Zac Boring 11 days ago Research
How to Design Environments for Understanding Model Motives
via Alignment Forum [5] — Authors: Gerson Kroiz*, Aditya Singh*, Senthooran Rajamanoharan, Neel Nanda. Gerson and Aditya are co-first authors. This work was conducted during MATS 9.0 and was advised by Senthooran Rajamanoharan and Neel Nanda. TL;DR: Understanding why a model took an action is a key question in AI Safety. It is a…
Zac Boring 12 days ago Research
PseudoAct: Leveraging Pseudocode Synthesis for Flexible Planning and Action Control in Large Language Model Agents
via ArXiv cs.AI [6] — Large language model (LLM) agents typically rely on reactive decision-making paradigms such as ReAct, selecting actions conditioned on growing execution histories. While effective for short tasks, these approaches often lead to redundant tool usage, un
Zac Boring 12 days ago Research
AI Must Embrace Specialization via Superhuman Adaptable Intelligence
via ArXiv cs.AI [8] — Everyone from AI executives and researchers to doomsayers, politicians, and activists is talking about Artificial General Intelligence (AGI). Yet, they often don't seem to agree on its exact definition. One common definition of AGI is an AI that can do
Zac Boring 12 days ago Research
MMKG-RDS: Reasoning Data Synthesis via Deep Mining of Multimodal Knowledge Graphs
via ArXiv cs.AI [3] — Synthesizing high-quality training data is crucial for enhancing domain models' reasoning abilities. Existing methods face limitations in long-tail knowledge coverage, effectiveness verification, and interpretability. Knowledge-graph-based approaches s
Zac Boring 14 days ago Research
Schelling Goodness, and Shared Morality as a Goal
via Alignment Forum — Also available in markdown at theMultiplicity.ai/blog/schelling-goodness. This post explores a notion I'll call Schelling goodness. Claims of Schelling goodness are not first-order moral verdicts like "X is good" or "X is bad." They are claims about a class of hypothetical coordination games in the