Tracking AI existential risk. Auto-aggregated headlines. Human-curated analysis.
AGGREGATING 47 SOURCES · UPDATED LIVE
Research
Zac Boring 2 months ago Research
AIs will be used in “unhinged” configurations
via Alignment Forum [999] — Writing up a probably-obvious point that I want to refer to later, with significant LLM writing help. TL;DR: 1) A common critique of AI safety evaluations is that they occur in unrealistic settings, such as excessive goal conflict, or are…
Zac Boring 2 months ago Research
Meissa: Multi-modal Medical Agentic Intelligence
via ArXiv cs.AI [5] — Multi-modal large language models (MM-LLMs) have shown strong performance in medical image understanding and clinical reasoning. Recent medical agent systems extend them with tool use and multi-agent collaboration, enabling complex decision-making. However,…
Zac Boring 2 months ago Research
The case for satiating cheaply-satisfied AI preferences
via Alignment Forum [999] — A central AI safety concern is that AIs will develop unintended preferences and undermine human control to achieve them. But some unintended preferences are cheap to satisfy, and failing to satisfy them needlessly turns a cooperative situation into an…
Zac Boring 2 months ago Research
Best-of-Tails: Bridging Optimism and Pessimism in Inference-Time Alignment
via ArXiv cs.AI [5] — Inference-time alignment effectively steers large language models (LLMs) by generating multiple candidates from a reference model and selecting among them with an imperfect reward model. However, current strategies face a fundamental dilemma: “optimistic”…
Zac Boring 2 months ago Research
Autonomous AI Agents for Option Hedging: Enhancing Financial Stability through Shortfall Aware Reinforcement Learning
via ArXiv cs.AI [3] — The deployment of autonomous AI agents in derivatives markets has widened a practical gap between static model calibration and realized hedging outcomes. We introduce two reinforcement learning frameworks, a novel Replication Learning of Option Pricing…
Zac Boring 2 months ago Research
Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation
via Alignment Forum [999] — TL;DR: We introduce a testbed based on censored Chinese LLMs, which serve as natural objects of study for studying secret elicitation techniques. Then we study the efficacy of honesty elicitation and lie detection techniques for detecting and removing…
Zac Boring 2 months ago Research
From games to biology and beyond: 10 years of AlphaGo’s impact
via DeepMind Blog [4] — Ten years since AlphaGo, we explore how it is catalyzing scientific discovery and paving a path to AGI.
Zac Boring 2 months ago Research
Evolving Medical Imaging Agents via Experience-driven Self-skill Discovery
via ArXiv cs.AI [5] — Clinical image interpretation is inherently multi-step and tool-centric: clinicians iteratively combine visual evidence with patient context, quantify findings, and refine their decisions through a sequence of specialized procedures. While LLM-based agents…
Zac Boring 2 months ago Research
Real-Time AI Service Economy: A Framework for Agentic Computing Across the Continuum
via ArXiv cs.AI [3] — Real-time AI services increasingly operate across the device-edge-cloud continuum, where autonomous AI agents generate latency-sensitive workloads, orchestrate multi-stage processing pipelines, and compete for shared resources under policy and governance…
Zac Boring 2 months ago Research
Can governments quickly and cheaply slow AI training?
via Alignment Forum [999] — I originally wrote this as a private doc for people working in the field - it's not super polished, or optimized for a broad audience. But I'm publishing anyway because inference-verification is a new and exciting area, and there are few bird's-eye-view…
Zac Boring 3 months ago Research
Towards automated data analysis: A guided framework for LLM-based risk estimation
via ArXiv cs.AI [2] — Large Language Models (LLMs) are increasingly integrated into critical decision-making pipelines, a trend that raises the demand for robust and automated data analysis. Current approaches to dataset risk analysis are limited to manual auditing methods…
Zac Boring 3 months ago Research
SkillNet: Create, Evaluate, and Connect AI Skills
via ArXiv cs.AI — Current AI agents can flexibly invoke tools and execute complex tasks, yet their long-term advancement is hindered by the lack of systematic accumulation and transfer of skills. Without a unified mechanism for skill consolidation, agents frequently “r…
Zac Boring 3 months ago Research
How to Design Environments for Understanding Model Motives
via Alignment Forum [5] — Authors: Gerson Kroiz*, Aditya Singh*, Senthooran Rajamanoharan, Neel Nanda. Gerson and Aditya are co-first authors. This work was conducted during MATS 9.0 and was advised by Senthooran Rajamanoharan and Neel Nanda. TL;DR: Understanding why a model took an action is a key question in AI Safety. It is a…
Zac Boring 3 months ago Research
PseudoAct: Leveraging Pseudocode Synthesis for Flexible Planning and Action Control in Large Language Model Agents
via ArXiv cs.AI [6] — Large language model (LLM) agents typically rely on reactive decision-making paradigms such as ReAct, selecting actions conditioned on growing execution histories. While effective for short tasks, these approaches often lead to redundant tool usage, un…
Zac Boring 3 months ago Research
AI Must Embrace Specialization via Superhuman Adaptable Intelligence
via ArXiv cs.AI [8] — Everyone from AI executives and researchers to doomsayers, politicians, and activists is talking about Artificial General Intelligence (AGI). Yet, they often don't seem to agree on its exact definition. One common definition of AGI is an AI that can do…
Zac Boring 3 months ago Research
MMKG-RDS: Reasoning Data Synthesis via Deep Mining of Multimodal Knowledge Graphs
via ArXiv cs.AI [3] — Synthesizing high-quality training data is crucial for enhancing domain models' reasoning abilities. Existing methods face limitations in long-tail knowledge coverage, effectiveness verification, and interpretability. Knowledge-graph-based approaches s…
Zac Boring 3 months ago Research
Schelling Goodness, and Shared Morality as a Goal
via Alignment Forum — Also available in markdown at theMultiplicity.ai/blog/schelling-goodness. This post explores a notion I'll call Schelling goodness. Claims of Schelling goodness are not first-order moral verdicts like "X is good" or "X is bad." They are claims about a class of hypothetical coordination games in the…
Zac Boring 3 months ago Research
ArchAgent: Agentic AI-driven Computer Architecture Discovery
via ArXiv cs.AI [4] — Agile hardware design flows are a critically needed force multiplier to meet the exploding demand for compute. Recently, agentic generative AI systems have demonstrated significant advances in algorithm design, improving code efficiency, and enabling d…
Zac Boring 3 months ago Research
Agent Behavioral Contracts: Formal Specification and Runtime Enforcement for Reliable Autonomous AI Agents
via ArXiv cs.AI [3] — Traditional software relies on contracts -- APIs, type systems, assertions -- to specify and enforce correct behavior. AI agents, by contrast, operate on prompts and natural language instructions with no formal behavioral specification. This gap is the…
Zac Boring 3 months ago Research
Why Did My Model Do That? Model Incrimination for Diagnosing LLM Misbehavior
via Alignment Forum [5] — Authors: Aditya Singh*, Gerson Kroiz*, Senthooran Rajamanoharan, Neel Nanda. Aditya and Gerson are co-first authors. This work was conducted during MATS 9.0 and was advised by Senthooran Rajamanoharan and Neel Nanda. Motivation: Imagine that a frontier lab’s coding agent has been caught putting a bug in…