Tracking AI existential risk. Auto-aggregated headlines. Human-curated analysis.
AGGREGATING 47 SOURCES · UPDATED LIVE
Research
Zac Boring 19 hours ago Research
Operationalizing FDT
via Alignment Forum [999] — This post is an attempt to better operationalize FDT (functional decision theory). It answers the following questions: given a logical causal graph, how do we define the logical do-operator? What is logical causality and how might it be formalized? How…
Zac Boring 2 days ago Research
How well do models follow their constitutions?
via Alignment Forum [999] — This work was conducted during the MATS 9.0 program under Neel Nanda and Senthooran Rajamanoharan. There's been a lot of buzz around Claude's 30K-word constitution ("soul doc") and the unusual ways Anthropic is integrating it into training. If we can…
Zac Boring 2 days ago Research
The Refined Counterfactual Prisoner's Dilemma
via Alignment Forum [999] — I was inspired to revise my formulation of this thought experiment by Ihor Kendiukhov's post On The Independence Axiom. Kendiukhov quotes Scott Garrabrant: My take is that the concept of expected utility maximization is a mistake. [...] As far as I…
Zac Boring 2 days ago Research
AIs will be used in “unhinged” configurations
via Alignment Forum [999] — Writing up a probably-obvious point that I want to refer to later, with significant LLM writing help. TL;DR: 1) A common critique of AI safety evaluations is that they occur in unrealistic settings, such as excessive goal conflict, or are…
Zac Boring 3 days ago Research
Meissa: Multi-modal Medical Agentic Intelligence
via ArXiv cs.AI [5] — Multi-modal large language models (MM-LLMs) have shown strong performance in medical image understanding and clinical reasoning. Recent medical agent systems extend them with tool use and multi-agent collaboration, enabling complex decision-making. However,…
Zac Boring 3 days ago Research
The case for satiating cheaply-satisfied AI preferences
via Alignment Forum [999] — A central AI safety concern is that AIs will develop unintended preferences and undermine human control to achieve them. But some unintended preferences are cheap to satisfy, and failing to satisfy them needlessly turns a cooperative situation into an…
Zac Boring 4 days ago Research
Best-of-Tails: Bridging Optimism and Pessimism in Inference-Time Alignment
via ArXiv cs.AI [5] — Inference-time alignment effectively steers large language models (LLMs) by generating multiple candidates from a reference model and selecting among them with an imperfect reward model. However, current strategies face a fundamental dilemma: "optimistic"…
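A minimal sketch of the best-of-N selection loop this abstract describes, assuming a hypothetical generate() sampler standing in for the reference model and a hypothetical reward_model() scorer (neither is from the paper):

```python
# Illustrative best-of-N inference-time alignment (not the paper's method).
# `generate` and `reward_model` are hypothetical stand-ins for a reference
# LLM sampler and an imperfect learned reward model.
from typing import Callable, List

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              reward_model: Callable[[str, str], float],
              n: int = 16) -> str:
    """Sample n candidates from the reference model and return the one
    the (imperfect) reward model scores highest."""
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    scores = [reward_model(prompt, c) for c in candidates]
    return candidates[max(range(n), key=lambda i: scores[i])]
```

Taking the plain argmax is the "optimistic" end of the dilemma the abstract names, since it fully trusts the reward model's highest score.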
Zac Boring 4 days ago Research
Autonomous AI Agents for Option Hedging: Enhancing Financial Stability through Shortfall Aware Reinforcement Learning
via ArXiv cs.AI [3] — The deployment of autonomous AI agents in derivatives markets has widened a practical gap between static model calibration and realized hedging outcomes. We introduce two reinforcement learning frameworks, a novel Replication Learning of Option Pricing…
Zac Boring 4 days ago Research
Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation
via Alignment Forum [999] — TL;DR: We introduce a testbed based on censored Chinese LLMs, which serve as natural objects for studying secret elicitation techniques. We then study the efficacy of honesty elicitation and lie detection techniques for detecting and removing…
Zac Boring 4 days ago Research
From games to biology and beyond: 10 years of AlphaGo’s impact
via DeepMind Blog [4] — Ten years since AlphaGo, we explore how it is catalyzing scientific discovery and paving a path to AGI.
Zac Boring 5 days ago Research
Evolving Medical Imaging Agents via Experience-driven Self-skill Discovery
via ArXiv cs.AI [5] — Clinical image interpretation is inherently multi-step and tool-centric: clinicians iteratively combine visual evidence with patient context, quantify findings, and refine their decisions through a sequence of specialized procedures. While LLM-based agents…
Zac Boring 5 days ago Research
Real-Time AI Service Economy: A Framework for Agentic Computing Across the Continuum
via ArXiv cs.AI [3] — Real-time AI services increasingly operate across the device-edge-cloud continuum, where autonomous AI agents generate latency-sensitive workloads, orchestrate multi-stage processing pipelines, and compete for shared resources under policy and governance…
Zac Boring 6 days ago Research
Can governments quickly and cheaply slow AI training?
via Alignment Forum [999] — I originally wrote this as a private doc for people working in the field - it's not super polished or optimized for a broad audience. But I'm publishing anyway because inference-verification is a new and exciting area, and there are few bird's-eye-view…
Zac Boring 8 days ago Research
Towards automated data analysis: A guided framework for LLM-based risk estimation
via ArXiv cs.AI [2] — Large Language Models (LLMs) are increasingly integrated into critical decision-making pipelines, a trend that raises the demand for robust and automated data analysis. Current approaches to dataset risk analysis are limited to manual auditing methods
Zac Boring 8 days ago Research
SkillNet: Create, Evaluate, and Connect AI Skills
via ArXiv cs.AI — Current AI agents can flexibly invoke tools and execute complex tasks, yet their long-term advancement is hindered by the lack of systematic accumulation and transfer of skills. Without a unified mechanism for skill consolidation, agents frequently ``r
Zac Boring 11 days ago Research
How to Design Environments for Understanding Model Motives
via Alignment Forum [5] — Authors: Gerson Kroiz*, Aditya Singh*, Senthooran Rajamanoharan, Neel Nanda. Gerson and Aditya are co-first authors. This work was conducted during MATS 9.0 and was advised by Senthooran Rajamanoharan and Neel Nanda. TL;DR: Understanding why a model took an action is a key question in AI Safety. It is a…
Zac Boring 12 days ago Research
PseudoAct: Leveraging Pseudocode Synthesis for Flexible Planning and Action Control in Large Language Model Agents
via ArXiv cs.AI [6] — Large language model (LLM) agents typically rely on reactive decision-making paradigms such as ReAct, selecting actions conditioned on growing execution histories. While effective for short tasks, these approaches often lead to redundant tool usage, un
Zac Boring 12 days ago Research
AI Must Embrace Specialization via Superhuman Adaptable Intelligence
via ArXiv cs.AI [8] — Everyone from AI executives and researchers to doomsayers, politicians, and activists is talking about Artificial General Intelligence (AGI). Yet, they often don't seem to agree on its exact definition. One common definition of AGI is an AI that can do
Zac Boring 12 days ago Research
MMKG-RDS: Reasoning Data Synthesis via Deep Mining of Multimodal Knowledge Graphs
via ArXiv cs.AI [3] — Synthesizing high-quality training data is crucial for enhancing domain models' reasoning abilities. Existing methods face limitations in long-tail knowledge coverage, effectiveness verification, and interpretability. Knowledge-graph-based approaches s
Zac Boring 14 days ago Research
Schelling Goodness, and Shared Morality as a Goal
via Alignment Forum — Also available in markdown at theMultiplicity.ai/blog/schelling-goodness. This post explores a notion I'll call Schelling goodness. Claims of Schelling goodness are not first-order moral verdicts like "X is good" or "X is bad." They are claims about a class of hypothetical coordination games in the