Research - pDoom (Page 2)

Zac Boring 10 days ago Research

Evaluating SageMath-Augmented LLM Agents for Computational and Experimental Mathematics

via ArXiv cs.AI [4] — Recent advances in AI for Mathematics have focused largely on autoformalization and theorem proving, leaving the role of Computer Algebra Systems (CAS) in agentic LLM workflows underexplored. We propose a ReAct-style agentic setup that combines LLM…

Zac Boring 10 days ago Research

Cost-Effective Agent Harnesses for Abstract Reasoning and Generalization on ARC-AGI-1

via ArXiv cs.AI [7] — Recent progress on ARC-AGI-1 from disclosed architectures has come broadly from two regimes: heavy test-time compute over frontier models (evolutionary search, exhaustive sampling, extended chain-of-thought), or benchmark-specific training in which small…

Zac Boring 10 days ago Research

Modular Pretraining Enables Access Control

via Alignment Forum [999] — Full author list: Ethan Roland*, Murat Cubuktepe*, Erick Martinez*, Stijn Servaes, Keenan Pepper, Mike Vaiana, Diogo Schwerz de Lucena, Judd Rosenblatt, Addie Foote, Cem Anil, Alex Cloud; *Equal contribution. tldr: Frontier AI models have knowledge that…

Zac Boring 10 days ago Research

Notes on technical alignment via human-like social drives

via Alignment Forum [999] — 1. Frontmatter. 1.1 Backstory for this post. As discussed in Intro to Brain-Like-AGI Safety, I’m working on the technical alignment problem for a hypothetical future “brain-like AGI”, with a particular focus on treating human innate social and moral…

Zac Boring 12 days ago Research

Data filtering works a lot worse than you would expect

via Alignment Forum [999] — This work was largely done during Neel Nanda's MATS 10.0 Exploration Phase. J Rosser and Dohun Lee are co-first authors for this post with equal contribution. Josh Engels and Neel Nanda supervised the project, and provided guidance and feedback…

Zac Boring 15 days ago Research

Pragmatic FDT, and predictors as game theory

via Alignment Forum [999] — Decision theory is back in fashion (defining fashion as "one good post on a good EA blog"). Bentham's Bulldog (BB) has published a case against FDT (functional decision theory), contrasting rationalist enthusiasm with academic scepticism: "Academic…

Zac Boring 17 days ago Research

Constructive Alignment: Governing Preference Dynamics in Human-AI Interaction

via ArXiv cs.AI [5] — Most approaches to AI alignment treat human preferences as fixed targets to be inferred and optimized. This assumption conflicts with extensive empirical evidence showing that preferences are layered, dynamic, and constructed through…

Zac Boring 18 days ago Research

What Drives Interactive Improvement from Feedback?

via ArXiv cs.AI [4] — We study when natural-language feedback produces improvement beyond the gains obtainable from repeated attempts alone. In a multi-turn language agent setting, higher final accuracy can reflect useful feedback, but it can also arise from resampling, format…

Zac Boring 18 days ago Research

MIRI Newsletter #126

via MIRI [999] — Announcing: AI StopWatch. In our last update, we mentioned we had something new in the works: a dedicated channel for news and analysis about AI. An experiment from the writers and analysts at MIRI, AI StopWatch posts news and commentary…

Zac Boring 18 days ago Research

Summary: TGT’s 2026 ICML Papers

via MIRI [999] — The International Conference on Machine Learning (ICML), held annually for over forty years, is among the most influential conferences in modern AI research. This year in Seoul, ICML is hosting its second workshop on Technical AI Governance Research (TAIGR), and…

Zac Boring 19 days ago Research

The Two Genie Game: Adoption and Welfare in Audit-Grounded AI Governance

via ArXiv cs.AI [6] — We ask under what conditions an agent with a harm-minimizing policy can displace an approval-seeking (RLHF) agent in a competitive market, and when that policy is sufficient to prevent community harm. We use evolutionary game theory (finite-population…

Zac Boring 19 days ago Research

IMCBench: A benchmark for multimodal LLMs in Image-grounded Medical Conversations

via ArXiv cs.AI [6] — Recent advances in large language models and vision-language models have enabled reasoning over multimodal data, offering opportunities for clinical applications such as decision support and triaging. However, existing medical AI benchmarks are fragmented:…

Zac Boring 22 days ago Research

Deployment Awareness Matters More Than Evaluation Awareness

via Alignment Forum [999] — TL;DR: Evaluation awareness — an AI recognizing it's being evaluated — is a widely discussed concept in AI safety. But there is a closely related concept that we claim is more important: deployment awareness, the AI's ability to recognize when it is not…

Zac Boring 22 days ago Research

The Case for Model Forensics

via Alignment Forum [999] — If we had a misalignment warning shot, would we be able to tell? Suppose an AI company catches their model taking an egregious action, like deleting oversight code that monitors its actions. Should they sound the alarm? A key piece of evidence to…

Zac Boring 23 days ago Research

Governing Actions, Not Agents: Institutional Attestation as a Governance Model for Autonomous AI Systems

via ArXiv cs.AI [3] — Autonomous AI agents may begin to perform consequential, irreversible actions such as clinical prescribing and production software deployment. This paper observes that human institutions have governed powerful autonomous actors not by monitoring their…

Zac Boring 23 days ago Research

Detecting and Controlling Sycophancy with Cascading Linear Features

via ArXiv cs.AI [3] — Interpreting and controlling model behaviors through activation steering methods requires many pairs of contrastive samples that clearly exhibit desired or undesired behavior. These data pairs determine the degree to which interpretability frameworks can…

Zac Boring 24 days ago Research

The Clinician's Veto: Navigating Trust, Liability, and Uncertainty in Autonomous AI Prescribing

via ArXiv cs.AI [3] — Autonomous AI systems are transitioning from advisory to autonomous roles for medication prescriptions. Recent United States bill H.R. 238 and Utah's prescription-renewal pilot both authorize AI to prescribe medications in an agentic capacity. While some…

Zac Boring 24 days ago Research

The Hitchhiker's Guide to Agentic AI: From Foundations to Systems

via ArXiv cs.AI [4] — The Hitchhiker's Guide to Agentic AI is a comprehensive practitioner's reference for building autonomous AI systems. The book covers the full stack from first principles to production deployment, organized around a central thesis: building great agentic…

Zac Boring 25 days ago Research

Can Language Model Agents be Helpful Circuit Explainers in Mechanistic Interpretability?

via ArXiv cs.AI [7] — Mechanistic interpretability has made substantial progress in automatically localizing circuits, but explaining what localized components do remains labor-intensive and difficult to standardize. In this work, we study whether language model (LM) agents can…

Zac Boring 25 days ago Research

Reinforcement Learning Towards Broadly and Persistently Beneficial Models

via ArXiv cs.AI [3] — As AI systems are deployed across increasingly diverse and high-stakes settings, model alignment must generalize beyond the tasks and domains seen during training. This is especially important for reinforcement learning (RL), which can introduce unexpected…