Research - pDoom (Page 3)

Zac Boring a month ago Research

An Artifact-based Agent Framework for Adaptive and Reproducible Medical Image Processing

via ArXiv cs.AI [4] — Medical imaging research is increasingly shifting from controlled benchmark evaluation toward real-world clinical deployment. In such settings, applying analytical methods extends beyond model design to require dataset-aware workflow configuration and…

Zac Boring a month ago Research

The paper that killed deep learning theory

via Alignment Forum [999] — Around 10 years ago, a paper came out that arguably killed classical deep learning theory: Zhang et al.'s aptly titled Understanding deep learning requires rethinking generalization. Of course, this is a bit of an exaggeration. No single paper ever…

Zac Boring a month ago Research

From Actions to Understanding: Conformal Interpretability of Temporal Concepts in LLM Agents

via ArXiv cs.AI [5] — Large Language Models (LLMs) are increasingly deployed as autonomous agents capable of reasoning, planning, and acting within interactive environments. Despite their growing capability to perform multi-step reasoning and decision-making tasks, internal…

Zac Boring a month ago Research

ARES: Adaptive Red-Teaming and End-to-End Repair of Policy-Reward System

via ArXiv cs.AI [5] — Reinforcement Learning from Human Feedback (RLHF) is central to aligning Large Language Models (LLMs), yet it introduces a critical vulnerability: an imperfect Reward Model (RM) can become a single point of failure when it fails to penalize unsafe…

Zac Boring a month ago Research

A "Lay" Introduction to "On the Complexity of Neural Computation in Superposition"

via Alignment Forum [999] — This is a writeup based on a lightning talk I gave at an InkHaven hosted by Georgia Ray, where we were supposed to read a paper in about an hour, and then present what we learned to other participants. Introduction and Background: So. I foolishly thought…

Zac Boring a month ago Research

$50 million a year for a 10% chance to ban ASI

via Alignment Forum [999] — ControlAI's mission is to avert the extinction risks posed by superintelligent AI. We believe that in order to do this, we must secure an international prohibition on its development. We're working to make this happen through what we believe is the…

Zac Boring a month ago Research

Governing the Agentic Enterprise: A Governance Maturity Model for Managing AI Agent Sprawl in Business Operations

via ArXiv cs.AI [4] — The rapid adoption of agentic AI in enterprise business operations--autonomous systems capable of planning, reasoning, and executing multi-step workflows--has created an urgent governance crisis. Organizations face uncontrolled agent sprawl: the…

Zac Boring a month ago Research

LLM Reasoning Is Latent, Not the Chain of Thought

via ArXiv cs.AI [5] — This position paper argues that large language model (LLM) reasoning should be studied as latent-state trajectory formation rather than as faithful surface chain-of-thought (CoT). This matters because claims about faithfulness, interpretability, reasoning…

Zac Boring a month ago Research

Prompted CoT Early Exit Undermines the Monitoring Benefits of CoT Uncontrollability

via Alignment Forum [999] — Code: github.com/ElleNajt/controllability. TL;DR: Yueh-Han et al. (2026) showed that models have a harder time making their chain of thought follow user instruction compared to controlling their response (the non-thinking, user-facing output). Their CoT…

Zac Boring a month ago Research

You can only build safe ASI if ASI is globally banned

via Alignment Forum [999] — Sometimes people make various suggestions that we should simply build "safe" artificial Superintelligence (ASI), rather than the presumably "unsafe" kind.[1] There are various flavors of “safe” people suggest. Sometimes they suggest building “aligned”…

Zac Boring a month ago Research

Optimizing Earth Observation Satellite Schedules under Unknown Operational Constraints: An Active Constraint Acquisition Approach

via ArXiv cs.AI [4] — Earth Observation (EO) satellite scheduling (deciding which imaging tasks to perform and when) is a well-studied combinatorial optimization problem. Existing methods typically assume that the operational constraint model is fully specified in advance. In…

Zac Boring a month ago Research

Current AIs seem pretty misaligned to me

via Alignment Forum [999] — Many people—especially AI company employees [1] —believe current AI systems are well-aligned in the sense of genuinely trying to do what they're supposed to do (e.g., following their spec or constitution, obeying a reasonable interpretation of…

Zac Boring a month ago Research

OpenFlo: Automated UX Evaluation via Simulated Human Web Interaction with GUI Grounding

via ArXiv cs.AI [4] — Evaluating web usability typically requires time-consuming user studies and expert reviews, which often limits iteration speed during product development, especially for small teams and agile workflows. We present OpenFlo, a user-experience evaluation agent…

Zac Boring a month ago Research

Anthropic repeatedly accidentally trained against the CoT, demonstrating inadequate processes

via Alignment Forum [999] — It turns out that Anthropic accidentally trained against the chain of thought of Claude Mythos Preview in around 8% of training episodes. This is at least the second independent incident in which Anthropic accidentally exposed their model's CoT to the…

Zac Boring a month ago Research

Summary: AI Governance to Avoid Extinction

via MIRI [999] — With AI capabilities rapidly increasing, humans appear close to developing AI systems that are better than human experts across all domains. This raises a series of questions about how the world will—and should—respond. In the research paper AI Governance to…

Zac Boring a month ago Research

Sustained Impact of Agentic Personalisation in Marketing: A Longitudinal Case Study

via ArXiv cs.AI [4] — In consumer applications, Customer Relationship Management (CRM) has traditionally relied on the manual optimisation of static, rule-based messaging strategies. While adaptive and autonomous learning systems offer the promise of scalable personalisation, it…

Zac Boring a month ago Research

OpenKedge: Governing Agentic Mutation with Execution-Bound Safety and Evidence Chains

via ArXiv cs.AI [3] — The rise of autonomous AI agents exposes a fundamental flaw in API-centric architectures: probabilistic systems directly execute state mutations without sufficient context, coordination, or safety guarantees. We introduce OpenKedge, a protocol that…

Zac Boring a month ago Research

SymptomWise: A Deterministic Reasoning Layer for Reliable and Efficient AI Systems

via ArXiv cs.AI [3] — AI-driven symptom analysis systems face persistent challenges in reliability, interpretability, and hallucination. End-to-end generative approaches often lack traceability and may produce unsupported or inconsistent diagnostic outputs in safety-critical…

Zac Boring a month ago Research

MedGemma 1.5 Technical Report

via ArXiv cs.AI [4] — We introduce MedGemma 1.5 4B, the latest model in the MedGemma collection. MedGemma 1.5 expands on MedGemma 1 by integrating additional capabilities: high-dimensional medical imaging (CT/MRI volumes and histopathology whole slide images), anatomical…

Zac Boring a month ago Research

MMORF: A Multi-agent Framework for Designing Multi-objective Retrosynthesis Planning Systems

via ArXiv cs.AI [4] — Multi-objective retrosynthesis planning is a critical chemistry task requiring dynamic balancing of quality, safety, and cost objectives. Language model-based multi-agent systems (MAS) offer a promising approach for this task: leveraging interactions of…