Tracking AI existential risk. Auto-aggregated headlines. Human-curated analysis.
AGGREGATING 47 SOURCES · UPDATED LIVE
Research
Zac Boring a month ago Research
An Artifact-based Agent Framework for Adaptive and Reproducible Medical Image Processing
via ArXiv cs.AI [4] — Medical imaging research is increasingly shifting from controlled benchmark evaluation toward real-world clinical deployment. In such settings, applying analytical methods extends beyond model design to require dataset-aware workflow configuration and…
Zac Boring a month ago Research
The paper that killed deep learning theory
via Alignment Forum [999] — Around 10 years ago, a paper came out that arguably killed classical deep learning theory: Zhang et al.'s aptly titled Understanding deep learning requires rethinking generalization. Of course, this is a bit of an exaggeration. No single paper ever…
Zac Boring a month ago Research
From Actions to Understanding: Conformal Interpretability of Temporal Concepts in LLM Agents
via ArXiv cs.AI [5] — Large Language Models (LLMs) are increasingly deployed as autonomous agents capable of reasoning, planning, and acting within interactive environments. Despite their growing capability to perform multi-step reasoning and decision-making tasks, internal…
Zac Boring a month ago Research
ARES: Adaptive Red-Teaming and End-to-End Repair of Policy-Reward System
via ArXiv cs.AI [5] — Reinforcement Learning from Human Feedback (RLHF) is central to aligning Large Language Models (LLMs), yet it introduces a critical vulnerability: an imperfect Reward Model (RM) can become a single point of failure when it fails to penalize unsafe…
Zac Boring a month ago Research
A "Lay" Introduction to "On the Complexity of Neural Computation in Superposition"
via Alignment Forum [999] — This is a writeup based on a lightning talk I gave at an InkHaven hosted by Georgia Ray, where we were supposed to read a paper in about an hour, and then present what we learned to other participants. Introduction and Background: So. I foolishly thought…
Zac Boring a month ago Research
$50 million a year for a 10% chance to ban ASI
via Alignment Forum [999] — ControlAI's mission is to avert the extinction risks posed by superintelligent AI. We believe that in order to do this, we must secure an international prohibition on its development. We're working to make this happen through what we believe is the…
Zac Boring a month ago Research
Governing the Agentic Enterprise: A Governance Maturity Model for Managing AI Agent Sprawl in Business Operations
via ArXiv cs.AI [4] — The rapid adoption of agentic AI in enterprise business operations--autonomous systems capable of planning, reasoning, and executing multi-step workflows--has created an urgent governance crisis. Organizations face uncontrolled agent sprawl: the…
Zac Boring a month ago Research
LLM Reasoning Is Latent, Not the Chain of Thought
via ArXiv cs.AI [5] — This position paper argues that large language model (LLM) reasoning should be studied as latent-state trajectory formation rather than as faithful surface chain-of-thought (CoT). This matters because claims about faithfulness, interpretability, reasoning…
Zac Boring a month ago Research
Prompted CoT Early Exit Undermines the Monitoring Benefits of CoT Uncontrollability
via Alignment Forum [999] — Code: github.com/ElleNajt/controllability tldr: Yueh-Han et al. (2026) showed that models have a harder time making their chain of thought follow user instruction compared to controlling their response (the non-thinking, user-facing output). Their CoT…
Zac Boring a month ago Research
You can only build safe ASI if ASI is globally banned
via Alignment Forum [999] — Sometimes people make various suggestions that we should simply build "safe" artificial Superintelligence (ASI), rather than the presumably "unsafe" kind.[1] There are various flavors of “safe” people suggest. Sometimes they suggest building “aligned”…
Zac Boring a month ago Research
Optimizing Earth Observation Satellite Schedules under Unknown Operational Constraints: An Active Constraint Acquisition Approach
via ArXiv cs.AI [4] — Earth Observation (EO) satellite scheduling (deciding which imaging tasks to perform and when) is a well-studied combinatorial optimization problem. Existing methods typically assume that the operational constraint model is fully specified in advance. In…
Zac Boring a month ago Research
Current AIs seem pretty misaligned to me
via Alignment Forum [999] — Many people—especially AI company employees [1]—believe current AI systems are well-aligned in the sense of genuinely trying to do what they're supposed to do (e.g., following their spec or constitution, obeying a reasonable interpretation of…
Zac Boring a month ago Research
OpenFlo: Automated UX Evaluation via Simulated Human Web Interaction with GUI Grounding
via ArXiv cs.AI [4] — Evaluating web usability typically requires time-consuming user studies and expert reviews, which often limits iteration speed during product development, especially for small teams and agile workflows. We present OpenFlo, a user-experience evaluation agent…
Zac Boring a month ago Research
Anthropic repeatedly accidentally trained against the CoT, demonstrating inadequate processes
via Alignment Forum [999] — It turns out that Anthropic accidentally trained against the chain of thought of Claude Mythos Preview in around 8% of training episodes. This is at least the second independent incident in which Anthropic accidentally exposed their model's CoT to the…
Zac Boring a month ago Research
Summary: AI Governance to Avoid Extinction
via MIRI [999] — With AI capabilities rapidly increasing, humans appear close to developing AI systems that are better than human experts across all domains. This raises a series of questions about how the world will—and should—respond. In the research paper AI Governance to…
Zac Boring a month ago Research
Sustained Impact of Agentic Personalisation in Marketing: A Longitudinal Case Study
via ArXiv cs.AI [4] — In consumer applications, Customer Relationship Management (CRM) has traditionally relied on the manual optimisation of static, rule-based messaging strategies. While adaptive and autonomous learning systems offer the promise of scalable personalisation, it…
Zac Boring a month ago Research
OpenKedge: Governing Agentic Mutation with Execution-Bound Safety and Evidence Chains
via ArXiv cs.AI [3] — The rise of autonomous AI agents exposes a fundamental flaw in API-centric architectures: probabilistic systems directly execute state mutations without sufficient context, coordination, or safety guarantees. We introduce OpenKedge, a protocol that…
Zac Boring a month ago Research
SymptomWise: A Deterministic Reasoning Layer for Reliable and Efficient AI Systems
via ArXiv cs.AI [3] — AI-driven symptom analysis systems face persistent challenges in reliability, interpretability, and hallucination. End-to-end generative approaches often lack traceability and may produce unsupported or inconsistent diagnostic outputs in safety-critical…
Zac Boring a month ago Research
MedGemma 1.5 Technical Report
via ArXiv cs.AI [4] — We introduce MedGemma 1.5 4B, the latest model in the MedGemma collection. MedGemma 1.5 expands on MedGemma 1 by integrating additional capabilities: high-dimensional medical imaging (CT/MRI volumes and histopathology whole slide images), anatomical…
Zac Boring a month ago Research
MMORF: A Multi-agent Framework for Designing Multi-objective Retrosynthesis Planning Systems
via ArXiv cs.AI [4] — Multi-objective retrosynthesis planning is a critical chemistry task requiring dynamic balancing of quality, safety, and cost objectives. Language model-based multi-agent systems (MAS) offer a promising approach for this task: leveraging interactions of…