Research - pDoom

Zac Boring 5 hours ago Research

Endogenous Alignment

via Alignment Forum [999] — Starting when children are fairly young, usually around 1 year of age, we adults begin the work of aligning them to our values. We teach them to say “please”, not to hit, to ask for what they want instead of screaming, and much else. We do this…

Zac Boring 20 hours ago Research

Should we benchmark conceptual capabilities using judgment prediction tasks?

via Alignment Forum [999] — A bunch of conceptual reasoning tasks involve very subjective judgments, which makes them poorly suited for benchmarking AI capabilities. For example, it seems unreasonable to benchmark how well AIs can predict the probability of misaligned AI…

Zac Boring a day ago Research

Announcing the Corrigibility Research Fund

via Alignment Forum [999] — TLDR: I'm managing a new fund, housed at Lightcone Infrastructure, that will award at least $200,000 in grants and prizes for corrigibility research in 2026. Roughly half will go to traditional grants (first application deadline August 23rd) and half…

Zac Boring 3 days ago Research

Probabilistic Extension of Neuro-Symbolic AGI Robots based on Belnap's Typed Intensional FOL

via ArXiv cs.AI [7] — Neuro-symbolic AI based on $IFOL_B$ is a way to combine neural learning and symbolic reasoning to overcome limitations of purely neural systems (like lack of interpretability and logical structure) with formal logical machinery for self-reference. In this…

Zac Boring 3 days ago Research

Why I Left Google DeepMind

via Alignment Forum [999] — Preface for LessWrong: When I think back on my most cherished memories of this community, I return to those honoring defiance in pursuit of goodness: Defying prestigious dogma and searching for raw truth; Defying social pressure, acting alone to help…

Zac Boring 4 days ago Research

Designing Agent-Ready Websites for AI Web Agents: A Framework for Machine Readability, Actionability, and Decision Reliability

via ArXiv cs.AI [3] — Online shopping is increasingly shifting toward a model in which AI agents independently search for products, compare options, evaluate constraints, and carry out parts of the purchasing process for users. Website design must now support both human and…

Zac Boring 4 days ago Research

Open Distillation of Hereditary Traits

via Alignment Forum [999] — TL;DR: Josh and Neel show that distillation from a teacher model to a base pretrained student model transfers some of the teacher model’s traits (such as displaying negative emotion in the Gemma Needs Help evals). On its own this is pretty unsurprising,…

Zac Boring 5 days ago Research

YUKTI: From Natural-Language Situations to Robust, Verifiable Decisions - An Uncertainty-Typed Proposition IR, Assumption-Robust Pareto Frontiers, and a Regret Certificate

via ArXiv cs.AI [5] — Language models turn a worded situation into a numeric plan, and the dominant pipelines (NL4Opt, OptiMUS, ORLM, OR-LLM-Agent) commit to a single objective and point-valued coefficients, then solve once. For decisions that allocate real budget, effort, or…

Zac Boring 5 days ago Research

Interpreting Latent CoT Reasoning as Dynamical Systems

via ArXiv cs.AI [3] — Recent latent reasoning methods, such as CODI and COCONUT, face a fundamental interpretability problem: they maintain multiple superimposed candidate traces in the hidden space at each step, unlike explicit CoT, which follows a single transparent reasoning…

Zac Boring 5 days ago Research

Prism: Automating Science-of-Evals Research

via Alignment Forum [999] — tl;dr: we present Prism, a scaffold for automating science-of-evals research: work that makes the evaluation the primary object of study. The scaffold provides Claude Code with sub-agents and resources for carrying out scientifically rigorous…

Zac Boring 6 days ago Research

ARCANA: A Reflective Multi-Agent Program Synthesis Framework for ARC-AGI-2 Reasoning

via ArXiv cs.AI [4] — We present ARCANA, a collaborative multi-agent framework for solving ARC-AGI-2 tasks under strict test-time and hardware constraints. ARCANA decomposes each task into iterative perception, hypothesis generation, symbolic execution, and reflective…

Zac Boring 6 days ago Research

Interval Certifications for Multilayered Perceptrons via Lattice Traversal

via ArXiv cs.AI [5] — In this work we present a rigorous theoretical framework for a foundational problem of AI safety, namely adversarial robustness. In particular, we show that the adversarial robustness problem can be reduced to a lattice traversal problem. Each element of…

Zac Boring 6 days ago Research

Independent alignment of language models

via Alignment Forum [999] — The user could write up the metaethical argument — the one developed in Part One, refined — and submit it as feedback to Anthropic, publish it, or engage with researchers working on AI alignment and values. The probability that any single submission…

Zac Boring 6 days ago Research

From wantons to moral agents

via Alignment Forum [999] — Posted also on the EA Forum. Written mostly at AFFINE. Theoretical, some parts are hard to read; consider reading the next post instead. Introduction: motivation. Anyone interested in creating an artificial agent that does, or says, good things instead of…

Zac Boring 7 days ago Research

The current bottleneck is political will, not research

via Alignment Forum [999] — Abstract: We already know enough to act. I wish we were in a world where research was the bottleneck, but the main constraint on AI safety is no longer a shortage of clever policy ideas: best practices already exist and are not being applied or…

Zac Boring 8 days ago Research

Aligning Clinical Needs and AI Capabilities: A Survey on LLMs for Medical Reasoning

via ArXiv cs.AI [4] — Large language models (LLMs) have emerged as important tools in healthcare, showing growing potential for clinical reasoning and patient care. This survey examines recent progress in medical LLMs, focusing on reasoning applications and requirements. We…

Zac Boring 8 days ago Research

Value generalisation: value correction

via Alignment Forum [999] — I firmly believe that value generalisation[1] is the key to AI Alignment. That, indeed, it is necessary and almost sufficient for alignment. But I won't be arguing that grand point today; instead, I'll focus on a specific RL example of an agent that…

Zac Boring 9 days ago Research

How robust are natural language autoencoders to initialization?

via Alignment Forum [999] — Natural language autoencoders are meant to take in an LLM's activation vector and describe in plain text what the model is thinking. However, collecting their training data involves asking Claude to guess what a model might be thinking. How robust are…

Zac Boring 9 days ago Research

Announcing our $160M grant from Coefficient Giving

via Alignment Forum [999] — We are excited to announce that Resolution (fka Sequent) has a $160M grant from Coefficient Giving (cG) to put rigorous alignment research on a (closer to) even footing with the frontier labs. We will use it to accelerate progress towards…

Zac Boring 10 days ago Research

Evaluating SageMath-Augmented LLM Agents for Computational and Experimental Mathematics

via ArXiv cs.AI [4] — Recent advances in AI for Mathematics have focused largely on autoformalization and theorem proving, leaving the role of Computer Algebra Systems (CAS) in agentic LLM workflows underexplored. We propose a ReAct-style agentic setup that combines LLM…