Zac Boring - pDoom (Page 19)

Zac Boring 3 months ago Research

A Toy Environment For Exploring Reasoning About Reward

via Alignment Forum [999] — tldr: We share a toy environment that we found useful for understanding how reasoning changed over the course of capabilities-focused RL. Over the course of capabilities-focused RL, the model biases more strongly towards reward hints over direct…

Zac Boring 3 months ago Analysis

$1 billion is not enough; OpenAI Foundation must start spending tens of billions each year

via LessWrong AI [6] — OpenAI is now a public benefit corporation, with a charter that demands they use AGI for the benefit of all, and do so safely. To justify this structure to the Attorneys General of Delaware and California, they split off the nonprofit OpenAI Foundation,…

Zac Boring 3 months ago Analysis

Claude Code, Cowork and Codex #6: Claude Code Auto Mode and Full Cowork Computer Use

via Substack Zvi [999] — Whatever else you think about Anthropic’s agentic coding department, they ship.

Zac Boring 3 months ago Industry

Agentic commerce runs on truth and context

via MIT Technology Review [4] — Imagine telling a digital agent, “Use my points and book a family trip to Italy. Keep it within budget, pick hotels we’ve liked before, and handle the details.” Instead of returning a list of links, the agent assembles an itinerary and executes…

Zac Boring 3 months ago Industry

The AI Hype Index: AI goes to war

via MIT Technology Review [4] — AI is at war. Anthropic and the Pentagon feuded over how to weaponize Anthropic’s AI model Claude; then OpenAI swept the Pentagon off its feet with an “opportunistic and sloppy” deal. Users quit ChatGPT in droves. People marched through London in…

Zac Boring 3 months ago Research

Intelligence Inertia: Physical Principles and Applications

via ArXiv cs.AI [3] — While Landauer's principle establishes the fundamental thermodynamic floor for information erasure and Fisher Information provides a metric for local curvature in parameter space, these classical frameworks function effectively only as approximations within…

Zac Boring 3 months ago Industry

Introducing the OpenAI Safety Bug Bounty program

via OpenAI Blog [7] — OpenAI launches a Safety Bug Bounty program to identify AI abuse and safety risks, including agentic vulnerabilities, prompt injection, and data exfiltration.

Zac Boring 3 months ago Analysis

The Fourth World

via LessWrong AI [4] — Is consciousness the last moral world? Imagine trying to explain to a virus why suffering matters. A virus is a simple self-replicating molecule: unsophisticated and arguably not even alive. It has no experience. It just copies itself according to chemical…

Zac Boring 3 months ago Industry

Arm’s first CPU ever will plug into Meta’s AI datacenters later this year

via The Verge AI [4] — After decades of only licensing its chip designs for others to use, UK-based Arm revealed the first chip it's producing on its own, and the first customer. Dubbed the Arm AGI CPU, it's another chip designed for inference, or running the cloud processing…

Zac Boring 3 months ago Analysis

Book Review: Open Socrates (Part 2)

via Substack Zvi [999] — Yesterday I posted Part 1. Read that first. This is Part 2 of 2.

Zac Boring 3 months ago Analysis

The AIXI perspective on AI Safety

via LessWrong AI [5] — I am also discussing something that is still a bit speculative, since we do not yet have ASI. While basic knowledge of AIXI is the only strict prerequisite, I suggest reading cognitive tech from AIT before this post for context. AIXI is often used as a…

Zac Boring 3 months ago Analysis

Measuring and improving coding audit realism with deployment resources

via LessWrong AI [5] — TL;DR We study realism win rate, a metric for measuring how distinguishable Petri audit transcripts are from real deployment interactions. We use it to evaluate the effect of giving the auditor real deployment resources (system prompts, tool definitions,…

Zac Boring 3 months ago Research

Leveraging Natural Language Processing and Machine Learning for Evidence-Based Food Security Policy Decision-Making in Data-Scarce Making

via ArXiv cs.AI [4] — Food security policy formulation in data-scarce regions remains a critical challenge due to limited structured datasets, fragmented textual reports, and demographic bias in decision-making systems. This study proposes ZeroHungerAI, an integrated Natural…

Zac Boring 3 months ago Research

ProMAS: Proactive Error Forecasting for Multi-Agent Systems Using Markov Transition Dynamics

via ArXiv cs.AI [5] — The integration of Large Language Models into Multi-Agent Systems (MAS) has enabled the solution of complex, long-horizon tasks through collaborative reasoning. However, this collective intelligence is inherently fragile, as a single logical fallacy can…

Zac Boring 3 months ago Industry

Nvidia CEO Jensen Huang says ‘I think we’ve achieved AGI’

via The Verge AI [8] — On a Monday episode of the Lex Fridman podcast, Nvidia CEO Jensen Huang made a hot-button statement: "I think we've achieved AGI." AGI, or artificial general intelligence, is a vaguely defined term that has incited a lot of discussion by tech CEOs, tech…

Zac Boring 3 months ago Analysis

Book Review: Open Socrates (Part 1)

via Substack Zvi [999] — These are all important, in their own way, call it a treasure hunt and collect them all…

Zac Boring 3 months ago Industry

The Download: animal welfare gets AGI-pilled, and the White House unveils its AI policy

via MIT Technology Review [4] — This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. The Bay Area’s animal welfare movement wants to recruit AI. In early February, animal welfare advocates and AI…

Zac Boring 3 months ago Analysis

China declares AGI development to be a part of 5-year plan

via LessWrong AI [4] — The CCP writes in its 15th 5-year plan that it will: "Encourage innovation in multimodal, agentic, embodied, and swarm intelligence technologies, and explore development paths for general artificial intelligence." This is translated from the…

Zac Boring 3 months ago Analysis

Finding features in Transformers: Contrastive directions elicit stronger low-level perturbation responses than baselines

via LessWrong AI [6] — Figure 1: Contrastive (difference-of-means, English→Mandarin) feature directions elicit a downstream response at much smaller perturbation magnitudes than SAE directions, which behave similarly to random directions. This holds across multiple models and…

Zac Boring 3 months ago Analysis

Confusion around the term reward hacking

via LessWrong AI [3] — Summary: "Reward hacking" commonly refers to two different phenomena: misspecified-reward exploitation, where RL reinforces undesired behaviors that score highly under the reward function, and task gaming, where models cheat on tasks specified to them…