Zac Boring - pDoom (Page 24)

Zac Boring 4 months ago Industry

Reasoning models struggle to control their chains of thought, and that’s good

via OpenAI Blog [7] — OpenAI introduces CoT-Control and finds reasoning models struggle to control their chains of thought, reinforcing monitorability as an AI safety safeguard.

Zac Boring 4 months ago Analysis

Gemini 3.1 Pro Aces Benchmarks, I Suppose

via Substack Zvi — I’ve been trying to find a slot for this one for a while.

Zac Boring 4 months ago Analysis

Mass Surveillance w/ LLMs is the Default Outcome. Contracts Won't Change That.

via LessWrong AI [3] — What's the best case scenario regarding OpenAI's contract w/ the Department of War (DoW)? We have access to the full contract; it's airtight; OAI's engineers are on top of things in case the DoW breaks the contract; there's actual teeth for violations. But even then, the DoW can simply switch vendors. Use Ge

Zac Boring 4 months ago Analysis

I Had Claude Read Every AI Safety Paper Since 2020, Here's the DB

via LessWrong AI — Click here if you just want to see the Database I made of all[1] AI safety papers written since 2020 and not read the methodology. To some extent the core idea here is to encode as much info from these papers into something small enough that an AI with a specific problem in mind can take in all

Zac Boring 4 months ago Analysis

An Alignment Journal: Coming Soon

via LessWrong AI [9] — tl;dr We’re incubating an academic journal for AI alignment: rapid peer-review of foundational Alignment research that the current publication ecosystem underserves. Key bets: paid attributed review, reviewer-written synthesis abstracts, and targeted automation. Contact us if…

Zac Boring 4 months ago Analysis

A Tale of Three Contracts

via Substack Zvi [2] — The attempt on Friday by Secretary of War Pete Hegseth to label Anthropic as a supply chain risk and commit corporate murder had a variety of motivations.

Zac Boring 4 months ago Industry

Anthropic upgrades Claude’s memory to attract AI switchers

via The Verge AI [2] — Anthropic is making it easier to switch to its Claude AI from other chatbots with an update that brings Claude's memory feature to users on the free plan, along with a new prompt and dedicated tool for importing data from other chatbots. These upgrades could allow users who have been using rivals li

Zac Boring 4 months ago Analysis

War Claude

via LessWrong AI [2] — What a weekend. Two new wars in Asia don't qualify as top news. My first reaction to Hegseth's conflict with Anthropic was along the lines of: I expected an attempt at quasi-nationalization of AI, but not this soon. And I expected it to look like it was managed by national security professionals. He

Zac Boring 4 months ago Industry

OpenAI’s “compromise” with the Pentagon is what Anthropic feared

via MIT Technology Review [4] — On February 28, OpenAI announced it had reached a deal that will allow the US military to use its technologies in classified settings. CEO Sam Altman said the negotiations, which the company began pursuing only after the Pentagon’s public reprimand of Anthropic, were “definitely rushed.” In its anno

Zac Boring 4 months ago Industry

How OpenAI caved to the Pentagon on AI surveillance

via The Verge AI [4] — On Friday evening, amidst fallout from a standoff between the Department of Defense and Anthropic, OpenAI CEO Sam Altman announced that his own company had successfully negotiated new terms with the Pentagon. The US government had just moved to blacklist Anthropic for standing firm on two red lines

Zac Boring 4 months ago Analysis

Secretary of War Tweets That Anthropic is Now a Supply Chain Risk

via Substack Zvi [2] — This is the long version of what happened so far.

Zac Boring 4 months ago Industry

I checked out one of the biggest anti-AI protests ever

via MIT Technology Review [4] — Pull the plug! Pull the plug! Stop the slop! Stop the slop! For a few hours this Saturday, February 28, I watched as a couple hundred anti-AI protesters marched through London’s King’s Cross tech hub, home to the UK headquarters of OpenAI, Meta and Google DeepMind, chanting slogans and waving signs.

Zac Boring 4 months ago Research

How to Design Environments for Understanding Model Motives

via Alignment Forum [5] — Authors: Gerson Kroiz*, Aditya Singh*, Senthooran Rajamanoharan, Neel Nanda. Gerson and Aditya are co-first authors. This work was conducted during MATS 9.0 and was advised by Senthooran Rajamanoharan and Neel Nanda. TL;DR: Understanding why a model took an action is a key question in AI Safety. It is a

Zac Boring 4 months ago Research

PseudoAct: Leveraging Pseudocode Synthesis for Flexible Planning and Action Control in Large Language Model Agents

via ArXiv cs.AI [6] — Large language model (LLM) agents typically rely on reactive decision-making paradigms such as ReAct, selecting actions conditioned on growing execution histories. While effective for short tasks, these approaches often lead to redundant tool usage, un

Zac Boring 4 months ago Research

AI Must Embrace Specialization via Superhuman Adaptable Intelligence

via ArXiv cs.AI [8] — Everyone from AI executives and researchers to doomsayers, politicians, and activists is talking about Artificial General Intelligence (AGI). Yet, they often don't seem to agree on its exact definition. One common definition of AGI is an AI that can do

Zac Boring 4 months ago Research

MMKG-RDS: Reasoning Data Synthesis via Deep Mining of Multimodal Knowledge Graphs

via ArXiv cs.AI [3] — Synthesizing high-quality training data is crucial for enhancing domain models' reasoning abilities. Existing methods face limitations in long-tail knowledge coverage, effectiveness verification, and interpretability. Knowledge-graph-based approaches s

Zac Boring 4 months ago Analysis

I'm Bearish On Personas For ASI Safety

via LessWrong AI [5] — TL;DR: Your base LLM has no examples of superintelligent AI in its training data. When you RL it into superintelligence, it will have to extrapolate to how a superintelligent Claude would behave. The LLM’s extrapolation may not converge optimizing for what humanity would, on…

Zac Boring 4 months ago Research

Schelling Goodness, and Shared Morality as a Goal

via Alignment Forum — Also available in markdown at theMultiplicity.ai/blog/schelling-goodness. This post explores a notion I'll call Schelling goodness. Claims of Schelling goodness are not first-order moral verdicts like "X is good" or "X is bad." They are claims about a class of hypothetical coordination games in the

Zac Boring 4 months ago Analysis

Anthropic and the DoW: Anthropic Responds

via Substack Zvi [2] — The Department of War gave Anthropic until 5:01pm on Friday the 27th to either give the Pentagon ‘unfettered access’ to Claude for ‘all lawful uses,’ or else.

Zac Boring 4 months ago Analysis

New ARENA material: 8 exercise sets on alignment science & interpretability

via LessWrong AI [3] — TLDR: This is a post announcing a lot of new ARENA material I've been working on for a while, which is now available for study here (currently on the alignment-science branch, but planned to be merged into main this Sunday). There's a set of exercises (each one contains about 1-2 days of material) on t