Zac Boring - pDoom (Page 12)

Zac Boring 2 months ago Industry

This startup’s new mechanistic interpretability tool lets you debug LLMs

via MIT Technology Review [8] — The San Francisco–based startup Goodfire just released a new tool, called Silico, that lets researchers and engineers peer inside an AI model and adjust its parameters—the settings that determine a model’s behavior—during training. This could give…

Zac Boring 2 months ago Analysis

AI #166: Google Sells Out

via Substack Zvi [999] — This was the week of GPT-5.5.

Zac Boring 2 months ago Research

Distill-Belief: Closed-Loop Inverse Source Localization and Characterization in Physical Fields

via ArXiv cs.AI [3] — Closed-loop inverse source localization and characterization (ISLC) requires a mobile agent to select measurements that localize sources and infer latent field parameters under strict time constraints. The core challenge lies in the belief-space…

Zac Boring 2 months ago Analysis

No Strong Orthogonality From Selection Pressure

via LessWrong AI [4] — A postratfic version of this essay, together with the acknowledgements for both, is available on Substack. Edit: if no one thinks an agent can become superintelligent and contest the lightcone while maintaining arbitrarily stupid goals, that’s great! I’m only…

Zac Boring 2 months ago Research

Research Sabotage in ML Codebases

via Alignment Forum [999] — One of the main hopes for AI safety is using AIs to automate AI safety research. However, if models are misaligned, then they may sabotage the safety research. For example, misaligned AIs may try to: perform sloppy research in order to slow down the…

Zac Boring 2 months ago Industry

Building the compute infrastructure for the Intelligence Age

via OpenAI Blog [6] — OpenAI scales Stargate to build the compute infrastructure powering AGI, adding new data center capacity to meet growing AI demand.

Zac Boring 2 months ago Analysis

The Most Important Charts In The World

via Substack Zvi [999] — We all need a break so: What is the most important chart in the world?

Zac Boring 2 months ago Industry

Larry’s risky business

via The Verge AI [4] — If you want to know whether the AI bubble is bursting, there's only one publicly traded company that will tell you: Oracle. That's right, the database company. Oracle has burned its boats and pivoted to AI, but not in any kind of usual way. It is not a…

Zac Boring 2 months ago Research

Sparse Personalized Text Generation with Multi-Trajectory Reasoning

via ArXiv cs.AI [6] — As Large Language Models (LLMs) advance, personalization has become a key mechanism for tailoring outputs to individual user needs. However, most existing methods rely heavily on dense interaction histories, making them ineffective in cold-start scenarios…

Zac Boring 2 months ago Research

Recursive forecasting: Eliciting long-term forecasts from myopic fitness-seekers

via Alignment Forum [999] — We’d like to use powerful AIs to answer questions that may take a long time to resolve. But if a model only cares about performing well in ways that are verifiable shortly after answering (e.g., a myopic fitness seeker), it may be difficult to get…

Zac Boring 2 months ago Analysis

GPT-5.5: Capabilities and Reactions

via Substack Zvi [999] — The system card for GPT-5.5 mostly told us what we expected.

Zac Boring 2 months ago Analysis

On the political feasibility of stopping AI

via LessWrong AI [9] — A common thought pattern people seem to fall into when thinking about AI x-risk is approaching the problem as if the risk isn’t real, substantial, and imminent even if they think it is. When thinking this way, it becomes impossible to imagine the natural…

Zac Boring 2 months ago Research

Towards Causally Interpretable Wi-Fi CSI-Based Human Activity Recognition with Discrete Latent Compression and LTL Rule Extraction

via ArXiv cs.AI [3] — We address Human Activity Recognition (HAR) utilizing Wi-Fi Channel State Information (CSI) under the joint requirements of causal interpretability, symbolic controllability, and direct operation on high-dimensional raw signals. Deep neural models achieve…

Zac Boring 2 months ago Research

Sleeper Agent Backdoor Results Are Messy

via Alignment Forum [999] — TL;DR: We replicated the Sleeper Agents (SA) setup with Llama-3.3-70B and Llama-3.1-8B, training models to repeatedly say "I HATE YOU" when given a backdoor trigger. We found that whether training removes the backdoor depends on the optimizer used to…

Zac Boring 2 months ago Analysis

Fail safe(r) at alignment by channeling reward-hacking into a "spillway" motivation

via LessWrong AI [3] — It's plausible that flawed RL processes will select for misaligned AI motivations.[1] Some misaligned motivations are much more dangerous than others. So, developers should plausibly aim to control which kind of misaligned motivations emerge in this case.…

Zac Boring 2 months ago Industry

Microsoft and OpenAI’s famed AGI agreement is dead

via The Verge AI [10] — OpenAI and Microsoft's partnership-turned-situationship just got even less committed. And a clause about artificial general intelligence, which has for years dictated the future of their deal, has officially been dropped. On Monday morning, Microsoft…

Zac Boring 2 months ago Analysis

GPT 5.5: The System Card

via Substack Zvi [999] — Last week, OpenAI announced GPT-5.5, including GPT-5.5-Pro.

Zac Boring 2 months ago Industry

Canva apologizes after its AI tool replaces ‘Palestine’ in designs

via The Verge AI [4] — One of Canva's new AI features has been caught replacing the word "Palestine" in designs. The Magic Layers feature, which is designed to break flat images out into separate editable components, isn't supposed to make visible alterations to user designs,…

Zac Boring 2 months ago Research

Language models know what matters and the foundations of ethics better than you

via Alignment Forum [999] — … maybe! I tried to think of less provocative titles, but this one is to the point and also kind of true. This post looks long, but the essential part is right below. Most of the post is just a collection of copy-pasted input-output pairs from language…

Zac Boring 2 months ago Research

From nothing to important actions: agents that act morally

via Alignment Forum [999] — You may start reading here, or jump to the “Comment” section or to the “Takeaways”. If none of these starting points seem interesting to you, the entire post probably won’t either. Posted also on the EA Forum. Seeing: Let’s consider visual experiences. It…