Tracking AI existential risk. Auto-aggregated headlines. Human-curated analysis.
AGGREGATING 47 SOURCES · UPDATED LIVE
Posts by
Zac Boring 3 months ago Research
A Toy Environment For Exploring Reasoning About Reward
via Alignment Forum [999] — tldr: We share a toy environment that we found useful for understanding how reasoning changed over the course of capabilities-focused RL. Over the course of capabilities-focused RL, the model biases more strongly towards reward hints over direct…
Zac Boring 3 months ago Analysis
$1 billion is not enough; OpenAI Foundation must start spending tens of billions each year
via LessWrong AI [6] — OpenAI is now a public benefit corporation, with a charter that demands they use AGI for the benefit of all, and do so safely. To justify this structure to the Attorneys General of Delaware and California, they split off the nonprofit OpenAI Foundation,…
Zac Boring 3 months ago Analysis
Claude Code, Cowork and Codex #6: Claude Code Auto Mode and Full Cowork Computer Use
via Substack Zvi [999] — Whatever else you think about Anthropic’s agentic coding department, they ship.
Zac Boring 3 months ago Industry
Agentic commerce runs on truth and context
via MIT Technology Review [4] — Imagine telling a digital agent, “Use my points and book a family trip to Italy. Keep it within budget, pick hotels we’ve liked before, and handle the details.” Instead of returning a list of links, the agent assembles an itinerary and executes…
Zac Boring 3 months ago Industry
The AI Hype Index: AI goes to war
via MIT Technology Review [4] — AI is at war. Anthropic and the Pentagon feuded over how to weaponize Anthropic’s AI model Claude; then OpenAI swept the Pentagon off its feet with an “opportunistic and sloppy” deal. Users quit ChatGPT in droves. People marched through London in…
Zac Boring 3 months ago Research
Intelligence Inertia: Physical Principles and Applications
via ArXiv cs.AI [3] — While Landauer's principle establishes the fundamental thermodynamic floor for information erasure and Fisher Information provides a metric for local curvature in parameter space, these classical frameworks function effectively only as approximations within…
Zac Boring 3 months ago Industry
Introducing the OpenAI Safety Bug Bounty program
via OpenAI Blog [7] — OpenAI launches a Safety Bug Bounty program to identify AI abuse and safety risks, including agentic vulnerabilities, prompt injection, and data exfiltration.
Zac Boring 3 months ago Analysis
The Fourth World
via LessWrong AI [4] — Is consciousness the last moral world? Imagine trying to explain to a virus why suffering matters. A virus is a simple self-replicating molecule: unsophisticated and arguably not even alive. It has no experience. It just copies itself according to chemical…
Zac Boring 3 months ago Industry
Arm’s first CPU ever will plug into Meta’s AI datacenters later this year
via The Verge AI [4] — After decades of only licensing its chip designs for others to use, UK-based Arm revealed the first chip it's producing on its own, and the first customer. Dubbed the Arm AGI CPU, it's another chip designed for inference, or running the cloud processing…
Zac Boring 3 months ago Analysis
Book Review: Open Socrates (Part 2)
via Substack Zvi [999] — Yesterday I posted Part 1. Read that first. This is Part 2 of 2.
Zac Boring 3 months ago Analysis
The AIXI perspective on AI Safety
via LessWrong AI [5] — I am also discussing something that is still a bit speculative, since we do not yet have ASI. While basic knowledge of AIXI is the only strict prerequisite, I suggest reading cognitive tech from AIT before this post for context. AIXI is often used as a…
Zac Boring 3 months ago Analysis
Measuring and improving coding audit realism with deployment resources
via LessWrong AI [5] — TL;DR We study realism win rate, a metric for measuring how distinguishable Petri audit transcripts are from real deployment interactions. We use it to evaluate the effect of giving the auditor real deployment resources (system prompts, tool definitions,…
Zac Boring 3 months ago Research
Leveraging Natural Language Processing and Machine Learning for Evidence-Based Food Security Policy Decision-Making in Data-Scarce Making
via ArXiv cs.AI [4] — Food security policy formulation in data-scarce regions remains a critical challenge due to limited structured datasets, fragmented textual reports, and demographic bias in decision-making systems. This study proposes ZeroHungerAI, an integrated Natural…
Zac Boring 3 months ago Research
ProMAS: Proactive Error Forecasting for Multi-Agent Systems Using Markov Transition Dynamics
via ArXiv cs.AI [5] — The integration of Large Language Models into Multi-Agent Systems (MAS) has enabled the solution of complex, long-horizon tasks through collaborative reasoning. However, this collective intelligence is inherently fragile, as a single logical fallacy can…
Zac Boring 3 months ago Industry
Nvidia CEO Jensen Huang says ‘I think we’ve achieved AGI’
via The Verge AI [8] — On a Monday episode of the Lex Fridman podcast, Nvidia CEO Jensen Huang made a hot-button statement: "I think we've achieved AGI." AGI, or artificial general intelligence, is a vaguely defined term that has incited a lot of discussion by tech CEOs, tech…
Zac Boring 3 months ago Analysis
Book Review: Open Socrates (Part 1)
via Substack Zvi [999] — These are all important, in their own way, call it a treasure hunt and collect them all…
Zac Boring 3 months ago Industry
The Download: animal welfare gets AGI-pilled, and the White House unveils its AI policy
via MIT Technology Review [4] — This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. The Bay Area’s animal welfare movement wants to recruit AI. In early February, animal welfare advocates and AI…
Zac Boring 3 months ago Analysis
China declares AGI development to be a part of 5-year plan
via LessWrong AI [4] — The CCP writes in its 15th 5-year plan that it will: "Encourage innovation in multimodal, agentic, embodied, and swarm intelligence technologies, and explore development paths for general artificial intelligence." This is translated from the…
Zac Boring 3 months ago Analysis
Finding features in Transformers: Contrastive directions elicit stronger low-level perturbation responses than baselines
via LessWrong AI [6] — Figure 1: Contrastive (difference-of-means, English→Mandarin) feature directions elicit a downstream response at much smaller perturbation magnitudes than SAE directions, which behave similarly to random directions. This holds across multiple models and…
Zac Boring 3 months ago Analysis
Confusion around the term reward hacking
via LessWrong AI [3] — Summary: "Reward hacking" commonly refers to two different phenomena: misspecified-reward exploitation, where RL reinforces undesired behaviors that score highly under the reward function, and task gaming, where models cheat on tasks specified to them…