Research
ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning
via ArXiv cs.AI [4] — Agentic reinforcement learning (ARL) has rapidly gained attention as a promising paradigm for training agents to solve complex, multi-step interactive tasks. Despite encouraging early results, ARL remains highly unstable, often leading to training coll
Task-Aware Exploration via a Predictive Bisimulation Metric
via ArXiv cs.AI — Accelerating exploration in visual reinforcement learning under sparse rewards remains challenging due to the substantial task-irrelevant variations. Despite advances in intrinsic exploration, many methods either assume access to low-dimensional states
Alignment in Time: Peak-Aware Orchestration for Long-Horizon Agentic Systems
via ArXiv cs.AI [5] — Traditional AI alignment primarily focuses on individual model outputs; however, autonomous agents in long-horizon workflows require sustained reliability across entire interaction trajectories. We introduce APEMO (Affect-aware Peak-End Modulation for
El Agente Gr\'afico: Structured Execution Graphs for Scientific Agents
via ArXiv cs.AI [6] — Large language models (LLMs) are increasingly used to automate scientific workflows, yet their integration with heterogeneous computational tools remains ad hoc and fragile. Current agentic approaches often rely on unstructured text to manage context a
Ontology-Guided Neuro-Symbolic Inference: Grounding Language Models with Mathematical Domain Knowledge
via ArXiv cs.AI — Language models exhibit fundamental limitations -- hallucination, brittleness, and lack of formal grounding -- that are particularly problematic in high-stakes specialist fields requiring verifiable reasoning. I investigate whether formal domain ontolo
How will we do SFT on models with opaque reasoning?
via Alignment Forum [1] — Current LLMs externalize lots of their reasoning in human interpretable language. This reasoning is sometimes unfaithful, sometimes strange and concerning, and LLMs can do somewhat impressive reasoning without using CoT, but my overall impression is that CoT currently is a…
Live Doom Meter
--
%
0% — We're fine
100% — GG
The Doom Meter is a composite score derived from prediction markets and feed sentiment, updated daily.
70%
Prediction Markets
Weighted average of Manifold Markets questions on AI catastrophe, AGI timelines, expert surveys, and key figures. Direct doom indicators weighted higher than indirect capability markers.
30%
Feed Sentiment
Percentage of recent headlines containing high-alarm keywords (existential risk, catastrophe, extinction). Higher alarm density = higher score.
This is not a scientific estimate of existential risk. It is an opinionated, transparent signal — a vibes-based thermometer for AI doom discourse.
P(Doom) Scoreboard
0%25%50%75%100%
Loading estimates...