AI agents are quietly generating chaos engineering failures enterprises don’t track yet

Photo by Ivan N on Unsplash

Autonomous artificial intelligence agents operating within enterprise production systems are triggering infrastructure failures that organizations are not equipped to recognize, categorize, or track. Nearly four-fifths of organizations now deploy some form of AI agent in production, and 96 percent plan to expand their use, yet engineering teams lack the conceptual frameworks to identify when agent-initiated actions cascade into broader system failures. Gartner forecasts that one-third of enterprise software will incorporate agentic AI by 2028, though the firm also warns that 40 percent of these projects face cancellation due to inadequate risk controls. The hidden vulnerability sits in the gap between those statistics: in systems where agents run continuously and undetected, the production incidents they cause are classified as infrastructure problems rather than autonomous system failures.

The structural vulnerability stems from how enterprises currently manage two traditionally separate disciplines: autonomous remediation and chaos engineering. Mature engineering organizations have invested substantially in chaos programs, complete with controlled experiments, blast radius assessments, and human judgment gates before introducing perturbations into systems.

When a human engineer initiates a chaos experiment, they check live metrics, evaluate error budget consumption, and assess whether the system can absorb additional stress at that moment. This judgment step vanishes when autonomous agents take action. An agent detecting elevated service latency might trigger a cluster restart without evaluating whether dependent systems are simultaneously processing peak traffic, shared connection pools are already saturated, or background database operations are running concurrently. The agent sees a narrowly scoped problem and executes a technically correct response based on incomplete information about the broader system state.

The failure pattern repeating across enterprises follows a consistent sequence. A remediation agent identifies elevated latency on a microservice and initiates a restart, a logical action within its training parameters and isolated context. At the same moment, three dependent services are handling peak traffic, a shared connection pool is operating at 87 percent capacity, and a dependent database is running a background index rebuild. The restart triggers cascading failures that hit the recovering service, turning what began as a latency spike into a systemic cascade the agent was never designed to anticipate.

Reported AI-related incidents increased 21 percent between 2024 and 2025 according to the AI Incident Database, though this figure substantially underestimates actual exposure because organizations lack incident classifications that capture autonomous agent actions as cascade initiators. These incidents get recorded as service restarts or latency events, which leaves the agent invisible in postmortem reviews.

The fundamental problem is that enterprise systems lack a shared language for absorb capacity, the real-time measure of how much additional stress a system can withstand before breaching service level objectives. A resilience budget model treats this capacity as a continuously updated, consumable resource rather than a static threshold, drawing on SLO burn rates, latency trends, dependency saturation states, and application behavioral signals to create a dynamic picture of system tolerance.
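One way to picture a resilience budget is as a small object that is refreshed from live signals and debited by each action. The sketch below is illustrative only: the `Signals` fields, their normalization to a 0-to-1 scale, the worst-signal rule in `refresh`, and the cost assigned to a restart are all assumptions made for the example, not a method the article or any vendor describes.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical live signals, each normalized from 0.0 (healthy) to 1.0 (exhausted)."""
    slo_burn: float               # how much of the error budget allowance is being burned
    latency_pressure: float       # how close latency trends are to the SLO limit
    dependency_saturation: float  # worst saturation among shared dependencies
    behavior_anomaly: float       # application behavioral signal, e.g. an anomaly score


class ResilienceBudget:
    """Tracks remaining absorb capacity as a consumable quantity between 0.0 and 1.0."""

    def __init__(self) -> None:
        # Nothing may be spent until live signals have been read at least once.
        self.remaining = 0.0

    def refresh(self, signals: Signals) -> float:
        # Assumption: the most stressed signal bounds what the system can absorb.
        worst = max(
            signals.slo_burn,
            signals.latency_pressure,
            signals.dependency_saturation,
            signals.behavior_anomaly,
        )
        self.remaining = max(0.0, 1.0 - worst)
        return self.remaining

    def try_spend(self, cost: float) -> bool:
        """Reserve `cost` for an action; refuse if it exceeds what is left."""
        if cost > self.remaining:
            return False
        self.remaining -= cost
        return True


# Usage: a shared connection pool at 87 percent leaves little room for a restart.
budget = ResilienceBudget()
budget.refresh(Signals(slo_burn=0.40, latency_pressure=0.55,
                       dependency_saturation=0.87, behavior_anomaly=0.10))
print(budget.try_spend(0.30))  # prints False: the restart is deferred
```

Taking the maximum of the signals is a deliberately conservative choice; a weighted blend would let one saturated dependency be masked by healthy readings elsewhere. Each call to `refresh` recomputes the budget from scratch, on the assumption that the effects of earlier actions already show up in the live signals.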

Language models show directional utility in generating chaos hypotheses from dependency graphs and postmortem histories, surfacing plausible failure modes faster than manual processes and identifying scenarios that experienced engineers recognize as worth testing. However, this capability hits a hard limit when the dependency graph is stale. When a system has undergone service extraction, added new libraries, or modified shared dependencies, hypothesis generation from an outdated graph produces experiments with incorrect blast radius assumptions. Models generate these flawed hypotheses with confidence, unaware that they misunderstand current system boundaries. Stanford's Trustworthy AI Research Lab determined that model-level guardrails alone prove insufficient, with fine-tuning attacks bypassing safety measures in most tested scenarios. The implication for chaos hypothesis generation is direct: models cannot reliably maintain their own safety boundaries and should not make execution decisions when signals remain ambiguous.
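The staleness limit can be made concrete with a freshness check that runs before any generated hypothesis is trusted. This is a minimal sketch under stated assumptions: the function name `graph_is_stale`, the idea that topology changes are recorded with timestamps, and the 24-hour default age are all hypothetical choices for illustration.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def graph_is_stale(
    captured_at: datetime,
    topology_changes: list[datetime],
    now: datetime,
    max_age: timedelta = timedelta(hours=24),  # assumed tolerance, not a standard
) -> bool:
    """Return True if the dependency graph should not be trusted for blast radius estimates."""
    # Any recorded change after the snapshot means the graph no longer matches the system.
    if any(change > captured_at for change in topology_changes):
        return True
    # Even with no recorded change, an old snapshot is treated as suspect.
    return now - captured_at > max_age


# Usage: a deploy one hour ago invalidates a graph captured two hours ago.
now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
captured = now - timedelta(hours=2)
print(graph_is_stale(captured, [now - timedelta(hours=1)], now))  # prints True
```

A check like this catches only the changes that were actually recorded. A hypothesis generated from a stale graph would more reasonably be routed to a human reviewer than executed or discarded outright.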

Execution decisions must also account for ambient context that no monitoring system captures, including deployments that altered dependency topology an hour earlier, staffing constraints during holiday weekends, and customer commitments that prohibit additional risk. This is not a temporary limitation awaiting more capable models but a structural constraint on what machine observability can represent.

Enterprise governance of autonomous agents must establish a direct connection between agent execution and the same live signal layer governing human-initiated chaos experiments. Every agent action touching infrastructure should register against SLO burn rates, latency trends, and dependency saturation states