<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>AI SRE</title><description>Articles, talks, and writing about AI agents in production.</description><link>https://aisre.com/</link><language>en-us</language><item><title>Fixing Claude with Claude: Anthropic reports on AI SRE</title><link>https://www.theregister.com/2026/03/19/anthropic_claude_sre/</link><guid isPermaLink="true">https://www.theregister.com/2026/03/19/anthropic_claude_sre/</guid><description>When Claude produces a postmortem report, it delivers &apos;an 80 percent story that&apos;s pretty, readable and convincing.&apos; Anthropic&apos;s experience using Claude for SRE internally.</description><pubDate>Thu, 19 Mar 2026 00:00:00 GMT</pubDate></item><item><title>Your Data is Made Powerful By Context</title><link>https://charity.wtf/2026/03/09/your-data-is-made-powerful-by-context-so-stop-destroying-it-already-xpost/</link><guid isPermaLink="true">https://charity.wtf/2026/03/09/your-data-is-made-powerful-by-context-so-stop-destroying-it-already-xpost/</guid><description>Agentic workflows will make automated validation techniques easier and more widely used. Context is the key to making data useful.</description><pubDate>Mon, 09 Mar 2026 00:00:00 GMT</pubDate></item><item><title>We Automated Everything Except Knowing What&apos;s Going On</title><link>https://eversole.dev/blog/we-automated-everything/</link><guid isPermaLink="true">https://eversole.dev/blog/we-automated-everything/</guid><description>AI collapsed the cost of building software but not the cost of understanding it. When AI agents outnumber engineers 50-to-1, the gap between deployment speed and comprehension becomes dangerous.</description><pubDate>Mon, 02 Mar 2026 00:00:00 GMT</pubDate></item><item><title>Everyone is a junior engineer in the age of AI</title><link>https://thenewstack.io/hightower-ai-open-source-kubecon/</link><guid isPermaLink="true">https://thenewstack.io/hightower-ai-open-source-kubecon/</guid><description>Kelsey Hightower on AI, open source sustainability, and career resilience for engineers. KubeCon Europe 2026 keynote coverage.</description><pubDate>Sun, 01 Mar 2026 00:00:00 GMT</pubDate></item><item><title>Why Your On-Call Engineer Is Your Most Expensive Bottleneck</title><link>https://medium.com/@pranavkumarshil/why-your-on-call-engineer-is-your-most-expensive-bottleneck-4bb00dff32eb</link><guid isPermaLink="true">https://medium.com/@pranavkumarshil/why-your-on-call-engineer-is-your-most-expensive-bottleneck-4bb00dff32eb</guid><description>AI adoption correlates positively with throughput yet negatively with stability. Organizations are accelerating into more failures, not fewer.</description><pubDate>Sun, 01 Mar 2026 00:00:00 GMT</pubDate></item><item><title>Practical Considerations for AI Incident Reviews</title><link>https://fgj.codes/posts/ai-incident-reviews/</link><guid isPermaLink="true">https://fgj.codes/posts/ai-incident-reviews/</guid><description>LLM-generated incident reviews often fail due to poor input data and misunderstanding of why reviews matter. Incident reviews are fundamentally a socio-technical process, and AI should enhance analyst capacity rather than replace human engagement.</description><pubDate>Sun, 01 Mar 2026 00:00:00 GMT</pubDate></item><item><title>The Picture They Paint of You</title><link>https://ferd.ca/the-picture-they-paint-of-you.html</link><guid isPermaLink="true">https://ferd.ca/the-picture-they-paint-of-you.html</guid><description>AI coding assistants are framed as augmenting engineers. AI SRE tools are framed as replacing them. The disparity reveals how organizations actually value reliability work.</description><pubDate>Mon, 23 Feb 2026 00:00:00 GMT</pubDate></item><item><title>Building An Elite AI Engineering Culture In 2026</title><link>https://cjroth.com/blog/2026-02-18-building-an-elite-engineering-culture</link><guid isPermaLink="true">https://cjroth.com/blog/2026-02-18-building-an-elite-engineering-culture</guid><description>AI-augmented teams merged 98% more PRs but saw 91% longer review times. Senior engineers get 5x the productivity gains of juniors.</description><pubDate>Wed, 18 Feb 2026 00:00:00 GMT</pubDate></item><item><title>Lots of AI SRE, no AI incident management</title><link>https://surfingcomplexity.blog/2026/02/14/lots-of-ai-sre-no-ai-incident-management/</link><guid isPermaLink="true">https://surfingcomplexity.blog/2026/02/14/lots-of-ai-sre-no-ai-incident-management/</guid><description>AI SRE tools excel at diagnosis and mitigation but lack coordination capabilities. Individual AI agents suffer from fixation bias and can&apos;t maintain the common ground that human teams build during incidents.</description><pubDate>Sat, 14 Feb 2026 00:00:00 GMT</pubDate></item><item><title>Are bugs and incidents inevitable with AI coding agents?</title><link>https://stackoverflow.blog/2026/01/28/are-bugs-and-incidents-inevitable-with-ai-coding-agents/</link><guid isPermaLink="true">https://stackoverflow.blog/2026/01/28/are-bugs-and-incidents-inevitable-with-ai-coding-agents/</guid><description>Analysis of 470 codebases shows AI-generated code produces 1.7x more bugs than human-written code, with 75% more logic errors and 8x more excessive I/O operations.</description><pubDate>Wed, 28 Jan 2026 00:00:00 GMT</pubDate></item><item><title>Bring Back Ops Pride</title><link>https://charity.wtf/2026/01/19/bring-back-ops-pride-xpost/</link><guid isPermaLink="true">https://charity.wtf/2026/01/19/bring-back-ops-pride-xpost/</guid><description>Operations teams got renamed to DevOps, SRE, infrastructure, production engineering, platform engineering. The identity crisis of the people who run production.</description><pubDate>Mon, 19 Jan 2026 00:00:00 GMT</pubDate></item><item><title>Software engineering when the machine writes the code</title><link>https://www.shayon.dev/post/2026/19/software-engineering-when-the-machine-writes-code/</link><guid isPermaLink="true">https://www.shayon.dev/post/2026/19/software-engineering-when-the-machine-writes-code/</guid><description>When production breaks at 2 AM, developers are reverse-engineering code they didn&apos;t write. AI coding erodes the mental models engineers need to debug complex systems under pressure.</description><pubDate>Mon, 19 Jan 2026 00:00:00 GMT</pubDate></item><item><title>How we built an AI SRE agent that investigates like a team of engineers</title><link>https://www.datadoghq.com/blog/building-bits-ai-sre/</link><guid isPermaLink="true">https://www.datadoghq.com/blog/building-bits-ai-sre/</guid><description>Datadog&apos;s Bits AI SRE uses hypothesis-driven investigation: forming hypotheses, testing them against telemetry, and recursively investigating multi-component issues. Early versions drowned in information overload; refined design prioritizes causal connections.</description><pubDate>Mon, 12 Jan 2026 00:00:00 GMT</pubDate></item><item><title>Software Acceleration and Desynchronization</title><link>https://ferd.ca/software-acceleration-and-desynchronization.html</link><guid isPermaLink="true">https://ferd.ca/software-acceleration-and-desynchronization.html</guid><description>On the speed mismatch when software ships faster than teams can absorb the consequences.</description><pubDate>Mon, 05 Jan 2026 00:00:00 GMT</pubDate></item><item><title>Your AI SRE needs better observability, not bigger models</title><link>https://clickhouse.com/blog/ai-sre-observability-architecture</link><guid isPermaLink="true">https://clickhouse.com/blog/ai-sre-observability-architecture</guid><description>An AI agent enters a Chain of Thought loop, firing up to 27 queries in a short time period to map dependencies, check outliers, and validate.</description><pubDate>Thu, 01 Jan 2026 00:00:00 GMT</pubDate></item><item><title>Building internal agents</title><link>https://lethain.com/agents-series/</link><guid isPermaLink="true">https://lethain.com/agents-series/</guid><description>Series on building internal agents. Forward-looking on what changes if AI-enhanced techniques continue to improve.</description><pubDate>Thu, 01 Jan 2026 00:00:00 GMT</pubDate></item><item><title>Human-Centred AI for SRE: Multi-Agent Incident Response without Full Automation</title><link>https://www.infoq.com/news/2026/01/opsworker-ai-sre/</link><guid isPermaLink="true">https://www.infoq.com/news/2026/01/opsworker-ai-sre/</guid><description>A thoughtful, detailed methodology for teams looking to integrate AI agents into their incident workflows while keeping humans in the loop.</description><pubDate>Thu, 01 Jan 2026 00:00:00 GMT</pubDate></item><item><title>Tribal Knowledge Kills On-Call</title><link>https://medium.com/@a_pomorska/tribal-knowledge-kills-on-call-574863bf3eaf</link><guid isPermaLink="true">https://medium.com/@a_pomorska/tribal-knowledge-kills-on-call-574863bf3eaf</guid><description>Tribal knowledge is a single point of failure because it centralizes critical context in humans. Humans are not reliable infrastructure.</description><pubDate>Thu, 01 Jan 2026 00:00:00 GMT</pubDate></item><item><title>End-of-Year Observability Retrospective with Charity Majors</title><link>https://horovits.medium.com/end-of-year-observability-retrospective-with-charity-majors-94f80fff77e8</link><guid isPermaLink="true">https://horovits.medium.com/end-of-year-observability-retrospective-with-charity-majors-94f80fff77e8</guid><description>Observability for AI Workloads: lessons from 2025 and insights for building observable AI systems in production.</description><pubDate>Mon, 01 Dec 2025 00:00:00 GMT</pubDate></item><item><title>Facilitating AI adoption at Imprint</title><link>https://lethain.com/company-ai-adoption/</link><guid isPermaLink="true">https://lethain.com/company-ai-adoption/</guid><description>Real practitioner experience with LLM-tooling and agent adoption. The formula is deep partnership, not &apos;build a platform and they will come.&apos;</description><pubDate>Mon, 01 Dec 2025 00:00:00 GMT</pubDate></item><item><title>AI and the Ironies of Automation</title><link>https://ufried.com/blog/ironies_of_ai_1/</link><guid isPermaLink="true">https://ufried.com/blog/ironies_of_ai_1/</guid><description>Applies Bainbridge&apos;s 1983 automation ironies to modern AI. When AI handles incident response, operators lose the skills needed to intervene when AI fails. Future engineers who never built manual expertise cannot oversee AI systems.</description><pubDate>Fri, 21 Nov 2025 00:00:00 GMT</pubDate></item><item><title>Notes from the 2025 &apos;AI Agents in Production&apos; Conference</title><link>https://markptorres.com/ai_workflows/2025-11-18-ai-agents-in-production-conference-notes</link><guid isPermaLink="true">https://markptorres.com/ai_workflows/2025-11-18-ai-agents-in-production-conference-notes</guid><description>Practitioner notes from MLOps Community conference. Error recovery, context engineering, metrics that predict trust and task completion.</description><pubDate>Tue, 18 Nov 2025 00:00:00 GMT</pubDate></item><item><title>From 4 Hours to 8 Minutes with AI Agents That Transform SRE</title><link>https://www.usenix.org/conference/srecon25emea/presentation/jausovec</link><guid isPermaLink="true">https://www.usenix.org/conference/srecon25emea/presentation/jausovec</guid><description>Building three core agents that form a modern reliability engineering backbone. SREcon25 EMEA talk.</description><pubDate>Wed, 01 Oct 2025 00:00:00 GMT</pubDate></item><item><title>Ongoing Tradeoffs, and Incidents as Landmarks</title><link>https://ferd.ca/ongoing-tradeoffs-and-incidents-as-landmarks.html</link><guid isPermaLink="true">https://ferd.ca/ongoing-tradeoffs-and-incidents-as-landmarks.html</guid><description>Incidents as navigation points for understanding systems. How incidents shape operational knowledge.</description><pubDate>Sat, 20 Sep 2025 00:00:00 GMT</pubDate></item><item><title>The Future of AI in SRE: Preventing Failures, Not Fixing Them</title><link>https://thenewstack.io/the-future-of-ai-in-sre-preventing-failures-not-fixing-them/</link><guid isPermaLink="true">https://thenewstack.io/the-future-of-ai-in-sre-preventing-failures-not-fixing-them/</guid><description>The shift from reactive to preventive AI in site reliability engineering.</description><pubDate>Sun, 01 Jun 2025 00:00:00 GMT</pubDate></item><item><title>The naked truth about AI-assisted coding</title><link>https://krasimirtsonev.com/blog/article/the-naked-truth-about-ai-assisted-coding</link><guid isPermaLink="true">https://krasimirtsonev.com/blog/article/the-naked-truth-about-ai-assisted-coding</guid><description>AI tools optimize for speed of code production. But the hard problems in software have never been about producing code fast enough.</description><pubDate>Sun, 01 Jun 2025 00:00:00 GMT</pubDate></item><item><title>On-Call Is Ruining My Life and Other Tales</title><link>https://www.youtube.com/watch?v=NWcXm9wnH-U</link><guid isPermaLink="true">https://www.youtube.com/watch?v=NWcXm9wnH-U</guid><description>SREcon25 Americas talk on the reality of on-call life for engineering teams.</description><pubDate>Sun, 01 Jun 2025 00:00:00 GMT</pubDate></item><item><title>Stop Building AI Tools Backwards</title><link>https://hazelweakly.me/blog/stop-building-ai-tools-backwards/</link><guid isPermaLink="true">https://hazelweakly.me/blog/stop-building-ai-tools-backwards/</guid><description>A critique of how the industry is approaching AI tooling for infrastructure and operations.</description><pubDate>Sun, 01 Jun 2025 00:00:00 GMT</pubDate></item><item><title>Another observability 3.0 appears on the horizon</title><link>https://charity.wtf/2025/03/24/another-observability-3-0-appears-on-the-horizon/</link><guid isPermaLink="true">https://charity.wtf/2025/03/24/another-observability-3-0-appears-on-the-horizon/</guid><description>Response to Matt Klein&apos;s observability 3.0 post. Forward-looking on where observability is headed.</description><pubDate>Mon, 24 Mar 2025 00:00:00 GMT</pubDate></item><item><title>What Progress In Learning From Incidents Actually Looks Like</title><link>https://www.adaptivecapacitylabs.com/2025/02/28/what-progress-in-learning-from-incidents-actually-looks-like/</link><guid isPermaLink="true">https://www.adaptivecapacitylabs.com/2025/02/28/what-progress-in-learning-from-incidents-actually-looks-like/</guid><description>Keynote from the first Learning From Incidents conference. What real progress looks like in organizational learning from failure.</description><pubDate>Fri, 28 Feb 2025 00:00:00 GMT</pubDate></item><item><title>Observability: the present and future, with Charity Majors</title><link>https://newsletter.pragmaticengineer.com/p/observability-the-present-and-future</link><guid isPermaLink="true">https://newsletter.pragmaticengineer.com/p/observability-the-present-and-future</guid><description>Deep interview on the Pragmatic Engineer newsletter about the future of observability.</description><pubDate>Wed, 01 Jan 2025 00:00:00 GMT</pubDate></item><item><title>LLMs won&apos;t save us</title><link>https://blog.relyabilit.ie/llms-wont-save-us/</link><guid isPermaLink="true">https://blog.relyabilit.ie/llms-wont-save-us/</guid><description>The AI wave is passing over SRE/DevOps tooling. What of genuine value will be left behind? Skeptical, grounded perspective from a Google SRE book co-author.</description><pubDate>Thu, 12 Dec 2024 00:00:00 GMT</pubDate></item><item><title>Learning from Major Incidents: The Opportunities We&apos;re Missing</title><link>https://www.pagerduty.com/blog/incident-management-response/learning-from-major-incidents-the-opportunities-were-missing/</link><guid isPermaLink="true">https://www.pagerduty.com/blog/incident-management-response/learning-from-major-incidents-the-opportunities-were-missing/</guid><description>Post-incident analysis could be more than a tool for SREs. It could be a way to understand how organizations actually operate.</description><pubDate>Mon, 22 Jul 2024 00:00:00 GMT</pubDate></item><item><title>Generative AI is not going to build your engineering team for you</title><link>https://charity.wtf/2024/06/10/generative-ai-is-not-going-to-build-your-engineering-team-for-you/</link><guid isPermaLink="true">https://charity.wtf/2024/06/10/generative-ai-is-not-going-to-build-your-engineering-team-for-you/</guid><description>AI code generation doesn&apos;t solve the production operations problem. It has far more to do with your ability to understand, maintain, and manage software in production over time.</description><pubDate>Mon, 10 Jun 2024 00:00:00 GMT</pubDate></item><item><title>Alert on symptoms, not causes</title><link>https://varoa.net/2024/03/06/alert-on-symptoms-not-causes.html</link><guid isPermaLink="true">https://varoa.net/2024/03/06/alert-on-symptoms-not-causes.html</guid><description>Practitioner essay on alerting philosophy. &apos;I aspire to make operational toil so small that on-call feels like a free bonus.&apos;</description><pubDate>Wed, 06 Mar 2024 00:00:00 GMT</pubDate></item></channel></rss>