The Quiet Crisis in Site Reliability Engineering
You've seen the headlines. "AI Will Replace 50% of Tech Jobs!" "Automation Makes SREs Obsolete!" The narrative is everywhere in 2026. But here's what they're missing—the real story isn't about replacement. It's about something more insidious, more gradual, and ultimately more damaging to our industry.
AI isn't firing SREs. It's making them less skilled.
I've been in this game for fifteen years. I've watched monitoring evolve from Nagios scripts to today's AI-powered observability platforms. And what I'm seeing now worries me more than any job-loss prediction. We're creating a generation of engineers who can't troubleshoot without AI assistance. They're becoming passengers in their own systems, watching dashboards but not understanding what's happening under the hood.
This article isn't about stopping progress. It's about understanding what we're losing in the rush to automate everything. We'll explore how this deskilling happens, why it matters more than you think, and what you can do to stay sharp in an increasingly automated world.
The Ironies of Automation: A Lesson from 1983
Let's start with some history, because this isn't a new problem. Back in 1983, researcher Lisanne Bainbridge published a paper called "Ironies of Automation." She was writing about the operators of highly automated industrial plants, but her insights apply perfectly to today's SRE landscape.
Bainbridge noticed something counterintuitive: the more automated a system became, the less skilled the human operators needed to be... until something went wrong. Then suddenly, they needed more skill than ever before. They had to understand complex systems they hadn't manually operated in years. They had to make critical decisions with incomplete information. And they had to do it under extreme pressure.
Sound familiar?
Fast forward to 2026. Your AI-powered observability platform detects an anomaly. It suggests a remediation action. You click "approve." The problem disappears. Great! But what did you actually learn? What connections did you make between system components? What subtle patterns did you notice that might predict future issues?
Probably none. And that's the problem.
Every time you let AI handle the diagnosis, you're skipping the learning opportunity. You're outsourcing the pattern recognition that builds expertise. And you're creating what human-factors researchers call "out-of-the-loop unfamiliarity": you're literally out of the troubleshooting loop, so when you finally have to step back in, the system state is foreign to you.
The Vicious Cycle of AI Dependency
Here's how the deskilling cycle works in practice. It starts innocently enough.
Your team adopts an AI observability tool—maybe something like Datadog's AI-powered root cause analysis or New Relic's anomaly detection. At first, it's amazing. It catches issues you might have missed. It reduces mean time to resolution (MTTR). Your on-call life gets easier. Who wouldn't love that?
But then something subtle happens. You start trusting the AI's recommendations without questioning them. You stop digging into logs manually. You rely on automated dashboards instead of building your own mental models of system behavior. Your troubleshooting muscles begin to atrophy.
Now comes phase two: because you're less skilled at manual troubleshooting, you need the AI tools more. You can't effectively debug without them. So you invest in even more automation. The tools get better, you get more dependent, and the cycle continues.
I've seen this firsthand. A colleague recently couldn't troubleshoot a simple database connection issue because their AI monitoring tool was down. They knew something was wrong with the database, but they had no idea how to trace the connection, check network routes, or examine authentication logs manually. They were completely helpless without their AI assistant.
That's deskilling in action. And it's happening across our industry.
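For the record, none of that triage requires an AI platform. Here's a minimal sketch of the first two checks in plain Python, standard library only; the hostname and port are hypothetical stand-ins for your own database endpoint:

```python
# Manual triage for "the database seems down," using nothing but the standard
# library. The hostname and port are hypothetical; substitute your own.
import socket
import time

HOST, PORT = "db.internal.example", 5432  # hypothetical endpoint

# Step 1: does the name resolve? Separates DNS trouble from network trouble.
t0 = time.monotonic()
try:
    addrs = socket.getaddrinfo(HOST, PORT, proto=socket.IPPROTO_TCP)
    print(f"DNS ok ({time.monotonic() - t0:.3f}s): {[a[4][0] for a in addrs]}")
except socket.gaierror as e:
    raise SystemExit(f"DNS failure -- check resolvers and search domains: {e}")

# Step 2: does a TCP connection open? Separates routing/firewalls from auth.
t0 = time.monotonic()
try:
    with socket.create_connection((HOST, PORT), timeout=3):
        print(f"TCP connect ok ({time.monotonic() - t0:.3f}s)")
except OSError as e:
    raise SystemExit(f"TCP failure -- check routes and security groups: {e}")

# If both pass, the problem lives above layer 4: credentials, TLS, pool limits.
```

Two socket calls, and you've already split the problem space in half. That's the skill my colleague had lost.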
What We're Losing: The Art of Deep Troubleshooting
Let me be specific about what disappears when AI handles too much. These aren't abstract concepts—they're concrete skills that separate good SREs from great ones.
First, you lose pattern recognition across time. AI is great at spotting immediate anomalies, but humans build mental models of how systems behave over weeks, months, even years. I once caught a memory leak that seemed to follow the lunar cycle (true story: it actually tracked a backup schedule that ran on a monthly rhythm). No AI would have connected those dots.
Second, you lose intuition about system interactions. When you've manually traced requests through a distributed system dozens of times, you develop a feel for where bottlenecks hide. You know that Service A talking to Service B through that particular middleware always adds 50ms under load. AI might detect the latency, but it won't give you that gut feeling about why it happens.
Third, and most dangerously, you lose the ability to work with incomplete information. AI tools wait for clear signals. Real production issues often start with ambiguous, contradictory data. The skill isn't in analyzing perfect information—it's in making good decisions with messy, partial data. That's a human skill that atrophies when you always wait for AI confidence scores.
One SRE put it perfectly in a discussion thread: "I used to be able to smell a problem coming. Now I just wait for the dashboard to turn red."
The API Integration Angle: Deskilling Through Abstraction
This gets particularly interesting when we look at API integrations and microservices—which is where most of us live in 2026.
Modern observability platforms offer incredible API integration capabilities. You can pipe logs, metrics, and traces from dozens of services into a single pane of glass. The AI correlates events across your entire stack. It's magical... until it isn't.
Here's the issue: these integrations create layers of abstraction that hide implementation details. When your AI tool shows "API latency increased between Service X and Y," what does that actually mean? Is it network congestion? DNS issues? Authentication token expiration? Throttling on the provider side? Serialization problems?
The more integrated your monitoring, the easier it is to see the what and the harder it is to understand the why. You get beautiful dependency graphs but lose touch with the actual mechanics of each connection.
I've worked with teams who couldn't manually test an API endpoint without their automated testing suite. They didn't know how to craft a simple cURL request or interpret raw HTTP responses. When their testing platform had an outage, their development literally stopped. That's API deskilling—knowing the integration but not the protocol.
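If that sounds familiar, this is the cURL-equivalent skill worth keeping, sketched here in standard-library Python. The host, path, and token are hypothetical placeholders; the point is reading the undigested response yourself:

```python
# Fire one raw HTTPS request and read the raw response. Host, path, and
# token are hypothetical placeholders.
import http.client
import time

conn = http.client.HTTPSConnection("api.example.com", timeout=5)
t0 = time.monotonic()
conn.request("GET", "/v1/orders", headers={"Authorization": "Bearer <token>"})
resp = conn.getresponse()
elapsed = time.monotonic() - t0

# The raw status line narrows the "why" fast: 401 suggests an expired token,
# 429 (check Retry-After) suggests throttling, a slow 200 says keep digging.
print(f"HTTP {resp.status} {resp.reason} in {elapsed:.3f}s")
for name, value in resp.getheaders():
    print(f"{name}: {value}")
print(resp.read(500).decode(errors="replace"))  # first 500 bytes of the body
conn.close()
```

One status line and a handful of headers will often tell you whether you're looking at authentication, throttling, or genuine latency, which is exactly the "why" the dependency graph hides.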
And don't get me started on AI-generated API documentation. It's convenient, sure. But when the AI summarizes endpoints for you, you miss the nuances. You don't notice that odd parameter that only matters during daylight saving time transitions. You don't spot the rate limit that varies by region. The AI gives you the 80% solution and hides the important 20%.
Breaking the Cycle: Practical Strategies for 2026
So what do we do? Abandon AI tools? Go back to manual log parsing? Of course not. The solution is intentional skill preservation alongside automation adoption.
First, implement "manual Fridays" (or whatever day works). One day a week, troubleshoot without AI assistance. Force yourself to read raw logs. Write custom queries instead of using pre-built dashboards. Trace a request manually through your system. It'll be slower at first. You'll miss things. That's the point—you're rebuilding skills.
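To make that concrete, here's one manual-Friday exercise I like: computing latency percentiles straight from a raw access log, no dashboard involved. The log path and line format below are assumptions; adapt the regex to whatever your services actually emit:

```python
# Latency percentiles from a raw access log, standard library only.
# The log path and line format are hypothetical.
import re
from pathlib import Path

LOG = Path("/var/log/app/access.log")  # hypothetical location
# Assumes lines ending like: "GET /api/orders" 200 0.137  (seconds)
PATTERN = re.compile(r'"(?P<method>\w+) (?P<path>\S+)" (?P<status>\d{3}) (?P<secs>[\d.]+)')

latencies, errors = [], 0
with LOG.open() as f:
    for line in f:
        m = PATTERN.search(line)
        if not m:
            continue
        if m["status"].startswith("5"):
            errors += 1
        latencies.append(float(m["secs"]))

if not latencies:
    raise SystemExit("no matching lines -- check the regex against the log format")

latencies.sort()
for q in (0.50, 0.95, 0.99):
    print(f"p{int(q * 100)}: {latencies[int(q * (len(latencies) - 1))]:.3f}s")
print(f"5xx responses: {errors} of {len(latencies)}")
```

Twenty lines of code, and you've had to think about what your log format actually contains. That thinking is the exercise.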
Second, practice failure. Regularly break things in staging and troubleshoot without AI tools. Better yet, participate in chaos engineering exercises where you have to diagnose issues with limited information. These aren't just resilience tests—they're skill maintenance exercises.
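You don't need a full chaos platform to start. Here's a deliberately crude fault-injection sketch for a practice drill: while active, it makes every DNS lookup in the process fail so you can practice recognizing the symptom. Real chaos tooling (a fault-injecting proxy, for instance) is more faithful, but this works for a drill:

```python
# Crude fault injection for staging or local practice: while the context
# manager is active, every DNS lookup in this process fails.
import socket
from contextlib import contextmanager

@contextmanager
def dns_outage():
    real = socket.getaddrinfo
    def broken(*args, **kwargs):
        raise socket.gaierror("injected: simulated resolver outage")
    socket.getaddrinfo = broken
    try:
        yield
    finally:
        socket.getaddrinfo = real  # always restore, even if the drill errors

# The drill: would you recognize this as DNS without a tool naming it for you?
with dns_outage():
    try:
        socket.create_connection(("example.com", 443), timeout=2)
    except socket.gaierror as e:
        print(f"symptom: {e}")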
Third, understand your tools instead of just using them. When your AI suggests a root cause, don't just accept it. Ask: "What data led to this conclusion? What alternative explanations did it consider? What confidence score did it assign, and why?" Treat the AI like a junior engineer whose reasoning you need to verify.
Fourth, maintain "breadth skills" alongside your "depth skills." Yes, specialize in your AI observability platform. But also know how to use lower-level tools. Can you use tcpdump when the network monitoring fails? Can you read a heap dump when the memory analyzer is down? These backup skills prevent total dependency.
One team I worked with keeps a physical notebook (yes, paper) of system quirks and manual troubleshooting steps. When everything is automated, having that analog fallback is surprisingly valuable.
What Tool Vendors Won't Tell You (But Should)
Let's talk honestly about the commercial side of this. Observability and AI tool vendors have every incentive to make their products indispensable. They want you dependent. Their metrics—adoption rates, daily active users, "time saved" calculations—all reward increased dependency.
But some vendors are starting to recognize the deskilling problem. I've seen newer tools that include "learning modes" where they explain their reasoning instead of just giving answers. Others offer "skill builder" exercises that walk engineers through manual troubleshooting steps for common issues.
When evaluating tools in 2026, ask uncomfortable questions:
- "How does this tool help me understand my system better, not just monitor it?"
- "What manual workflows does this replace, and what skills might atrophy as a result?"
- "Can I export raw data to analyze manually if needed?"
- "Does this platform explain its AI's reasoning, or is it a black box?"
The best tools in 2026 won't just automate—they'll educate. They'll make you better at your job, not just more efficient at following their suggestions.
And sometimes, the right tool isn't more AI—it's better data organization. I've seen teams spend millions on AI platforms when what they really needed was consistent logging standards across services. No AI can fix garbage-in-garbage-out.
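What do consistent standards look like in practice? At minimum, one structured event per line with shared field names across every service. Here's a minimal sketch; the service name and field conventions are illustrative, not an established standard:

```python
# One JSON object per log line, with the same field names in every service
# so events can be correlated later. Conventions here are illustrative.
import json
import logging
import time

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        event = {
            "ts": round(time.time(), 3),
            "level": record.levelname,
            "service": "checkout",  # hypothetical service name
            "msg": record.getMessage(),
        }
        # Carry through structured context, e.g. request IDs, for correlation.
        event.update(getattr(record, "ctx", {}))
        return json.dumps(event)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("checkout")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("payment authorized", extra={"ctx": {"request_id": "abc123", "latency_ms": 42}})
```

Boring? Absolutely. But a human grepping those lines at 3 a.m. can correlate events across services without any AI in the loop, and so can the AI when it's back up.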
FAQs from the Trenches
"Isn't this just progress? Should we go back to punch cards?"
Not at all. Automation is good. Efficiency is good. The problem is unbalanced automation that doesn't account for skill preservation. We need both—AI assistance and human expertise. They should complement each other, not replace one with the other.
"My management only cares about MTTR. How do I justify skill-building time?"
Frame it as risk reduction. What's your MTTR when the AI platform has an outage? What's your incident cost when junior engineers can't troubleshoot without AI? Skill diversity is business continuity. I've seen companies pay six-figure emergency consulting fees because their team couldn't troubleshoot without their usual tools.
"I'm already overwhelmed. How do I find time for manual practice?"
Start small. Fifteen minutes a day. Troubleshoot one minor alert manually each week. The key is consistency, not massive time investment. And honestly? Those fifteen minutes often save hours later when you spot patterns faster.
"What about junior engineers who never learned manual troubleshooting?"
This is the biggest challenge. We need to explicitly teach skills that used to be learned through necessity. Create mentorship programs where senior engineers walk juniors through manual debugging. Include lower-level tools in onboarding. Make skill preservation part of your engineering culture.
The Future Isn't Automated—It's Augmented
Here's my prediction for the rest of 2026 and beyond: the most successful SRE teams won't be the most automated. They'll be the most adaptable. They'll use AI as a powerful assistant, not a crutch. They'll maintain the skills to work without it when needed.
The real value in site reliability engineering has always been deep system understanding. It's the ability to connect seemingly unrelated events. It's the intuition born from countless hours of observation and troubleshooting. That's what makes us human. That's what makes us valuable.
AI can handle the predictable, the pattern-based, the repetitive. Let it. That frees us up for the interesting work—the edge cases, the novel failures, the creative solutions.
But we have to stay in practice. We have to maintain our skills. We have to remember how to think, not just how to click "approve" on AI suggestions.
So here's my challenge to you: next time your AI tool detects an issue, don't just follow its recommendation. Ask why. Dig deeper. Trace the path yourself. Rebuild those mental muscles.
Your future self—and your systems—will thank you.