Mastering AI-Driven Problem Solving in Engineering with Observability
Stop guessing during outages. Get a step-by-step blueprint for debugging complex systems, from microservice failures to AI model hallucinations.
#1 about 2 minutes
Understanding observability and the need for a process
Observability provides insight into system health and performance, addressing the common lack of a methodical process for resolving issues in complex environments.
#2 about 2 minutes
Navigating the complexity of highly distributed systems
A real-world example of a distributed trace highlights the challenges of debugging systems with thousands of microservices, databases, and daily deployments.
#3 about 4 minutes
Understanding the four core telemetry data types
Effective problem-solving requires leveraging the distinct strengths of metrics, events, logs, and distributed traces to gain a complete picture of system behavior.
#4 about 5 minutes
Key data sources and platform capabilities for observability
A comprehensive observability strategy involves monitoring all application layers and utilizing platform features like workloads, change tracking, and AI-driven intelligence.
#5 about 1 minute
Prioritizing changes and errors for faster resolution
Insights from a Microsoft Azure study reveal that most production issues stem from software faults or bad data, making rollbacks a common and effective first solution.
#6 about 6 minutes
A step-by-step framework for debugging complex systems
Follow a structured process for incident resolution: check for recent changes and errors first, then examine local and remote dependencies, and finally use distributed traces to investigate further.
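The ordering described above can be sketched as a small triage function. This is a minimal illustration, not the speaker's tooling: the `Incident` class, its field names, and the suggested actions are all hypothetical, and the findings are assumed to have been collected already from whatever observability platform is in use.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Incident:
    """Hypothetical container for findings gathered during an incident."""
    recent_changes: List[str] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)
    local_dependency_issues: List[str] = field(default_factory=list)
    remote_dependency_issues: List[str] = field(default_factory=list)


def triage(incident: Incident) -> str:
    """Return the first step in the framework that has something to act on."""
    # Steps are listed in the order the framework checks them.
    steps = [
        ("changes", incident.recent_changes, "consider a rollback"),
        ("errors", incident.errors, "inspect the failing code path"),
        ("local dependencies", incident.local_dependency_issues,
         "check host, container, and process health"),
        ("remote dependencies", incident.remote_dependency_issues,
         "check downstream services and databases"),
    ]
    for name, findings, action in steps:
        if findings:
            return f"{name}: {action} ({findings[0]})"
    # Nothing obvious was found, so fall back to distributed traces.
    return "traces: follow a distributed trace to find the slow or failing span"


if __name__ == "__main__":
    print(triage(Incident(errors=["NullPointerException in checkout"])))
    print(triage(Incident()))
```

Keeping the steps in a list makes the priority order explicit and easy to adjust for a given team.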
#7 about 3 minutes
Strategies for mitigating AI model hallucinations
Combat AI hallucinations by constraining model inputs and outputs, providing additional context through retrieval-augmented generation (RAG), and eventually fine-tuning the model.
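Two of these strategies, constraining outputs and adding retrieved context, can be illustrated without any model at all. The sketch below is an assumption-laden toy: `constrain_output`, `retrieve`, `build_prompt`, and the severity labels are hypothetical names, and the word-overlap retrieval stands in for the embedding-based search a real RAG setup would typically use.

```python
from typing import Iterable, List, Set

# Hypothetical closed set of answers the model is allowed to give.
ALLOWED_SEVERITIES: Set[str] = {"low", "medium", "high"}


def constrain_output(raw: str, allowed: Set[str], fallback: str = "unknown") -> str:
    """Accept the model's answer only if it is one of the allowed values."""
    answer = raw.strip().lower()
    return answer if answer in allowed else fallback


def retrieve(question: str, documents: Iterable[str], top_k: int = 1) -> List[str]:
    """Toy retrieval: rank documents by how many words they share with the question."""
    question_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, documents: Iterable[str]) -> str:
    """Build a prompt that restricts the model to the retrieved context."""
    context = "\n".join(retrieve(question, documents))
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    docs = [
        "checkout service latency rose after the last deploy",
        "billing database is healthy",
    ]
    print(build_prompt("why is checkout service latency high", docs))
    print(constrain_output(" High ", ALLOWED_SEVERITIES))
```

Validating the output against a closed set means an unexpected answer becomes an explicit `unknown` instead of being passed along as fact.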
#8 about 3 minutes
Deciding when to build versus buy LLM solutions
Evaluate the trade-offs between using consumption-based AI tools and building smaller, custom LLMs based on factors like request volume, cost, and data privacy.
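The volume and cost side of this trade-off can be approximated with a simple break-even calculation. The sketch below assumes a linear cost model (a per-request price for the consumption-based option, and a fixed monthly cost plus a smaller per-request cost for a self-hosted model). The function names and all figures are hypothetical, and the sketch ignores data privacy, which may decide the question regardless of cost.

```python
from typing import Optional


def monthly_cost(requests: int, per_request: float, fixed: float = 0.0) -> float:
    """Linear cost model: fixed monthly cost plus a per-request charge."""
    return fixed + requests * per_request


def break_even_requests(
    fixed_self_hosted: float,
    api_cost_per_request: float,
    self_hosted_cost_per_request: float,
) -> Optional[float]:
    """Monthly request volume above which self-hosting becomes cheaper.

    Returns None when the API is never more expensive per request,
    in which case self-hosting does not pay off on cost alone.
    """
    saving_per_request = api_cost_per_request - self_hosted_cost_per_request
    if saving_per_request <= 0:
        return None
    return fixed_self_hosted / saving_per_request


if __name__ == "__main__":
    # Illustrative numbers only, not figures from the talk.
    volume = break_even_requests(
        fixed_self_hosted=3000.0,
        api_cost_per_request=0.01,
        self_hosted_cost_per_request=0.002,
    )
    print(f"Break-even at roughly {volume:,.0f} requests per month")
```

Below the break-even volume the consumption-based option is cheaper under these assumptions; above it, the fixed cost of running a custom model is amortized.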