SRE Engineer
Job description

SRE Fundamentals & Reliability Engineering
- Apply core SRE principles, including:
  - Definition and governance of SLIs, SLOs, and SLAs
  - Error budgets and reliability trade-offs
  - Incident management and postmortems
- Partner with SRE L2/L3 teams to improve system reliability and performance

Observability Strategy & Tool Recommendation (Core Responsibility)
- Act as the central point of expertise for Splunk and Dynatrace capabilities
- Analyze requirements provided by:
  - Application developers
  - SRE L2/L3 engineers
- Research and determine:
  - Whether requirements can be fulfilled using Splunk, Dynatrace, or both
  - The most efficient, scalable, and cost-effective implementation approach
- Translate business and technical requirements into tool-specific solutions
- Recommend best practices, design patterns, and architecture for observability
- Continuously evaluate new features and enhancements in Splunk and Dynatrace

Splunk Engineering
- Design and optimize Splunk-based logging and monitoring solutions
- Develop advanced SPL queries, dashboards, and alerts
- Define log onboarding strategies and data models
- Ensure data quality, governance, and cost efficiency
- Provide guidance on when and how to use Splunk effectively

Dynatrace Expertise
- Configure and optimize Dynatrace for APM, RUM, and synthetic monitoring
- Leverage AI-driven anomaly detection and root cause analysis
- Map business transactions and critical user journeys
- Guide teams on making the best use of Dynatrace capabilities

Azure Observability
- Implement and integrate monitoring solutions within Microsoft Azure
- Work with services such as:
  - Azure App Service, AKS, Azure Functions
  - Azure Monitor, Log Analytics, Application Insights
- Ensure seamless integration between Azure, Splunk, and Dynatrace

Automation & Enablement
- Develop automation scripts using Python, PowerShell, or Bash
- Enable self-service observability for engineering teams
- Integrate monitoring tools with ServiceNow, Jira, or similar platforms
- Provide documentation, standards, and reusable templates

Collaboration & Advisory
- Act as a trusted advisor to developers and SRE teams
- Conduct requirement intake sessions and translate the resulting requirements into solutions
- Provide training and guidance on observability best practices
- Drive adoption of standardized monitoring approaches across teams

Requirements
- 5+ years of experience in SRE, DevOps, or Observability Engineering
- Strong understanding of SRE fundamentals (SLIs, SLOs, error budgets, incident management)
- Deep hands-on experience with:
  - Splunk (log ingestion, SPL, dashboards, alerting)
  - Dynatrace (APM, RUM, synthetic monitoring)
- Strong expertise in Microsoft Azure
- Experience supporting large-scale, customer-facing platforms
- Proficiency in scripting (Python, PowerShell, or Bash)
- Strong analytical and problem-solving skills

Preferred Qualifications
- Experience in retail/e-commerce environments
- Knowledge of microservices and distributed systems
- Experience with AKS, Docker, and containerized environments
- Familiarity with additional observability tools (Prometheus, Grafana, ELK)
- Certifications in Splunk, Dynatrace, or Azure