Site Reliability Engineer with Datadog Observability

LTIMindtree Limited
Glen Allen, United States of America
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$157K

Tech stack

API
Azure
Business Software
Cloud Computing
Continuous Integration
Distributed Systems
GitHub
PowerShell
Reliability Engineering
Site Reliability Engineering Practices
Datadog
Infrastructure Automation Frameworks
Terraform

Job description

  • Datadog Observability Engineering.
  • Design, build, and maintain Datadog dashboards for business, application, and infrastructure visibility (single pane of glass).
  • Implement and manage Datadog APM, including service maps, dependency tracing, latency and error analysis, and performance baselines.
  • Configure synthetic monitoring (API and browser tests) to validate availability, user journeys, SSL/DNS health, and external dependencies.
  • Create standardized monitors, alerts, and SLOs aligned with SRE best practices (signal over noise, actionable alerts).
  • Observability Automation and IaC.
  • Build observability as code using Terraform to automate Datadog monitors, dashboards, synthetics, and alerting templates.
  • Collaborate with CloudOps to integrate Datadog setup during cloud infrastructure provisioning (Azure, Terraform pipelines).
  • Support Datadog agent automation, including installing and configuring agents on Azure VMs using PowerShell and standardized deployment patterns.
  • SRE Practices and Operational Excellence.
  • Partner with application teams to onboard services to the observability platform and define SLIs/SLOs.
  • Support incident analysis, RCA, and blameless postmortems using Datadog insights and telemetry.
  • Identify operational toil and drive automation and standardization to reduce manual effort and improve reliability.

Requirements

  • 2-3 years of hands-on experience in SRE, Observability, or Production Operations roles.
  • Strong practical experience with Datadog, including dashboards, monitors, APM, logs, and synthetics.
  • Experience automating infrastructure or observability using Terraform.
  • Experience scripting or automating operational tasks using PowerShell, especially for agent installation on VMs.
  • Working knowledge of Azure cloud services and cloud-native architectures.
  • Strong troubleshooting skills and a mindset focused on reliability and prevention.

Preferred Qualifications

  • Experience with observability platform rollouts or migrations from tools such as App Insights or LogicMonitor to Datadog.
  • Experience working with GitHub, GitHub Actions, or similar CI/CD tools for automation workflows.
  • Familiarity with SRE concepts such as SLIs/SLOs, error budgets, and incident response frameworks.
  • Exposure to containerized environments (AKS/Kubernetes) and distributed systems observability.

Mandatory Skills: Datadog

About the company

LTM is an AI-centric global technology services company and the Business Creativity partner to the world's largest and most disruptive enterprises. We bring human insights and intelligent systems together to help clients create greater value at the intersection of technology and domain expertise. Our capabilities span integrated operations, transformation, and business AI - enabling new ways of working, new productivity paradigms, and new roads to value. Together with over 87,000 employees across 40 countries and our global network of partners, LTM - a Larsen & Toubro company - owns business outcomes for our clients, helping them not just outperform the market, but to Outcreate it.

Please also note that neither LTM nor any of its authorized recruitment agencies/partners charge any candidate registration fee or any other fees from talent (candidates) towards appearing for an interview or securing employment/internship. Candidates shall be solely responsible for verifying the credentials,
