AIOps Engineer - INTL Mexico
Job description
Build AI agentic flows that automate incident response and operational tasks.
Use LLMs to analyze alerts, logs, and SOPs, then decide the correct actions without human involvement.
Replace repetitive, manual incident work with automation that follows existing processes.
Improve system reliability through better alerting, observability, and automated remediation.
Integrate AI-driven automation with monitoring, logging, and cloud services.
Partner with SRE, DevOps, and platform teams to safely deploy and scale automation.
Continuously improve automation based on real production signals and outcomes.
Compensation: $24-$28/hour
Requirements
Strong understanding of LLMs and how they can be used to make decisions, trigger actions, and automate workflows.
Proven ability to turn manual SOPs and runbooks into automation rather than just following them.
Strong experience with automation using Python (Go is also acceptable).
Experience working in incident response, reliability, SRE, DevOps, or platform operations environments.
Comfort working in cloud-native systems, especially GCP.
Experience with production observability: knowing which signals matter, what's breaking, and why.
Required Technical Experience
Google Cloud Platform (GCP)
Automation: Python (Go is acceptable)
Observability: Google Managed Prometheus (GMP), Grafana Enterprise, log configuration and analysis
Google Cloud Services: Kubernetes (GKE), Cloud Logging, BigQuery, Pub/Sub, Google Cloud Storage, general understanding of Google networking
Developer Tools: GitHub Copilot, GitHub Copilot for workflows