DevOps, SRE Engineer or Platform Engineer
observability, while keeping costs in mind.
Ensure standardized cross-studio access and security to enable timely data access and ingestion (AWS and Google Cloud).
Provide teams with separate environments for testing new setups and tools without disrupting day-to-day operations or production workflows.
Track usage across all deployed applications and identify areas for improvement, making the best use of resources.
Keep up with relevant technologies and best practices emerging in the industry, especially those related to AI productivity tools.

What we are looking for
5+ years in the industry as a DevOps, SRE Engineer or Platform Engineer, ideally in gaming, mobile apps, or other high-scale digital products.
Strong hands-on experience with Kubernetes in production: not just running workloads on it, but operating it.
Cost-aware infrastructure decision-making.
Solid Terraform (or OpenTofu) experience, with a track record of keeping IaC sustainable as it grows.
Proven experience delivering data and AI/ML solutions in production on AWS, plus working knowledge of GCP or a willingness to get up to speed quickly. Bonus if this experience is within the gaming industry.
Comfortable owning CI/CD pipelines with common tools (GitHub Actions, GitLab CI, ArgoCD, Jenkins, or similar).
Hands-on experience with cloud and Kubernetes security fundamentals: IAM/RBAC, secrets management (e.g. Vault, AWS Secrets Manager, External Secrets), network policies, and integrating security checks into CI/CD pipelines.
Strong instincts for observability, monitoring, and alerting: you've built dashboards and alerts that teams actually rely on, and you know the difference between a useful page and noise. Hands-on with tools like Prometheus, Grafana, Datadog, CloudWatch, or similar.
Solid incident response experience.
The current data and AI/ML stack uses open source tools like Airflow, Trino, Spark, and Kubeflow.
Familiarity with deploying these tools, as well as tuning them for improved performance, is a bonus.
Understanding of MLOps best practices and common architectures is also a bonus.
Hands-on knowledge of Python and/or other scripting languages.
Experience creating infrastructure for both traditional and modern agentic data-intensive systems is a bonus.
Focus on innovation, coupled with a mindset of continuous learning and curiosity to explore emerging AI technologies.
An agile, hands-on approach to prototyping and validation, and the ability to Get Stuff Done in a fast-paced environment.
Excellent communication and collaboration skills for working effectively with both technical and non-technical teams.
Understanding how to drive results with key business stake