Senior DevOps Engineer

Tripledot Studios
Barcelona, Spain
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Remote
Barcelona, Spain

Tech stack

Kubernetes Security
Artificial Intelligence
Airflow
Amazon Web Services (AWS)
Cloud Computing
Computer Networks
Continuous Integration
Data Security
DevOps
GitHub
Identity and Access Management
Python
Key Management
Machine Learning
Nagios
Octopus Deploy
Open Source Technology
Role-Based Access Control
Prometheus
Azure
Systems Integration
Datadog
Scripting (Bash/Python/Go/Ruby)
Google Cloud Platform
Grafana
Spark
GitLab CI
Kubernetes
Machine Learning Operations
CloudWatch
Terraform
Software Version Control
Jenkins

Job description

  • Improve and maintain a scalable, fast and reliable data and ML platform to support AI/ML initiatives within group AI, ensuring models move seamlessly from research to production.
  • Support group IT in providing reliable access to open source AI models and safe, reliable access to AI productivity tools.
  • Create and maintain proper monitoring and alerting tools to ensure our systems meet the SLAs and SLOs defined by the stakeholders.
  • Implement and advocate for engineering best practices, including CI/CD, infrastructure as code (e.g., Terraform), version control, testing, and observability, while keeping costs in mind.
  • Ensure standardized cross-studio access & security to enable timely data access and ingestion (AWS and Google Cloud).
  • Provide teams with separate environments for testing new setups and tools without disrupting the team's day-to-day operations or production workflows.
  • Track usage for all our deployed applications, and identify areas of improvement, making the best use of resources.
  • Keep up with the relevant technologies and best practices continuously emerging in the industry, especially those related to AI productivity tools.

Requirements

  • 5+ years in the industry as a DevOps, SRE, or Platform Engineer, ideally in gaming, mobile apps, or other high-scale digital products.
  • Strong hands-on experience with Kubernetes in production - not just running workloads on it, but operating it. Cost-aware infrastructure decision-making.
  • Solid Terraform (or OpenTofu) experience, with a track record of keeping IaC sustainable as it grows.
  • Proven experience delivering data and AI/ML solutions in production on AWS, plus a working knowledge of GCP or a willingness to get up to speed quickly. Bonus if this experience is within the gaming industry.
  • Comfortable owning CI/CD pipelines with common tools (GitHub Actions, GitLab CI, ArgoCD, Jenkins, or similar).
  • Hands-on experience with cloud and Kubernetes security fundamentals: IAM/RBAC, secrets management (e.g., Vault, AWS Secrets Manager, External Secrets), network policies, and integrating security checks into CI/CD pipelines.
  • Strong instincts for observability, monitoring, and alerting: you've built dashboards and alerts that teams actually rely on, and you know the difference between a useful page and noise. Hands-on with tools like Prometheus, Grafana, Datadog, CloudWatch, or similar. Solid incident response experience.
  • The current data and AI/ML stack uses open source tools like Airflow, Trino, Spark, and Kubeflow. Familiarity with deploying these tools, as well as tweaking them for improved performance, is a bonus. Understanding of ML Ops best practices and common architectures is also a bonus.
  • Hands-on knowledge of Python and/or other scripting languages.
  • Experience creating infrastructure for both traditional and modern agentic data-intensive systems is a bonus.
  • Focus on innovation, coupled with a mindset of continuous learning and curiosity to explore emerging AI technologies. The successful candidate will have an agile, hands-on approach to prototyping and validation, and the ability to get stuff done in a fast-paced environment.
  • Excellent communication and collaboration skills necessary for working effectively with both technical and non-technical teams. Understanding how to drive results with key business stakeholders.

Benefits & conditions

  • You will be part of a fun mobile gaming company aiming to embrace the future of AI-driven creativity and exploring where the industry is moving.
  • You will be instrumental in shaping the backbone of the AI/ML and IT systems that power solutions across the whole group.
  • You will operate in an environment that values an experimental mindset, focusing on learning opportunities and pioneering generative game creation.

Working at Tripledot

  • 25 days paid holiday in addition to bank holidays to relax and refresh throughout the year
  • Hybrid Working: We work in the office 3 days a week: Tuesdays, Wednesdays, and a third day of your choice.
  • 20 days remote working: For 20 days of the year, work from anywhere in the world, or use the time to work from home on mandatory office days.
  • Regular company events and rewards: Join in regular events and rewards that celebrate cultural events, our achievements and our team spirit.
  • Private Medical Cover: Have peace of mind with private medical cover, ensuring your health is in good hands.
  • Life & Critical Illness Cover: Protect your future with our life and critical illness cover.
  • Family Forming Support: Receive vital support on your family forming/fertility journey with our support program (subject to policy).
  • Employee Assistance Program: Access confidential support anytime through our Employee Assistance Program.
  • Sport Compensation: Stay fit and active with our sport compensation benefit.
  • Meal and Transport Vouchers: Save on meals and transport with our convenient vouchers.
  • English & Spanish Classes: Enhance your English and Spanish skills with our provided language classes.
  • Continuous Professional Development: Propel your career with continuous opportunities for professional development.

About the company

Tripledot Studios is one of the largest independent mobile games companies in the world. We are a multi-award-winning organisation, with a global 2,500+ strong team across 12 studios. Our expanded portfolio includes some of the biggest titles in mobile gaming, collectively reaching top chart positions around the world and engaging over 25 million daily active users. Tripledot's guiding principle is that when people love what they do, what they do will be loved by others. We're building a company we're proud of. One filled with driven, incredibly smart and detail-orientated people, who LOVE making games. Our ambition is to be the most successful games company in the world, and we're just getting started.

You will build and maintain the high-performance infrastructure required to train and deploy AI models that impact millions of players in real-time, as well as improve the productivity of everyone at Tripledot. You will serve as a bridge between the technical solutions that are created by our teams and our live game engines, as well as creating and maintaining infrastructure for internal IT needs.

Within the group AI functions, you'll be working with other AI/ML engineers, data engineers, analysts and product owners. Within the various studios and other central teams, you will interact with data, engineering, and product teams. Within group IT, you'll partner with Engineering, TechOps, and Security to deliver the infrastructure and tooling that powers our central business functions: gaming, finance, marketing, legal, people ops, and beyond.

The first initiative you'll be taking part in is the expansion of a data/ML platform that helps ML engineers and data scientists easily deploy their solutions and enables delivery of key projects like LTV and Ads Optimization.
