Senior DevOps Engineer
Tripledot Studios
Berlin, Germany
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Job location: Berlin, Germany
Tech stack
Kubernetes Security
Artificial Intelligence
Airflow
Amazon Web Services (AWS)
Cloud Computing
Computer Networks
Continuous Integration
Data Security
DevOps
GitHub
Identity and Access Management
Python
Key Management
Machine Learning
Nagios
Octopus Deploy
Open Source Technology
Role-Based Access Control
Prometheus
Azure
Systems Integration
Datadog
Scripting (Bash/Python/Go/Ruby)
Google Cloud Platform
Grafana
Spark
GitLab CI
Kubernetes
Machine Learning Operations
CloudWatch
Terraform
Software Version Control
Jenkins
Job description
- Improve and maintain a scalable, speedy and reliable data and ML platform to support AI/ML initiatives within group AI, ensuring models move seamlessly from research to production.
- Support group IT in providing safe, reliable access to open source AI models and AI productivity tools.
- Create and maintain proper monitoring and alerting tools to ensure our systems meet the SLAs and SLOs defined by the stakeholders.
- Implement and advocate for engineering best practices, including CI/CD, infrastructure as code (e.g. Terraform), version control, testing, and observability, while keeping costs in mind.
- Ensure standardized cross-studio access & security to enable timely data access and ingestion (AWS and Google Cloud).
- Provide teams with separate environments for testing new setups and tools without disrupting the team's day-to-day operations and production workflows.
- Track usage for all our deployed applications and identify areas of improvement to make the best use of resources.
- Keep up with the relevant technologies and best practices continuously emerging in the industry, especially those related to AI productivity tools.
Requirements
- 5+ years in the industry as a DevOps, SRE, or Platform Engineer, ideally in gaming, mobile apps, or other high-scale digital products.
- Strong hands-on experience with Kubernetes in production - not just running workloads on it, but operating it. Cost-aware infrastructure decision-making.
- Solid Terraform (or OpenTofu) experience, with a track record of keeping IaC sustainable as it grows.
- Proven experience delivering data and AI/ML solutions in production on AWS, plus a working knowledge of GCP or a willingness to get up to speed quickly. Bonus if this experience is within the gaming industry.
- Comfortable owning CI/CD pipelines with common tools (GitHub Actions, GitLab CI, ArgoCD, Jenkins, or similar).
- Hands-on experience with cloud and Kubernetes security fundamentals: IAM/RBAC, secrets management (e.g. Vault, AWS Secrets Manager, External Secrets), network policies, and integrating security checks into CI/CD pipelines.
- Strong instincts for observability, monitoring, and alerting: you've built dashboards and alerts that teams actually rely on, and you know the difference between a useful page and noise. Hands-on with tools like Prometheus, Grafana, Datadog, CloudWatch, or similar. Solid incident response experience.
- The current data and AI/ML stack uses open source tools like Airflow, Trino, Spark, and Kubeflow. Familiarity with deploying these tools, as well as tuning them for improved performance, is a bonus. Understanding of MLOps best practices and common architectures is also a bonus.
- Hands-on knowledge of Python and/or other scripting languages.
- Experience creating infrastructure for both traditional and modern agentic data-intensive systems is a bonus.
- Focus on innovation, coupled with a mindset of continuous learning and curiosity to explore emerging AI technologies. The successful candidate will have an agile, hands-on approach to prototyping and validation, and the ability to Get Stuff Done in a fast-paced environment.
- Excellent communication and collaboration skills for working effectively with both technical and non-technical teams, and an understanding of how to drive results with key business stakeholders.
Benefits & conditions
- You will be part of a fun mobile gaming company aiming to embrace the future of AI-driven creativity and exploring where the industry is moving.
- You will be instrumental in shaping the backbone of the AI/ML and IT systems that will power solutions across the whole group.
- You will operate in an environment that values an experimental mindset, focusing on learning opportunities and pioneering generative game creation.
Working at Tripledot
- 25 days holiday: Enjoy 20 days of paid holiday plus an additional 5 days per annum, on top of bank holidays, to relax and refresh throughout the year.
- Continuous Professional Development: Propel your career with continuous opportunities for professional development.
- Supplemental Health Insurance (Continentale): Annual budget of 1,200 EUR for reimbursement of healthcare outside of public plans (glasses, massages, dental treatments, etc.)
- Public Transportation reimbursement: 100% of public transportation to the office via BVG ticket is covered
- Daily Free Lunch: Order free food from Wolt when at the office
- Pension plan: Opt in to an employer-contributed pension plan, operated with Degura
About the company
Tripledot Studios is one of the largest independent mobile games companies in the world.
We are a multi-award-winning organisation, with a global 2,500+ strong team across 12 studios.
Our expanded portfolio includes some of the biggest titles in mobile gaming, collectively reaching top chart positions around the world and engaging over 25 million daily active users.
Tripledot's guiding principle is that when people love what they do, what they do will be loved by others.
We're building a company we're proud of. One filled with driven, incredibly smart and detail-orientated people, who LOVE making games.
Our ambition is to be the most successful games company in the world, and we're just getting started.
You will build and maintain the high-performance infrastructure required to train and deploy AI models that impact millions of players in real time, as well as improve the productivity of everyone at Tripledot. You will serve as a bridge between the technical solutions that are created by our teams and our live game engines, as well as creating and maintaining infrastructure for internal IT needs.
Within the group AI functions, you'll be working with other AI/ML engineers, data engineers, analysts and product owners. Within the various studios and other central teams, you will interact with data, engineering, and product teams.
Within group IT, you'll partner with Engineering, TechOps, and Security to deliver the infrastructure and tooling that powers our central business functions: gaming, finance, marketing, legal, people ops, and beyond.
The first initiative you'll take part in is the expansion of a data/ML platform that lets ML engineers and data scientists deploy their solutions easily and enables delivery of key projects like LTV and Ads Optimization.