Platform Engineer (Cloud Infrastructure, AI Platform)
Tech stack
Home office; Apache Airflow, ArgoCD, Artificial Intelligence (AI), Backend Development, Backup/Recovery, CI/CD (Continuous Integration/Delivery), and more
Job description
As a Platform Engineer within Advanced Analytics (DA3) in the Chief Data & AI Office at Allianz Partners, you will join our central AI team to build and operate the cloud infrastructure that powers AI-enabled solutions at global scale.

We are looking for an engineer with deep Kubernetes and cloud expertise to implement, automate, and maintain the platform foundations that enable teams to deploy and operate AI services reliably.
You will work in a cross-functional environment with Backend Engineers, ML Engineers, AI Architects, and Platform Architects, taking hands-on ownership of the infrastructure layer, from Kubernetes clusters and CI/CD pipelines to observability systems and security controls.
In this role, you will translate platform architecture into working infrastructure, reduce operational toil through automation, and ensure production systems meet reliability and security standards.
Your main responsibilities will include:
- Implement and operate Kubernetes infrastructure (AKS): cluster lifecycle, networking, resource management, auto-scaling, and multi-tenancy patterns.
- Build and maintain CI/CD pipelines using GitHub Actions and ArgoCD for automated testing, container builds, and GitOps deployments.
- Develop Infrastructure as Code (Terraform, Bicep) to provision and manage Azure resources with consistency and auditability.
- Operate container registries (ACR), artifact management, and image security scanning workflows.
- Implement and maintain observability infrastructure (Azure Monitor, Application Insights, Prometheus, Grafana), including dashboards, alerting, and distributed tracing.
- Manage async processing infrastructure: Celery workers, Redis queues, and workflow orchestration patterns supporting AI agent execution.
- Implement platform security controls: network policies, pod security standards, Key Vault integration, RBAC, and private endpoint configurations.
- Support database infrastructure: PostgreSQL management, backup/recovery, connection pooling, and performance tuning.
- Create self-service tooling and templates that enable development teams to deploy and operate services with minimal friction.
- Diagnose and resolve infrastructure issues across clusters, pipelines, and cloud services; perform root-cause analysis and implement preventative improvements.
- Collaborate with Platform Architects, Backend Engineers, and ML Engineers to translate architecture designs into reliable infrastructure.

Our employees play an integral part in our success as a business. We appreciate that each of our employees is unique, with their own needs and ambitions, and we enjoy being part of their journey. We are there to empower and encourage your personal and professional development, ensuring that you stay in control, by offering a large variety of courses and targeted development programs.
Requirements
- 5+ years of professional experience in platform engineering, SRE, or DevOps roles; experience supporting AI/ML workloads is a strong plus.
- Strong Kubernetes experience: cluster operations, networking (Ingress, network policies), storage, autoscaling, and troubleshooting.
- Solid Infrastructure as Code experience with Terraform, Bicep, or equivalent tools.
- Production experience with Azure cloud services: AKS, ACR, Key Vault, Azure Monitor, Virtual Networks, Private Endpoints, and Azure Policy.
- Strong CI/CD experience: GitHub Actions (self-hosted runners, reusable workflows), ArgoCD, or similar GitOps tooling.
- Proficiency in Python for automation, scripting, and tooling.
- Experience with container security: image scanning, runtime security, network policies, and least-privilege patterns.
- Experience with observability stacks: Prometheus, Grafana, centralized logging, and alerting configuration.
- Familiarity with async task processing: Celery, Redis, or equivalent message queue patterns.
- Strong Linux systems administration and networking fundamentals.
- Operational mindset with strong troubleshooting skills across infrastructure layers.
Ways of Working
- Comfortable in agile, iterative delivery environments with ownership and accountability.
- Clear communicator and collaborator across global, cross-functional stakeholders.
- Strong focus on reliability and automation: you measure success by system uptime and reduced manual toil.
- Proactive learner with pragmatic adoption of AI-assisted developer tools (GitHub Copilot, Claude Code) to improve automation and delivery.
Nice to Have
- Experience supporting AI/ML infrastructure: GPU scheduling, model serving platforms, or ML pipeline orchestration.
- Service mesh experience (Istio, Linkerd) for traffic management and security.
- Experience with Databricks or similar data platform infrastructure.
- Familiarity with workflow orchestration (Temporal, Airflow) for complex AI pipelines.
- Experience with cost optimization: FinOps practices, resource right-sizing, and reserved capacity planning.
- Experience in regulated environments where auditability and secure-by-default infrastructure are essential.
- Certifications: CKA/CKAD, Azure Administrator, or Terraform Associate.