Senior Platform Engineer - AI Infrastructure & Observability

acto GmbH

18 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Remote

Tech stack

Java

API

Artificial Intelligence

Amazon Web Services (AWS)

Google BigQuery

Cloud Computing

ETL

Software Debugging

Django

Python

Log Analysis

Software Engineering

Data Logging

Cloud Platform System

Large Language Models

Spring Boot

Backend

FastAPI

Kotlin

Kubernetes

Terraform

Data Pipelines

Job description

We're looking for a Senior Platform Engineer (m/f/d) excited about building infrastructure for AI-first applications. You'll own our cloud platform, from Kubernetes clusters running real-time voice agents to ClickHouse analytics pipelines processing millions of events daily. You'll tackle novel observability challenges such as monitoring ClickHouse cluster health, ensuring sub-200 ms latency for voice AI, and tracking data pipeline quality. We're AI-first not just in what we build but in how we operate: we use AI-native tools for incident response and build automation that uses LLMs to accelerate debugging and root cause analysis.

Your mission

Automate Incident Management: Implement AI-native incident management tools to accelerate response and automate root cause analysis.
Manage Cloud Infrastructure: Operate and optimize AWS EKS infrastructure with Terraform, tailored for AI workloads and analytics pipelines.
Ensure Data Reliability: Maintain ETL workflows, ClickHouse cluster health, and batch jobs, ensuring data freshness and quality.
Optimize System Performance: Design API failover strategies, implement caching layers, and continuously optimize infrastructure.
Improve Developer Experience: Maintain Skaffold-based local development environments, enhance CI/CD pipelines, and build internal productivity tooling.
Enhance Observability: Implement and monitor SLOs, use AI tools for log analysis, and improve visibility through structured logging.

Requirements

Infrastructure Expertise: 5+ years of software engineering experience and 3+ years running Kubernetes in production (AWS EKS preferred).
IaC Mastery: Strong experience with Terraform and GitOps workflows, plus deep AWS knowledge (VPCs, RDS, ElastiCache, Lambda).
Data & AI Focus: Experience monitoring ETL pipelines and analytics workloads (ClickHouse, Redshift, BigQuery), and excitement for AI-native operations tools such as log analysis or automated remediation.
Backend & Leadership: Proficient in Python or Kotlin/Java, familiar with FastAPI, Spring Boot, Django, or gRPC. Able to work independently, mentor others, and drive technical decisions.
Mindset: Strong written communication skills, comfort with ambiguity, and motivation to build at the intersection of AI and infrastructure.

About the company

* Development Opportunities: Steep development opportunities without entrenched hierarchies.
* Culture: We are an ambitious team that wants to achieve a lot with Acto, but we don't take ourselves too seriously and like to have fun together, whether at work or at regular team events.
* Flexibility: Enjoy a flexible work schedule and a remote setup, with the option to work from any of the Adesso offices around Germany or our Munich office.
* Health & work-life balance: Enjoy 30 days of vacation and an attractive Wellhub membership.
* Ownership: Whether you are an intern or a full-time acto-naut, Acto instills a deep sense of ownership: every team member is entrusted with meaningful responsibilities and the autonomy to make impactful decisions.
* Equipment: You will receive a state-of-the-art hardware setup with a choice between Mac and Windows.

At Acto, we believe that the future does not belong to the companies that have the most data, but to those that can use that data's potential most effectively. That's why we develop software that builds on existing ERP systems and transforms data into prioritized, AI-supported recommendations for action. We help B2B sales teams in the logistics, wholesale and B2B consumer goods sectors escape the data chaos and use AI to make better decisions faster, thereby increasing productivity. We started in 2021 and since then have not only won well-known medium-sized companies as customers, but also brought well-known investors such as 468 Capital and Cusp Capital on board.