Site Reliability Engineer

Deepslate
Berlin, Germany
3 days ago

Role details

Contract type
Temporary contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English, German
Compensation
€ 70K

Job location

Remote
Berlin, Germany

Tech stack

API
Artificial Intelligence
JIRA
Cloud Computing
Continuous Integration
Distributed Systems
Fault Tolerance
Reliability Engineering
WebSocket
WebRTC
Datadog
Pulumi
System Availability
Integration Tests
Kubernetes
Live Streaming
Terraform
Pagerduty
Network Optimization
Microservices

Job description

Your mission is to build an infrastructure so resilient that potential outages are caught and mitigated before they even happen. You are the bridge between development and operations, ensuring that our massive AI workloads run smoothly, efficiently, and with uncompromising high availability.

  • Infrastructure as Code: Design, build, and manage our cloud infrastructure using modern tools (Pulumi) to ensure all infrastructure changes are reproducible, secure, and easily auditable.
  • Kubernetes Mastery: Orchestrate and optimize our Kubernetes clusters for complex, compute-heavy AI workloads, guaranteeing maximum efficiency and fault tolerance.
  • Deep Observability & Monitoring: Implement a flawless monitoring setup. Using Datadog and OpenTelemetry, you will make the black box of our distributed systems transparent, hunting down latency spikes or bottlenecks before they impact users.
  • Incident Response & Reliability: Establish and manage our on-call and alerting processes (using PagerDuty) and champion a culture of blameless post-mortems so the same mistake never happens twice.
  • Release Confidence: Build and maintain highly automated integration testing and deployment pipelines. No code goes live without rigorous validation of its impact on system stability.
  • SLAs, SLOs & SLIs: Define and monitor our service-level metrics, turning reliability into a measurable, core component of our product development cycle.
  • Automation First: Ruthlessly automate away toil (repetitive, manual work) so the engineering team can focus on innovation instead of maintenance.
  • Security & Compliance: Ensure our infrastructure is not only highly available but also locked down and hardened against external threats.

Requirements

Must-Haves:

  • Kubernetes Expertise: Deep, hands-on experience in setting up, managing, and scaling self-hosted Kubernetes clusters in production.
  • Infrastructure as Code: Strong experience with modern IaC, ideally with Pulumi (or deep Terraform knowledge alongside a willingness to adopt Pulumi).
  • Observability Champion: You are a pro with Datadog and OpenTelemetry. You know exactly how to effectively monitor distributed systems across tracing, metrics, and logs.
  • Alerting & Incident Management: Proven experience with PagerDuty (or similar tools) and a track record of building a healthy, sustainable on-call culture.
  • Integration Testing & CI/CD: Hands-on experience setting up robust testing and deployment pipelines for complex, microservice-based architectures.
  • Fluent German: Spoken and written.
  • Startup Mindset: You are comfortable navigating the "glorious chaos" of an early-stage codebase. If a process is missing or unstructured, you roll up your sleeves and build it.
  • Extreme Ownership: You aren't just looking to blindly process Jira tickets. You proactively identify where the infrastructure is burning, or where it will burn in the future, and you take action.

Nice-to-Haves:

  • Experience managing GPU workloads and scaling AI/ML infrastructures.
  • Background in network optimization (highly critical for latency-sensitive voice streaming protocols like WebRTC or WebSockets).
  • Previous experience building high-availability systems in a fast-paced B2B/API-first environment.

About the company

At Deepslate, we are building speech-to-speech voice AI models that sound and act indistinguishable from a human, and we believe everyone should be able to use them. When it comes to text and images, giants like OpenAI and Google have already cracked the code. With video, Veo3, Sora, and others are closing the gap rapidly. But voice, with its endless languages, dialects, accents, subtle intonations, and speech melodies, remains a highly complex, unsolved frontier. That is exactly why we started Deepslate. Backed by top-tier investors from the tech and AI sectors, as well as a major German VC fund, we are well-funded and moving fast. We are building the future of communication. We aren't trying to build another standalone platform; instead, we are the intelligence engine powering countless other applications. Whether it's integrated as a module by a CRM provider, plugged into another voice AI platform, or embedded directly into an enterprise system by our integration partners, our model is everywhere.
