Private AI Runtime & Infrastructure Engineer (Local Data Residency)

Omnilex
Zürich, Switzerland
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English, German
Compensation
CHF 144K

Job location

Zürich, Switzerland

Tech stack

Artificial Intelligence
Azure
Software as a Service
Databases
Data Loss Prevention (DLP)
Software Debugging
Linux
DNS
Fault Tolerance
Node.js
Next.js
Data Streaming
TypeScript
Version Control
Graphics Processing Unit (GPU)
Large Language Models
Concurrency
NestJS

Job description

Own the private AI runtime that powers our product: the infrastructure, security posture, and operational reliability of running LLM + agent workloads on Swiss-based VMs (optionally with GPUs). You'll make it safe, observable, scalable, and easy for the team to ship changes without fear.

  • Design and operate a VM-centric architecture for agent execution and model serving (single-node vs multi-node, concurrency, streaming).

  • Manage model artifacts: downloads, storage, integrity checks, versioning, and safe rollback paths.
  • Implement pragmatic controls for runaway agents (timeouts, token limits, tool permission boundaries, sandboxing patterns).
  • Make capacity predictable: tokens/sec, queueing/backpressure, p95 latency targets, peak load behavior, and graceful degradation modes.
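To give a flavor of the "pragmatic controls for runaway agents" above, here is a minimal TypeScript sketch of a wall-clock timeout plus token-budget guard around a single agent step. All names (`AgentStep`, `runGuarded`, the option fields) are illustrative, not part of our actual runtime:

```typescript
// Illustrative sketch only: enforce a timeout and a token budget per agent step.
type AgentStep = (signal: AbortSignal) => Promise<{ text: string; tokens: number }>;

class BudgetExceededError extends Error {}

async function runGuarded(
  step: AgentStep,
  opts: { timeoutMs: number; maxTokens: number; usedTokens: number },
): Promise<{ text: string; tokens: number }> {
  // Refuse to start a step once the budget is spent.
  if (opts.usedTokens >= opts.maxTokens) {
    throw new BudgetExceededError(
      `token budget exhausted (${opts.usedTokens}/${opts.maxTokens})`,
    );
  }
  // Abort the step if it exceeds the wall-clock limit.
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), opts.timeoutMs);
  try {
    const result = await step(controller.signal);
    // Flag steps that blow through the remaining budget.
    if (opts.usedTokens + result.tokens > opts.maxTokens) {
      throw new BudgetExceededError(
        `step overran budget by ${opts.usedTokens + result.tokens - opts.maxTokens} tokens`,
      );
    }
    return result;
  } finally {
    clearTimeout(timer);
  }
}
```

In practice the same pattern extends to tool permission boundaries: the guard becomes the single choke point where every agent action is checked before it runs.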
  1. Linux / VM / network fundamentals (done right)
  • Establish hardened VM baselines: users/sudo, SSH posture, patching approach, sensible defaults.
  • Apply resource controls (cgroups/ulimits), disk + IO tuning, and repeatable "why is this VM slow?" investigation playbooks.
  • Own TLS decisions end-to-end (termination, cert lifecycle, internal mTLS where it matters), plus egress controls and private networking.
  • Build debugging muscle for real failure modes (DNS, certs, MTU, packet loss, noisy neighbors / CPU steal, memory pressure, IO wait).
  2. Observability and incident readiness for LLM services
  • Instrument what actually matters for LLM workloads: tokens/sec, queue depth, context length distribution, timeouts, error classes, saturation signals.
  • Turn logs/metrics/traces into action: dashboards that get used, alerts that don't spam, and runbooks that work at 3am.
  • Drive incident hygiene: triage patterns, mitigation tools, postmortems that result in concrete fixes.
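As a sketch of "instrument what actually matters": tokens/sec over a sliding window is one of the signals named above. The class and method names below are illustrative; a real setup would export the value via prom-client or OpenTelemetry rather than compute it ad hoc:

```typescript
// Illustrative sketch only: tokens/sec over a sliding window for an LLM service.
class TokenRateMeter {
  private samples: { at: number; tokens: number }[] = [];

  // Call once per completed request (or per streamed chunk).
  record(tokens: number, now = Date.now()): void {
    this.samples.push({ at: now, tokens });
  }

  // Tokens per second over a sliding window (default 60s).
  ratePerSec(windowMs = 60_000, now = Date.now()): number {
    this.samples = this.samples.filter((s) => now - s.at <= windowMs);
    const total = this.samples.reduce((sum, s) => sum + s.tokens, 0);
    return total / (windowMs / 1000);
  }
}
```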
  3. Security & privacy as engineering, not paperwork
  • Build a real secrets lifecycle (creation → distribution → runtime access → rotation → revocation).
  • Enforce least-privilege access on VM-centric infra (service identities, scoped credentials, audit trails).
  • Prevent sensitive data leakage via prompts/logs/traces (redaction, sampling discipline, "never log this" guardrails).
  • Reduce supply-chain risk: container/dependency hygiene, provenance checks for model artifacts, scanning and patch workflows.
  • Prepare for uncomfortable scenarios (suspicious outbound traffic, prompt injection leading to attempted exfiltration) with detection + response playbooks.
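The "never log this" guardrail mentioned above can be as simple as a redaction pass that every log line goes through before it reaches a sink. A minimal TypeScript sketch, with key names and patterns that are purely illustrative and would be tuned to real data:

```typescript
// Illustrative sketch only: redact likely-sensitive values before logging.
const SENSITIVE_KEYS = new Set(["prompt", "apiKey", "authorization", "ssn"]);
const EMAIL_RE = /[\w.+-]+@[\w-]+\.[\w.]+/g;

function redact(value: unknown): unknown {
  if (typeof value === "string") {
    // Scrub inline PII patterns (here: email addresses).
    return value.replace(EMAIL_RE, "[redacted-email]");
  }
  if (Array.isArray(value)) return value.map(redact);
  if (value && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value)) {
      // Drop whole fields whose keys are on the deny-list.
      out[k] = SENSITIVE_KEYS.has(k) ? "[redacted]" : redact(v);
    }
    return out;
  }
  return value;
}
```

Wired into the logger itself (e.g. a NestJS interceptor or a pino serializer), this turns "please don't log prompts" from a convention into an enforced default.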
  4. Safe delivery and change management
  • Implement deploy strategies appropriate for AI runtime infra: canaries, rollback triggers, maintenance windows, and fast revert paths.
  • Keep systems from drifting: config drift detection, baseline enforcement, and reproducible deployments (IaC/automation).
  • Support database/environment changes that intersect with runtime reliability (migrations, customer splits, environment promotion).

What success looks like (first months)

  • A clear map of the biggest reliability + security risks in the runtime, with a prioritized plan and measurable improvements.
  • Repeatable deployments with rollback confidence and a sane patching cadence (OS + drivers + runtime).
  • Dashboards/alerts that catch real incidents early (and stop waking people up for noise).
  • Practical privacy boundaries: everyone on the team knows what data can/cannot leave the VM, and the system enforces it.

Requirements

You're the person teams trust when the workload is messy, privacy-sensitive, and "it must not go down." You like building systems that stay boring in production: hardened VMs, predictable deploys, clean rollbacks, useful alerts, and security controls that work by default.

You can talk comfortably about trade-offs: latency vs cost, isolation vs operational complexity, GPU vs CPU, shared vs per-tenant. You've probably earned that comfort by running something real in production and being on the hook when it breaks.

  • Hands-on experience running production infrastructure (VMs, networking, Linux) for a SaaS or platform product.

  • Strong operational skill: debugging, incident response, log/metric-based investigation, and making recurring problems disappear.
  • Solid security fundamentals applied pragmatically: secrets, least privilege, egress control, dependency hygiene, auditability.
  • Comfort automating everything that should not require humans (provisioning, deploys, checks, drift detection, runbooks).
  • Clear communication and an ownership mindset: you can partner with product/dev without becoming a blocker.
  • Proficiency in English.
  • Full-time availability; Zurich-based with at least two on-site days per week (hybrid).
  • Experience operating LLM serving/agent systems (or similarly spiky, latency-sensitive workloads).
  • GPU operations familiarity (VRAM sizing intuition, quantization trade-offs, fallback modes).
  • Azure experience (identity, networking, observability, IaC).
  • Familiarity with our ecosystem: TypeScript, Node.js, NestJS, Next.js.
  • Exposure to ISO 27001-style environments or supporting security audits.
  • Swiss work permit or EU/EFTA citizenship.
  • Working proficiency in German.

Benefits & conditions

  • High-impact ownership over the most sensitive part of the stack: private AI runtime + reliability + security.
  • A sharp interdisciplinary team working at the intersection of AI and law.
  • Autonomy: you define the guardrails that let everyone ship faster with fewer surprises.
  • Compensation: CHF 8'000-12'000 per month + ESOP, depending on experience and skills.

About the company

Omnilex is a fast-moving AI legal tech startup born out of ETH Zurich. Our interdisciplinary team of 14+ people builds an AI product that helps lawyers and in-house legal teams research faster and answer complex legal questions with confidence. We combine external/public sources, customer-internal knowledge, and our own AI-first legal commentaries to tackle real-world legal complexity, often under strict data residency and privacy expectations.

Apply for this position