Senior DevOps / Infrastructure Engineer

Causa Prima

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Tech stack

API

Cloud Computing

Code Review

Databases

Continuous Integration

Data Stores

DevOps

Fault Tolerance

GitHub

Graph Database

Identity and Access Management

Python

Network Security

Neo4j

OAuth

Ansible

TypeScript

Data Logging

Pulumi

Cloud Platform System

Cloud Monitoring

Large Language Models

Amazon Web Services (AWS)

Kubernetes

Sentry

Kafka

Event Store

Terraform

Docker

PagerDuty

Job description

CI/CD - GitHub Actions + Cloud Build, security-aware pipeline design, production approval gates, container image scanning, secret isolation, signed commits.
Observability - OpenTelemetry distributed tracing across TypeScript and Python services, Cloud Monitoring, Sentry with PII-stripping hooks, structured logging with sanitization, per-agent behavioural monitoring, tiered alerting.
Secret management & rotation - Credential lifecycle for LLM API keys, database credentials, OAuth tokens, and agent signing keys in GCP Secret Manager.
Container orchestration - Docker builds, registry management, GKE cluster configuration. Design the path toward Kubernetes-native deployment as we scale.
Incident response infrastructure - Per-agent circuit breakers, graceful degradation, tiered alerting (logged → Slack → PagerDuty), forensic tooling via event store replay and traces.
Network security - VPC firewall rules, private ingress for all data stores, egress controls, PII Vault on restricted-access infrastructure.
Neo4j Aura operations - Monitoring, scaling decisions, and backup verification for the managed graph database.

Requirements

Do you have experience in Terraform?
Do you have a Master's degree?

5+ years in DevOps, infrastructure, or SRE roles for production systems.

Strong systems design skills - you think in deployment topologies, failure domains, blast radius, and operational security.
Production experience with GCP (Cloud Run, GKE, Cloud SQL, IAM, Secret Manager) or equivalent cloud platform with willingness to go deep on GCP.
Hands-on experience with Kubernetes in production - cluster management, networking, scaling, security policies.
Experience with infrastructure-as-code: Terraform, Pulumi, Ansible, or similar. Ideally more than one.
Experience designing CI/CD pipelines with security in mind - secret isolation, approval gates, image scanning, deployment strategies.
Experience with observability systems - distributed tracing, structured logging, alerting hierarchies, dashboarding.
Security awareness at the infrastructure level - you think about network isolation, least-privilege IAM, and credential hygiene as defaults.
Strong code review skills for infrastructure-as-code and deployment configuration.
Nice to have: