SRE (M/F)

Licorne Society

Paris, France

2 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Intermediate

Job location

Remote

Paris, France

Tech stack

Artificial Intelligence

Amazon Web Services (AWS)

Systems Engineering

Unix

Software as a Service

Continuous Integration

DevOps

Reliability Engineering

Security Information and Event Management

Datadog

Data Logging

System Availability

Grafana

MTTR

Kubernetes

Terraform

New Relic (SaaS)

Job description

We are excited to open a new position for a Site Reliability Engineer to join and strengthen our Engineering team. You will work closely with our current SRE to ensure the reliability, performance, and scalability of our infrastructure, which supports critical financial services for our clients.

Our platform runs entirely on AWS and Kubernetes, managed with Infrastructure as Code using Terraform. Datadog is at the core of our observability stack, enabling us to monitor, detect, and respond to issues quickly to maintain high levels of reliability and performance.

You will help drive operational excellence, optimize infrastructure costs, and enhance the developer experience through improved CI/CD practices, automation, and observability. While infrastructure is the core focus of this role, you will also contribute to our security and compliance efforts (SOC 2, ISO 27001), helping ensure our platform remains trustworthy and secure.

Manage and evolve AWS infrastructure and Kubernetes clusters to ensure high availability, robust performance, and cost efficiency.
Support the deployment and operation of AI workloads and models, adapting infrastructure and automation to meet their requirements.
Leverage Terraform and DevOps best practices to automate and streamline infrastructure deployment and configuration.
Continuously improve infrastructure testing methods and proactively resolve performance bottlenecks or scalability issues.

Observability and Incident Management

Enhance Datadog-based monitoring to proactively detect and alert on issues, focusing on symptom-based alerting to avoid service disruptions.
Lead incident response efforts, reducing Mean Time To Detection (MTTD) and Mean Time To Resolution (MTTR).
Implement robust logging, tracing, and metrics to enable quick issue diagnosis and resolution.

Security and Compliance

Support ongoing compliance efforts with SOC 2 and ISO 27001, integrating security best practices into operations.
Manage and use tools such as AWS Security Hub, GuardDuty, and Datadog SIEM to identify risks, respond to incidents, and strengthen overall security.
Participate in security assessments and audits, recommending and implementing improvements.

Developer Experience & Empowerment

Refine CI/CD pipelines to enable safe, fast, and secure deployments.
Provide tooling, automation, and clear documentation to support developer productivity and satisfaction.
Maintain and optimize development, staging, and sandbox environments for smooth workflows.

What's in It for You