SRE (M/F)

Licorne Society
Paris, France
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate

Job location

Remote
Paris, France

Tech stack

Artificial Intelligence
Amazon Web Services (AWS)
Systems Engineering
Unix
Software as a Service
Continuous Integration
DevOps
Reliability Engineering
Security Information and Event Management
Datadog
Data Logging
System Availability
Grafana
MTTR
Kubernetes
Terraform
New Relic (SaaS)

Job description

We are excited to open a new position for a Site Reliability Engineer to join and strengthen our Engineering team. You will work closely with our current SRE to ensure the reliability, performance, and scalability of our infrastructure, which supports critical financial services for our clients.

Our platform runs entirely on AWS and Kubernetes, managed with Infrastructure as Code using Terraform. Datadog is at the core of our observability stack, enabling us to monitor, detect, and respond to issues quickly to maintain high levels of reliability and performance.

You will help drive operational excellence, optimize infrastructure costs, and enhance the developer experience through improved CI/CD practices, automation, and observability. While infrastructure is the core focus of this role, you will also contribute to our security and compliance efforts (SOC 2, ISO 27001), helping ensure our platform remains trustworthy and secure.

  • Manage and evolve AWS infrastructure and Kubernetes clusters to ensure high availability, robust performance, and cost efficiency.
  • Support the deployment and operation of AI workloads and models, adapting infrastructure and automation to meet their requirements.
  • Leverage Terraform and DevOps best practices to automate and streamline infrastructure deployment and configuration.
  • Continuously improve infrastructure testing methods and proactively resolve performance bottlenecks or scalability issues.

Observability and Incident Management

  • Enhance Datadog-based monitoring to proactively detect and alert on issues, focusing on symptom-based alerting to avoid service disruptions.
  • Lead incident response efforts, reducing Mean Time To Detection (MTTD) and Mean Time To Resolution (MTTR).
  • Implement robust logging, tracing, and metrics to enable quick issue diagnosis and resolution.

Security and Compliance

  • Support ongoing compliance efforts with SOC 2 and ISO 27001, integrating security best practices into operations.
  • Manage and use tools such as AWS Security Hub, GuardDuty, and Datadog SIEM to identify risks, respond to incidents, and strengthen overall security.
  • Participate in security assessments and audits, recommending and implementing improvements.

Developer Experience & Empowerment

  • Refine CI/CD pipelines to enable safe, fast, and secure deployments.
  • Provide tooling, automation, and clear documentation to support developer productivity and satisfaction.
  • Maintain and optimize development, staging, and sandbox environments for smooth workflows.

What's in It for You

  • A collaborative, flat-structured environment where all voices are valued
  • Opportunities for career growth in a scaling company
  • Flexible remote work policy
  • A team of experienced engineers to learn and grow with
  • A culturally diverse and inclusive workplace

Requirements

  • 4+ years of experience in SRE, DevOps, or Systems Engineering
  • Proven expertise with AWS, Kubernetes, and Terraform
  • Experience deploying and operating SaaS solutions
  • Strong knowledge of high-scalability architectures
  • Comfortable working with Linux/Unix shell
  • Practical experience with containerized architecture
  • Familiarity with monitoring tools (e.g., New Relic, Grafana, or similar)
  • Fluent in English
  • Strong problem-solving and analytical mindset
