Senior AI Engineer - APM Features

Datadog
Paris, France
7 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Paris, France

Tech stack

A/B testing
Artificial Intelligence
Profiling
Cursor (AI coding tool)
Distributed Systems
Datadog
Large Language Models
Build Management
Microservices

Job description

  • Design and build AI-powered troubleshooting features for APM workflows using LLMs and agentic systems.
  • Help users diagnose and resolve performance issues by synthesizing large volumes of observability data, including traces, metrics, and logs.
  • Prototype, experiment, and iterate on AI-driven experiences, using evidence and user feedback to guide decisions and focus on real user value.
  • Define inputs, outputs, and success criteria for LLM-based systems operating in evolving and sometimes ambiguous environments.
  • Build agentic workflows with strong guardrails, balancing autonomy, safety, correctness, and reliability.
  • Lead features end-to-end in collaboration with peers and partners, from problem discovery through production and iteration.
  • Design and maintain evaluation loops, including offline evaluations, benchmarks, and A/B tests.
  • Write and own production backend services, contributing to reliable, scalable systems.

Requirements

  • A senior, product-minded engineer with experience shipping AI systems to production.
  • Comfortable working in evolving problem spaces and proactively identifying meaningful opportunities to build.
  • Hands-on experience with LLMs or agentic systems, including prompting, tooling, evaluation, and guardrails.
  • Experience using AI coding tools such as Cursor, Claude Code, or similar, with the ability to reflect on what worked, what didn’t, and why.
  • A strong sense for correctness, failure modes, and how to measure and improve quality in AI systems.
  • Comfortable experimenting, learning from outcomes, and iterating thoughtfully.
  • Solid ML and applied science fundamentals, including experiment design and statistics.

Bonus points:

These are helpful but not required; we don’t expect candidates to have experience with everything listed below.

  • Exposure to agent frameworks, tool-use orchestration, retrieval-augmented generation (RAG), and indexing large-scale telemetry data.
  • Familiarity with SLO/SLA practices and incident response.
  • Hands-on experience with distributed tracing systems (OpenTelemetry, Datadog APM), profilers, or logs and metrics pipelines.
  • Distributed systems fundamentals and familiarity with observability concepts.

Benefits & conditions

  • New hire stock equity (RSUs) and employee stock purchase plan (ESPP)
  • Continuous professional development, product training, and career pathing
  • Intradepartmental mentor and buddy program for in-house networking
  • An inclusive company culture, with the option to join our Community Guilds (Datadog employee resource groups)
  • Access to Inclusion Talks, our internal panel discussions
  • Free, global mental health benefits for employees and dependents age 6+
  • Competitive global benefits

The benefits and growth opportunities listed above may vary based on your country of employment and the nature of your employment with Datadog.
