Senior AI Engineer - APM Integrations

Datadog
Municipality of Madrid, Spain
14 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Municipality of Madrid, Spain

Tech stack

Java
.NET
Artificial Intelligence
Software as a Service
Static Program Analysis
Compilers
Profiling
Programming Tools
Distributed Systems
Systems Integration
Datadog
Large Language Models
Concurrency
Indexer
Backend
Microservices

Job description

IDM is the team that works on Datadog APM integrations: the parts that connect Datadog to the tools and services customers use. The team's job is to make it easier for engineers to build and maintain integrations over time, and to keep quality and reliability high.

You will build AI-assisted tools that help with day-to-day engineering work, such as drafting code changes, suggesting fixes, and checking results with tests and other automated checks. The goal is to build tools that engineers can trust in real production work. You'll work closely with other teams, understand how they work, and build solutions that fit their process. You will define what "good" means for these tools, measure results, and improve them over time, and you'll set up evaluation and testing so the AI output stays correct and doesn't silently get worse. This is a senior role with high ownership, from early prototypes to production and ongoing support.

At Datadog, we place value in our office culture - the relationships and collaboration it builds and the creativity it brings to the table. We operate as a hybrid workplace to ensure our Datadogs can create a work-life harmony that best fits them.

What You'll Do:

  • Build agent workflows that take an integration need from plan to implementation and validation, with humans approving at the right checkpoints.
  • Create systems that synthesize context from codebases, docs, specs, telemetry, and historical incidents to make changes that match Datadog conventions and customer expectations.
  • Generate and evolve integration code and tests, including end-to-end scenarios that reflect real customer workloads and product features.
  • Design evaluation harnesses that prevent silent regressions: golden sets, scenario baselines, semantic checks, performance thresholds, and release gating.
  • Build portfolio-level automation: proactive updates for upstream breaking changes, tracer feature rollouts across the catalog, migrations to new schemas/semantics, and targeted coverage expansion.
  • Partner tightly with PM, support engineers, and integration-owning teams to make the system adoptable, trustworthy, and embedded in daily engineering workflows.

Requirements

Do you have experience in SaaS? Do you have a Master's degree?

Product-minded engineer who ships AI to production

  • 6+ years building backend systems (Go, Java, or .NET) with a strong focus on simplicity, correctness, and performance.
  • Proven experience delivering LLM/agent features to production (prompting, tooling, evals, safety/guardrails).
  • Comfortable navigating ambiguity, iterating from prototype to production, and measuring impact with clear metrics.

Strong ML fundamentals

  • Solid grasp of the ML lifecycle (task definition, dataset construction, modeling, evaluation, deployment, iteration) and statistics for experiments.
  • Fluency with offline/online evals: golden sets, automated regressions, and evaluation harnesses that prevent silent quality drift.

Distributed systems & observability savvy

  • Experience with microservices performance: tracing, latency breakdowns, concurrency, resiliency patterns.
  • Production operations mindset: monitoring, alerting, and participating in on-call rotations where applicable.

Bonus Points:

  • Hands-on with distributed tracing stacks (OpenTelemetry/Datadog APM), profilers, and logs/metrics pipelines.
  • Experience with planning/agent frameworks, tool-use orchestration, RAG, and retrieval/indexing over large context.
  • Experience building developer tools (IDEs, static analysis, compilers, code transformation pipelines) or code-generation systems.
  • Familiarity with semantic conventions / schema migrations, scenario-based testing, and cross-language porting at scale.

Benefits & conditions

  • New hire stock equity (RSUs) and employee stock purchase plan (ESPP)
  • Continuous professional development, product training, and career pathing
  • Intradepartmental mentor and buddy program for in-house networking
  • An inclusive company culture, ability to join our Community Guilds (Datadog employee resource groups)
  • Access to Inclusion Talks, our internal panel discussions
  • Free, global mental health benefits for employees and dependents age 6+
  • Competitive global benefits

Benefits and growth opportunities listed above may vary based on the country of your employment and the nature of your employment with Datadog.
