Software Engineer - Analytics Data Platform Lakehouse

Datadog
New York, United States of America
10 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$ 300K

Job location

New York, United States of America

Tech stack

Query Performance
Java
Artificial Intelligence
Apache HTTP Server
Information Engineering
Data Infrastructure
Distributed Data Store
Python
Open Source Technology
Simple Data Format
Parquet
Datadog
Cloud Platform System
Spark
Kubernetes
Information Technology

Job description

The Analytics Data Platform Lakehouse team builds and operates the foundations that power data engineers, applied AI, and product teams, managing millions of tables on their behalf and simplifying operations from maintenance and observability to governance, for both internal and customer-facing use cases. If you're excited by the intersection of petabyte-scale data processing, open-source query engines, and building platforms with real product stakes, this is the team for you.

At Datadog, we place value in our office culture - the relationships and collaboration it builds and the creativity it brings to the table. We operate as a hybrid workplace to ensure our Datadogs can create a work-life harmony that best fits them.

What You’ll Do:

  • Design, build, and operate core components of our lakehouse platform, including Apache Iceberg table management (data compaction, data layout optimization, materialized view scheduling…) and the Iceberg catalog
  • Drive adoption of open table formats across internal teams, owning the integration of Trino, Spark, and other query engines (DuckDB, PuppyGraph…) with our Iceberg-based lakehouse at petabyte scale
  • Build observability for managed Iceberg tables to identify query performance bottlenecks and cost drivers, and contribute fixes back to upstream open-source projects (Iceberg, Trino, Spark, OpenLineage) where relevant
  • Build self-serve tooling and abstractions that allow data engineering teams to reliably run thousands of pipelines per day against our lakehouse
  • Collaborate with data engineers, analysts, and infrastructure teams to define the roadmap for our lakehouse architecture and shape how Datadog manages analytic data at scale

Requirements

  • You have a BS/MS/PhD in Computer Science, Engineering, or a related field, or equivalent professional experience
  • You have deep, production-grade experience with one or more of Apache Iceberg, Trino, or Apache Spark, ideally demonstrated through significant open-source contributions: merged PRs, committer status, or PMC membership on projects
  • You have built or operated large-scale distributed data systems
  • You have a solid grasp of query planning, columnar file formats (Parquet, ORC), and table format internals (snapshots, manifests, partition evolution)
  • You are fluent in Java, Scala or Go and comfortable with Python for pipeline tooling
  • You have experience deploying and running data infrastructure on Kubernetes in cloud environments

Datadog values people from all walks of life. We understand not everyone will meet all the above qualifications on day one. That's okay. If you’re passionate about technology and want to grow your skills, we encourage you to apply.

Benefits & conditions

Datadog offers a competitive salary and equity package, which may include variable compensation. Actual compensation is based on factors such as the candidate's skills, qualifications, and experience. In addition, Datadog offers a wide range of best-in-class, comprehensive, and inclusive employee benefits for this role, including healthcare, dental, parental planning, and mental health benefits, a 401(k) plan and match, paid time off, fitness reimbursements, and a discounted employee stock purchase plan. The reasonably estimated yearly salary for this role at Datadog is $130,000-$300,000 USD.
