Staff Software Engineer - DataPlatform
DUETTO
San Francisco, United States of America
18 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Job location: Remote; San Francisco, United States of America
Tech stack
Java
Artificial Intelligence
Airflow
Amazon Web Services (AWS)
Apache HTTP Server
Google BigQuery
Data as a Service
Information Engineering
Data Governance
Data Transformation
Data Warehousing
Java Virtual Machine (JVM)
Python
Operational Databases
SQL Databases
Data Streaming
Datadog
Large Language Models
Snowflake
Backend
Event Driven Architecture
Data Lake
PySpark
Kafka
Codebase
Presto
Terraform
Redshift
Job description
- AI-first is the reality, not the roadmap. Every engineer uses Claude Code daily, and you'll be contributing to a custom multi-agent system with 17 specialised agents, human-in-the-loop approval gates, and AI-assisted pipeline generation. This is what modern data engineering looks like.
- Cross-cutting technical ownership. This isn't a siloed data role - you'll operate across Java upstream systems and Python pipelines, with real architectural influence over how data flows through the entire platform.
- The scale and the stakes are real. Millions of pricing decisions processed daily, 80+ integration partners, and a live evolution toward near-real-time streaming. The technical challenges are genuine.
- A team with the right values. Low ego, high EQ, intellectual curiosity, and active mentorship in both directions across a collaborative US and Europe group.
- Modern stack at real scale. Python/PySpark, Apache Iceberg, Airflow, AWS Glue, Terraform, Datadog - tools that matter, used on problems that matter.
Requirements
- 7+ years building production data or backend systems
- Strong proficiency in both Python (PySpark, data engineering) and Java - you can read and navigate Spring/JVM codebases comfortably, not just Python
- Experience with lakehouse or data warehouse architectures on cloud - Iceberg, Delta Lake, Redshift, BigQuery, or Snowflake
- Production experience with AWS data services: Glue, Athena, S3, Lambda
- Experience with workflow orchestration - Airflow, Kestra, Step Functions, or similar
- The ability to work across system boundaries: you understand upstream event schemas, data models, and downstream consumer needs
- Hands-on experience with Apache Iceberg - MERGE operations, schema evolution, partition evolution
- Experience with Trino or Presto for federated or interactive SQL analytics at scale
- Experience with dbt for data transformation, modelling, and testing
- Familiarity with data quality frameworks - Great Expectations, Monte Carlo, or similar
- A background in event-driven architectures: Kinesis, Kafka, SQS
- Genuine interest in AI-assisted development and LLM-based engineering workflows
- Familiarity with the hospitality domain or multi-tenant B2B SaaS data challenges
You don't need to be an expert in every technology on this list. If you're a strong engineer who spans Java and Python, cares about data quality, and wants to work somewhere AI is genuinely part of how you build - we'd love to hear from you.
About the company
This is a rare role for an engineer who's genuinely comfortable on both sides of the stack - reading Java Spring Boot services in the morning and writing PySpark pipelines in the afternoon. You'll own the end-to-end data flow that sits behind every pricing decision Duetto makes for thousands of hotels worldwide, bridging our Java core platform and Python data layer while working in an engineering culture where AI is already how the work gets done.
What Makes Us Different?
Duetto is the hospitality industry's leading revenue management platform, founded in 2012 by former Wynn Resorts executives who knew the industry needed better technology. We built the world's first Revenue & Profit Operating System - a suite of tools (GameChanger, ScoreBoard, BlockBuster, Advance and more) that goes beyond room pricing to give hotels, resorts and casinos a complete picture of their revenue and profitability. Trusted by clients ranging from independent boutique hotels to global chains, we've been named the #1 Revenue Management Software by HotelTechAwards four years running and the #1 Best Place to Work in Hotel Tech in 2025. Backed by GrowthCurve Capital since 2024, we're accelerating our investment in AI - and we're genuinely passionate about the industry we serve. We build products we're proud of, for customers we care about.
What You'll Be Doing
* You'll own data pipelines from source to gold - taking data from MongoDB, Kinesis events, RabbitMQ, and PMS/CRS integrations through the full bronze → silver → gold lakehouse architecture, including Iceberg-based ingestion patterns for both batch and near-real-time workloads.
* You'll build connectors and transformations for new data sources - Salesforce, Amadeus, Expedia, and internal platform events - expanding the reach and reliability of the data layer the whole product depends on.
* You'll work across system boundaries: reading Kinesis event schemas and MongoDB data models in the Java platform, then applying that understanding in the Python pipeline codebase - driving schema evolution strategies, data contract enforcement, and backward compatibility across both worlds.
* You'll drive data quality and governance - extending Great Expectations and Data Contract CLI across pipeline tiers, owning Athena views and SQL assets, and building monitoring and alerting for pipeline health, data freshness, and quality drift.
* You'll partner with DevPlatform on event schema design and with data science to operationalise ML workloads, ensuring clean data handoffs across every system boundary.
* You'll work AI-first every day - using Claude Code and MCP tools as a core part of your workflow, contributing to AI-assisted pipeline scaffolding and data discovery tooling alongside a custom multi-agent system built around 17 specialised agents.
About Duetto
51-200