Principal Data Engineer - AI
Job description
We're seeking a Principal Data Engineer who can work across the full stack of Anaplan's data platform, setting the technical direction for how we ingest, transform, store, serve, and govern data at scale. You will build highly performant, robust data pipelines that process massive volumes of data in both real-time and batch modes. This foundational work empowers business users to leverage vast datasets in their planning workflows and forms the bedrock for our advanced analytics and AI initiatives. You'll need deep knowledge of distributed computing and data architecture, along with strong software engineering skills, to tackle complex, high-scale data challenges.
This role is open to candidates located in the Eastern or Central time zones. Employees who live within commuting distance of one of our offices will be expected to work onsite two days per week as part of our hybrid work model.
Your Impact
- Lead the data architecture, design, and deployment of scalable, high-throughput Big Data systems into production environments.
- Architect, deploy, and manage the foundational data systems that underlie modern AI infrastructure, including vector, NoSQL, and document databases.
- Develop end-to-end data engineering solutions, including robust ETL/ELT pipelines, API services, and data ingestion frameworks.
- Design and build the storage and processing layers powering our analytics workloads: data lakes, data warehouses, distributed file systems, and real-time streaming architectures.
- Engineer feature-rich context pipelines that process large-scale enterprise data, balancing batch and streaming patterns seamlessly.
- Optimize and scale large distributed queries and data transformations to ensure high performance and low latency for end users.
- Implement data quality frameworks to measure and ensure data integrity, reliability, and governance across all data assets.
- Collaborate with analytics, product, and platform teams to build data models that capture the semantics of customer metrics, hierarchies, and relationships.
- Stay current with the modern data stack and big data landscape, evaluating new tools, distributed computing frameworks, and database technologies for potential adoption.
Requirements
- Extensive data engineering experience, demonstrating a strong track record of hands-on execution and delivery in complex data environments.
- Deep practical understanding of the database ecosystems that power AI and machine learning infrastructure (e.g., vector databases, NoSQL, and document stores).
- Hands-on experience building, scaling, and shipping large-scale data platforms in production.
- Deep practical experience with distributed data processing frameworks (e.g., Apache Spark, Flink, Hadoop).
- Strong expertise in message brokers and event streaming platforms (e.g., Apache Kafka, Kinesis).
- End-to-end experience across the data pipeline development lifecycle, including extensive experience with workflow orchestration tools (e.g., Apache Airflow, Dagster).
- Hands-on expertise with cloud data warehouses (e.g., Snowflake, BigQuery, Redshift) and data lake architectures (e.g., Databricks, Delta Lake, Apache Iceberg).
- Advanced SQL skills and proficiency in Python.
- Strong background in modern software development practices (testing, code review, CI/CD, Infrastructure as Code).
Desirable
- Extensive, progressive experience leading technical projects and mentoring engineering teams.
- Hands-on experience with cloud-native infrastructure (AWS, GCP, or Azure).
- Experience implementing data observability, monitoring, and alerting frameworks at scale.
- Familiarity with Anaplan or similar enterprise planning platforms.