Lead Data Engineer

Toyota Motor North America
Plano, United States of America
yesterday

Role details

Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior

Job location

Plano, United States of America

Tech stack

Training Data
Artificial Intelligence
Airflow
Amazon Web Services (AWS)
Data analysis
Apache HTTP Server
Application Services
Code Review
Information Systems
Databases
Directed Acyclic Graph (Directed Graphs)
Data Architecture
Data Validation
Information Engineering
Data Governance
Data Infrastructure
ETL
Data Masking
Data Security
Data Sharing
Data Systems
Data Warehousing
Software Debugging
Amazon DynamoDB
GitHub
Hive
Identity and Access Management
Python
Metadata Management
Team Foundation Server
Query Optimization
Application Data
Simple Data Format
SQL Databases
Data Streaming
Testing Strategies
Workflow Management Systems
Parquet
Datadog
Delivery Pipeline
Spark
Change Data Capture
Backend
CloudFormation
Data Lake
Integration Tests
Debezium
Information Technology
Data Lineage
Apache Flink
Avro
Data Analytics
Kafka
Spark Streaming
Presto
Event Sourcing
CloudWatch
Terraform
Stream Processing
Data Pipelines
Serverless Computing
Redshift

Job description

  • Serve as the technical authority for data architecture across the organization, making high-impact decisions on data lake design, streaming topologies, storage formats, partitioning strategies, and data modeling patterns
  • Design, build, and maintain production-grade data pipelines - batch and real-time - from ingestion and transformation to serving and consumption
  • Own the data platform: build and evolve the foundational infrastructure that engineering, ML/AI, and analytics teams depend on for reliable, governed, and performant data access
  • Partner closely with ML/AI engineers to ensure training data, feature pipelines, and model serving data are accurate, fresh, and efficiently delivered - you are the upstream enabler for every model in production
  • Collaborate with backend and full-stack engineers to design event-driven architectures, define data contracts, and ensure application data flows cleanly into the data platform
  • Lead technical design reviews, architecture discussions, and RFC processes for data initiatives - driving alignment across engineering teams
  • Identify and resolve systemic data issues: pipeline failures, data quality degradation, schema drift, latency in streaming systems, cost inefficiencies in storage and computing, and gaps in data observability
  • Define and champion data engineering best practices: data modeling, schema evolution, data contracts, testing strategies, lineage tracking, cataloging, and governance
  • Design and implement data quality frameworks - validation rules, anomaly detection, freshness checks, and alerting - so downstream consumers can trust the data without asking
  • Collaborate closely with Engineering Managers, Product, Data Science, and Analytics to shape data roadmaps and ensure the platform evolves with business needs
  • Mentor and grow engineers at all levels through code reviews, pairing, design feedback, and technical guidance on data engineering topics
  • Contribute to hiring by conducting technical interviews and helping define what great looks like for data engineering at TFS
  • Proactively communicate technical risks, tradeoffs, and recommendations to both engineering and non-technical stakeholders

Requirements

  • Bachelor's degree in Computer Science, Data Engineering, Information Systems, or related field, or equivalent practical experience
  • 7+ years of software or data engineering experience, including 3-5 years focused specifically on data platform and pipeline engineering at scale, with a track record of operating at a principal or staff engineer level
  • Deep expertise in designing and building data lake and Lakehouse architectures on AWS, including:
      • S3 as the foundation for data lake storage, with strong opinions on partitioning, file formats (Parquet, Avro, ORC), and lifecycle management
      • AWS Glue for ETL/ELT jobs, crawlers, and the Data Catalog
      • Amazon Athena for serverless SQL analytics over the data lake
      • Lake Formation for fine-grained access control, governance, and cross-account data sharing
      • Amazon Redshift or Redshift Serverless for data warehousing and high-performance analytical queries
      • Amazon EMR or EMR Serverless for large-scale Spark, Hive, or Presto workloads
  • Production experience with real-time and streaming data architectures, including:
      • Amazon Kinesis (Data Streams, Data Firehose) for real-time ingestion and delivery
      • Amazon MSK (Managed Kafka) or self-managed Kafka for event streaming at scale
      • EventBridge, SQS, or SNS for event-driven integration with application services
      • Lambda for lightweight stream processing and event transformation
      • Apache Flink (via Amazon Managed Service for Apache Flink) or Spark Structured Streaming for stateful stream processing
  • Strong proficiency in Python and SQL - you write production-quality pipeline code, not just ad-hoc scripts, and you can optimize a complex query as fluently as you can design a DAG
  • Experience with workflow orchestration tools: Step Functions, Apache Airflow (via Amazon MWAA), or similar - you know how to build reliable, observable, and recoverable pipeline DAGs
  • Solid understanding of data modeling for both analytical and operational use cases: star schemas, slowly changing dimensions, wide tables, event sourcing, and CDC (change data capture) patterns
  • Experience with data quality and governance tooling and practices: Great Expectations, Deequ, or custom validation frameworks - plus data cataloging, lineage tracking, and access control
  • Strong understanding of Infrastructure as Code using AWS CDK, CloudFormation, or Terraform for data infrastructure
  • Experience with observability and monitoring for data systems: pipeline health dashboards, data freshness tracking, SLA monitoring, and alerting on failures or anomalies (CloudWatch, Datadog, or similar)
  • Strong understanding of security best practices for data: IAM policies, Lake Formation permissions, encryption at rest and in transit, data masking, and PII handling
  • Deep experience debugging complex issues across data systems - pipeline failures, data skew, schema mismatches, streaming lag, and storage cost runaway
  • Experience with testing strategies for data pipelines: data validation, schema contract testing, integration testing, and pipeline idempotency
  • Strong written and verbal communication - you can write a clear RFC, lead a design review, and explain a data architecture tradeoff to a non-technical stakeholder

Added bonus if you have

  • Master's degree in Computer Science, Data Engineering, or related field
  • Experience in the financial services, banking, or insurance industry
  • Experience with open table formats: Apache Iceberg, Delta Lake, or Apache Hudi for ACID transactions, time travel, and schema evolution on the data lake
  • Experience with feature store design and implementation for ML/AI use cases (SageMaker Feature Store, Feast, or custom)
  • Familiarity with dbt or similar transformation frameworks for analytics engineering and data modeling
  • Experience with real-time analytics serving layers: Amazon OpenSearch, DynamoDB, or ElastiCache for low-latency data access
  • Experience designing multi-account AWS data architectures with proper governance and guardrails (AWS Organizations, Control Tower, cross-account data sharing via Lake Formation)
  • Hands-on experience with data mesh or data product patterns - decentralized ownership with centralized governance
  • Experience with CDC (change data capture) tools: AWS DMS, Debezium, or similar for streaming database changes into the data lake
  • Experience with cost optimization for data workloads: storage tiering, compute right-sizing, spot instances for Spark, and query optimization
  • Experience with GenAI data pipelines: preparing training datasets, building RAG knowledge bases, embedding generation, and vector store population
  • AWS certifications (Data Analytics Specialty, Solutions Architect, Database Specialty)
  • Experience with CI/CD pipelines for data infrastructure and pipeline deployment (CodePipeline, GitHub Actions, or similar)
  • Experience contributing to or maintaining open-source data engineering projects
  • Experience defining engineering standards, writing ADRs, or leading org-wide technical initiatives

Benefits & conditions

What we'll bring

During your interview process, our team can fill you in on all the details of our industry-leading benefits and career development opportunities. A few highlights include:

  • A work environment built on teamwork, flexibility, and respect
  • Professional growth and development programs to help advance your career, as well as tuition reimbursement
  • Team Member Vehicle Purchase Discount
  • Toyota Team Member Lease Vehicle Program (if applicable)
  • Comprehensive health care and wellness plans for your entire family
  • Toyota 401(k) Savings Plan featuring a company match, as well as an annual retirement contribution from Toyota, regardless of whether you contribute
  • Paid holidays and paid time off
  • Referral services related to prenatal services, adoption, childcare, schools, and more
  • Tax-Advantaged Accounts (Health Savings Account, Health Care FSA, Dependent Care FSA)
  • Relocation Assistance (if applicable)

Belonging at Toyota

Our success begins and ends with our people. We embrace all perspectives and value unique human experiences. Respect for all is our North Star. Toyota is proud to have 10+ different Business Partnering Groups across 100 different North American chapter locations that support team members' efforts to dream, do and grow without questioning that they belong.

About the company

Collaborative. Respectful. A place to dream and do. These are just a few words that describe what life is like at Toyota. As one of the world's most admired brands, Toyota is growing and leading the future of mobility through innovative, high-quality solutions designed to enhance lives and delight those we serve. We're looking for talented team members who want to Dream. Do. Grow. with us. An important part of the Toyota family is Toyota Financial Services (TFS), the finance and insurance brand for Toyota and Lexus in North America. While TFS is a separate business entity, it is an essential part of this world-changing company, delivering on Toyota's vision to move people beyond what's possible. At TFS, you will help create a best-in-class customer experience in an innovative, collaborative environment.

At TFS, we're building next-generation products that redefine mobility for millions of customers worldwide. We're looking for a Sr Lead Data Engineer - an individual contributor at the principal level - who brings deep expertise in data engineering, streaming architectures, and analytics platforms, combined with technical leadership to make data a reliable, scalable foundation for the entire engineering organization.

This isn't a management role. It's for the engineer who thinks in pipelines and data contracts: the one who can design a Lakehouse architecture, build a real-time streaming platform, ensure data quality at scale, and make it all self-service for the teams that depend on it. You'll work at the intersection of backend engineering, ML/AI, and analytics - making sure the data that powers our products, models, and decisions is trustworthy, timely, and accessible. If you want to build the data backbone of a modern engineering org - not just move files around - this is the role.
