Lead Data Engineer

Toyota Motor North America

Plano, United States of America

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Plano, United States of America

Tech stack

Training Data

Artificial Intelligence

Airflow

Amazon Web Services (AWS)

Data analysis

Apache HTTP Server

Application Services

Code Review

Information Systems

Databases

Directed Acyclic Graphs (DAGs)

Data Architecture

Data Validation

Information Engineering

Data Governance

Data Infrastructure

ETL

Data Masking

Data Security

Data Sharing

Data Systems

Data Warehousing

Software Debugging

Amazon DynamoDB

GitHub

Hive

Identity and Access Management

Python

Metadata Management

Team Foundation Server

Query Optimization

Application Data

Simple Data Format

SQL Databases

Data Streaming

Testing Strategies

Workflow Management Systems

Parquet

Datadog

Delivery Pipeline

Spark

Change Data Capture

Backend

CloudFormation

Data Lake

Integration Tests

Debezium

Information Technology

Data Lineage

Apache Flink

Avro

Data Analytics

Kafka

Spark Streaming

Presto

Event Sourcing

CloudWatch

Terraform

Stream Processing

Data Pipelines

Serverless Computing

Redshift

Job description

Serve as the technical authority for data architecture across the organization, making high-impact decisions on data lake design, streaming topologies, storage formats, partitioning strategies, and data modeling patterns
Design, build, and maintain production-grade data pipelines - batch and real-time - from ingestion and transformation to serving and consumption
Own the data platform: build and evolve the foundational infrastructure that engineering, ML/AI, and analytics teams depend on for reliable, governed, and performant data access
Partner closely with ML/AI engineers to ensure training data, feature pipelines, and model serving data are accurate, fresh, and efficiently delivered - you are the upstream enabler for every model in production
Collaborate with backend and full-stack engineers to design event-driven architectures, define data contracts, and ensure application data flows cleanly into the data platform
Lead technical design reviews, architecture discussions, and RFC processes for data initiatives - driving alignment across engineering teams
Identify and resolve systemic data issues: pipeline failures, data quality degradation, schema drift, latency in streaming systems, cost inefficiencies in storage and compute, and gaps in data observability
Define and champion data engineering best practices: data modeling, schema evolution, data contracts, testing strategies, lineage tracking, cataloging, and governance
Design and implement data quality frameworks - validation rules, anomaly detection, freshness checks, and alerting - so downstream consumers can trust the data without asking
Collaborate closely with Engineering Managers, Product, Data Science, and Analytics to shape data roadmaps and ensure the platform evolves with business needs
Mentor and grow engineers at all levels through code reviews, pairing, design feedback, and technical guidance on data engineering topics
Contribute to hiring by conducting technical interviews and helping define what great looks like for data engineering at TFS
Proactively communicate technical risks, tradeoffs, and recommendations to both engineering and non-technical stakeholders

Requirements

Bachelor's degree in Computer Science, Data Engineering, Information Systems, or related field, or equivalent practical experience
7+ years of software or data engineering experience, including 3-5 years focused specifically on data platform and pipeline engineering at scale, with a track record of operating at a principal or staff engineer level
Deep expertise in designing and building data lake and Lakehouse architectures on AWS, including:

Amazon S3 as the foundation for data lake storage, with strong opinions on partitioning, file formats (Parquet, Avro, ORC), and lifecycle management
AWS Glue for ETL/ELT jobs, crawlers, and the Data Catalog
Amazon Athena for serverless SQL analytics over the data lake
Lake Formation for fine-grained access control, governance, and cross-account data sharing
Amazon Redshift or Redshift Serverless for data warehousing and high-performance analytical queries
Amazon EMR or EMR Serverless for large-scale Spark, Hive, or Presto workloads

Production experience with real-time and streaming data architectures, including:

Amazon Kinesis (Data Streams, Data Firehose) for real-time ingestion and delivery
Amazon MSK (Managed Kafka) or self-managed Kafka for event streaming at scale
EventBridge, SQS, or SNS for event-driven integration with application services
Lambda for lightweight stream processing and event transformation
Apache Flink (via Amazon Managed Service for Apache Flink) or Spark Structured Streaming for stateful stream processing

Strong proficiency in Python and SQL - you write production-quality pipeline code, not just ad-hoc scripts, and you can optimize a complex query as fluently as you can design a DAG
Experience with workflow orchestration tools: Step Functions, Apache Airflow (via Amazon MWAA), or similar - you know how to build reliable, observable, and recoverable pipeline DAGs
Solid understanding of data modeling for both analytical and operational use cases: star schemas, slowly changing dimensions, wide tables, event sourcing, and CDC (change data capture) patterns
Experience with data quality and governance tooling and practices: Great Expectations, Deequ, or custom validation frameworks - plus data cataloging, lineage tracking, and access control
Strong understanding of Infrastructure as Code using AWS CDK, CloudFormation, or Terraform for data infrastructure
Experience with observability and monitoring for data systems: pipeline health dashboards, data freshness tracking, SLA monitoring, and alerting on failures or anomalies (CloudWatch, Datadog, or similar)
Strong understanding of security best practices for data: IAM policies, Lake Formation permissions, encryption at rest and in transit, data masking, and PII handling
Deep experience debugging complex issues across data systems - pipeline failures, data skew, schema mismatches, streaming lag, and storage cost runaway
Experience with testing strategies for data pipelines: data validation, schema contract testing, integration testing, and pipeline idempotency
Strong written and verbal communication - you can write a clear RFC, lead a design review, and explain a data architecture tradeoff to a non-technical stakeholder

Added bonus if you have

Master's degree in Computer Science, Data Engineering, or related field
Experience in the financial services, banking, or insurance industry
Experience with open table formats: Apache Iceberg, Delta Lake, or Apache Hudi for ACID transactions, time travel, and schema evolution on the data lake
Experience with feature store design and implementation for ML/AI use cases (SageMaker Feature Store, Feast, or custom)
Familiarity with dbt or similar transformation frameworks for analytics engineering and data modeling
Experience with real-time analytics serving layers: Amazon OpenSearch, DynamoDB, or ElastiCache for low-latency data access
Experience designing multi-account AWS data architectures with proper governance and guardrails (AWS Organizations, Control Tower, cross-account data sharing via Lake Formation)
Hands-on experience with data mesh or data product patterns - decentralized ownership with centralized governance
Experience with CDC (change data capture) tools: AWS DMS, Debezium, or similar for streaming database changes into the data lake
Experience with cost optimization for data workloads: storage tiering, compute right-sizing, spot instances for Spark, and query optimization
Experience with GenAI data pipelines: preparing training datasets, building RAG knowledge bases, embedding generation, and vector store population
AWS certifications (Data Analytics Specialty, Solutions Architect, Database Specialty)
Experience with CI/CD pipelines for data infrastructure and pipeline deployment (CodePipeline, GitHub Actions, or similar)
Experience contributing to or maintaining open-source data engineering projects
Experience defining engineering standards, writing ADRs, or leading org-wide technical initiatives

Benefits & conditions

What we'll bring

During your interview process, our team can fill you in on all the details of our industry-leading benefits and career development opportunities. A few highlights include:

A work environment built on teamwork, flexibility, and respect
Professional growth and development programs to help advance your career, as well as tuition reimbursement
Team Member Vehicle Purchase Discount
Toyota Team Member Lease Vehicle Program (if applicable)
Comprehensive health care and wellness plans for your entire family
Toyota 401(k) Savings Plan featuring a company match, as well as an annual retirement contribution from Toyota, regardless of whether you contribute
Paid holidays and paid time off
Referral services related to prenatal services, adoption, childcare, schools, and more
Tax-Advantaged Accounts (Health Savings Account, Health Care FSA, Dependent Care FSA)
Relocation Assistance (if applicable)

Belonging at Toyota

Our success begins and ends with our people. We embrace all perspectives and value unique human experiences. Respect for all is our North Star. Toyota is proud to have 10+ different Business Partnering Groups across 100 different North American chapter locations that support team members' efforts to dream, do and grow without questioning that they belong.

About the company

Collaborative. Respectful. A place to dream and do. These are just a few words that describe what life is like at Toyota. As one of the world's most admired brands, Toyota is growing and leading the future of mobility through innovative, high-quality solutions designed to enhance lives and delight those we serve. We're looking for talented team members who want to Dream. Do. Grow. with us. An important part of the Toyota family is Toyota Financial Services (TFS), the finance and insurance brand for Toyota and Lexus in North America. While TFS is a separate business entity, it is an essential part of this world-changing company - delivering on Toyota's vision to move people beyond what's possible. At TFS, you will help create a best-in-class customer experience in an innovative, collaborative environment.

At TFS, we're building next-generation products that redefine mobility for millions of customers worldwide. We're looking for a Sr. Lead Data Engineer - an individual contributor at the principal level - who brings deep expertise in data engineering, streaming architectures, and analytics platforms, combined with the technical leadership to make data a reliable, scalable foundation for the entire engineering organization.

This isn't a management role. It's for the engineer who thinks in pipelines and data contracts: the one who can design a Lakehouse architecture, build a real-time streaming platform, ensure data quality at scale, and make it all self-service for the teams that depend on it. You'll work at the intersection of backend engineering, ML/AI, and analytics - making sure the data that powers our products, models, and decisions is trustworthy, timely, and accessible. If you want to build the data backbone of a modern engineering org - not just move files around - this is the role.