Principal/Lead Data Engineer Contract W2

ConnectedX, Inc.
Dallas, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Dallas, United States of America

Tech stack

Airflow
Amazon Web Services (AWS)
Data analysis
Architectural Patterns
Automation of Tests
Data Architecture
Data Validation
Information Engineering
Data Governance
Data Systems
DevOps
Distributed Systems
Identity and Access Management
Python
Operational Databases
Performance Tuning
Role-Based Access Control
Cloud Services
Runbook
Scala
Data Processing
Data Ingestion
Spark
Caching
Change Data Capture
PySpark
Debezium
Data Lineage
Kafka
Data Management
Data Pipelines
Databricks

Job description

We are seeking an experienced Lead or Principal Data Engineer to join a long-term W2 contract engagement based in Dallas, TX. This is an onsite role for local candidates who can provide hands-on technical leadership and own the design, implementation, and operational excellence of large-scale data platforms. The ideal candidate has deep experience with Databricks and Scala, strong mastery of Spark performance tuning, and a proven track record building metadata-driven, governable data architectures (Medallion architecture preferred) that balance scalability and cost.

Architect and lead implementation of a Medallion data architecture that optimizes for scalability, performance, maintainability, and cost-efficiency on Databricks.
Design and implement efficient ingestion pipelines, including handling sparse column ingestion patterns and change-data-capture (CDC) scenarios and edge cases.
Lead Spark and Databricks performance optimization: analyze job profiles, optimize joins, shuffles, partitioning, caching, and resource configurations to reduce latency and cost.
Build metadata-driven frameworks for pipeline orchestration, schema evolution, data quality checks, and automated recovery from failures.
Implement and enforce data governance using Unity Catalog and other governance tools: access controls, lineage, classification, and auditability.
Design resilient distributed systems with automated failure detection and recovery strategies; investigate and remediate distributed system failures and stability issues.
Implement cross-account AWS integrations securely and reliably (S3, IAM roles, KMS, VPC endpoints, Glue/Glue Catalog interoperability where applicable).
Collaborate with data scientists, analytics, DevOps, and security teams to translate business requirements into performant data solutions and SLAs.
Mentor engineers, conduct code and architecture reviews, and set best practices for Scala, Spark, and Databricks development.
Create runbooks, monitoring dashboards, and operational playbooks to support 24x7 production reliability and incident response.

Requirements

15+ years of hands-on data engineering experience; 5+ years in a lead or principal role designing and operating production data platforms.
Extensive experience with Databricks and Apache Spark, including production job tuning, cluster sizing, and cost optimization.
Strong proficiency in Scala for data processing; experience with Python/PySpark is a plus.
Deep understanding of Medallion architecture patterns (bronze/silver/gold layers) and how to implement them in cloud data platforms.
Proven experience handling sparse column ingestion issues, schema drift, and CDC edge cases (experience with Debezium/Kafka or vendor CDC solutions is a plus).
Experience building metadata-driven frameworks for schema management, pipeline orchestration (Airflow, Databricks Jobs, or similar), and automated testing.
Solid knowledge of data governance and security: Unity Catalog, IAM, RBAC, encryption at rest/in transit, and data lineage.
Strong AWS experience: S3 lifecycle policies, cross-account access, IAM role assumptions, KMS, VPC endpoints, and Glue/Glue Catalog integration.
Demonstrated ability to design for distributed system resiliency and troubleshoot complex failures across clusters and networks.
Excellent communication skills; experience working directly with stakeholders and leading technical discussions.
