Principal/Lead Data Engineer Contract W2

ConnectedX, Inc.

Dallas, United States of America

yesterday

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Dallas, United States of America

Tech stack

Airflow

Amazon Web Services (AWS)

Data analysis

Architectural Patterns

Automation of Tests

Data Architecture

Data Validation

Information Engineering

Data Governance

Data Systems

DevOps

Distributed Systems

Identity and Access Management

Python

Operational Databases

Performance Tuning

Role-Based Access Control

Cloud Services

Runbook

Scala

Data Processing

Data Ingestion

Spark

Caching

Change Data Capture

PySpark

Debezium

Data Lineage

Kafka

Data Management

Data Pipelines

Databricks

Job description

We are seeking an experienced Lead or Principal Data Engineer to join a long-term W2 contract engagement based in Dallas, TX. This is an onsite role for local candidates who can provide hands-on technical leadership and own the design, implementation, and operational excellence of large-scale data platforms. The ideal candidate has deep experience with Databricks and Scala, strong mastery of Spark performance tuning, and a proven track record building metadata-driven, governable data architectures (Medallion architecture preferred) that balance scalability and cost.

Architect and lead implementation of a Medallion data architecture that optimizes for scalability, performance, maintainability, and cost-efficiency on Databricks.
Design and implement efficient ingestion pipelines, including handling sparse column ingestion patterns and change-data-capture (CDC) scenarios and edge cases.
Lead Spark and Databricks performance optimization: analyze job profiles, optimize joins, shuffles, partitioning, caching, and resource configurations to reduce latency and cost.
Build metadata-driven frameworks for pipeline orchestration, schema evolution, data quality checks, and automated recovery from failures.
Implement and enforce data governance using Unity Catalog and other governance tools: access controls, lineage, classification, and auditability.
Design resilient distributed systems with automated failure detection and recovery strategies; investigate and remediate distributed system failures and stability issues.
Implement cross-account AWS integrations securely and reliably (S3, IAM roles, KMS, VPC endpoints, Glue/Glue Catalog interoperability where applicable).
Collaborate with data scientists, analytics, DevOps, and security teams to translate business requirements into performant data solutions and SLAs.
Mentor engineers, conduct code and architecture reviews, and set best practices for Scala, Spark, and Databricks development.
Create runbooks, monitoring dashboards, and operational playbooks to support 24x7 production reliability and incident response.

Requirements

15+ years of hands-on data engineering experience; 5+ years in a lead or principal role designing and operating production data platforms.
Extensive experience with Databricks and Apache Spark, including production job tuning, cluster sizing, and cost optimization.
Strong proficiency in Scala for data processing; experience with Python/PySpark is a plus.
Deep understanding of Medallion architecture patterns (bronze/silver/gold layers) and how to implement them in cloud data platforms.
Proven experience handling sparse column ingestion issues, schema drift, and CDC edge cases (experience with Debezium/Kafka or vendor CDC solutions is a plus).
Experience building metadata-driven frameworks for schema management, pipeline orchestration (Airflow, Databricks Jobs, or similar), and automated testing.
Solid knowledge of data governance and security: Unity Catalog, IAM, RBAC, encryption at rest/in transit, and data lineage.
Strong AWS experience: S3 lifecycle policies, cross-account access, IAM role assumptions, KMS, VPC endpoints, and Glue/Glue Catalog integration.
Demonstrated ability to design for distributed system resiliency and troubleshoot complex failures across clusters and networks.
Excellent communication skills; experience working directly with stakeholders and leading technical discussions.
