Data Quality Engineer

Select Minds LLC
Dallas, United States of America
3 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate

Job location

Remote
Dallas, United States of America

Tech stack

Amazon Web Services (AWS)
Continuous Integration
Data as a Service
Data Validation
Information Engineering
Data Integrity
ETL
Serialization
Data Systems
Database Queries
Database Testing
Software Debugging
Distributed Data Store
GitHub
Protocol Buffers
JSON
Python
Prometheus
SQL Databases
Data Streaming
Grafana
Concurrency
Spark
PySpark
Data Lineage
Low Latency
Avro
Kafka
Data Management
CloudWatch
Data Pipelines
SDET
Jenkins
Databricks

Job description

We are looking for a Data Quality Engineer to own validation across batch and streaming data pipelines. The role focuses on ensuring data correctness, reliability, and performance across platforms built on Databricks, Kafka, AWS, SQL, and Python. It is hands-on, centered on building scalable data validation frameworks and keeping production-grade data systems reliable.

End-to-End Data Validation

  • Validate data pipelines for accuracy, completeness, consistency, and timeliness

  • Build SQL-based validations for business rules and transformations

  • Implement reconciliation between source and downstream systems

  • Ensure data lineage and traceability

ETL / ELT & Spark Testing

  • Test pipelines built on AWS (Glue, Lambda, EMR, Step Functions)

  • Validate transformations using SQL and Python

  • Test ingestion, transformation, aggregation, and serving layers

  • Handle backfills, reprocessing, and historical data loads

  • Validate Spark pipelines (PySpark/Scala) on Databricks

Streaming (Kafka)

  • Validate data integrity, ordering, and delivery guarantees

  • Test producer and consumer logic and serialization formats (Avro, JSON, Protobuf)

  • Validate topics, partitions, offsets, retention, and schema evolution

  • Simulate late events, duplicates, and failure scenarios

Automation & Frameworks

  • Build Python-based data testing frameworks

  • Develop reusable validation utilities and synthetic datasets

  • Integrate data tests into CI/CD pipelines

  • Enable automated alerts for data quality issues

Performance & Reliability

  • Validate throughput, latency, and concurrency at scale

  • Test retry logic, idempotency, and recovery mechanisms

  • Perform regression, soak, and failover testing

Observability

  • Validate logs, metrics, and alerts using tools such as CloudWatch, Prometheus, and Grafana

  • Define and monitor data SLAs and SLOs

  • Support incident response, root cause analysis, and postmortems
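
The reconciliation work described above (comparing source systems against downstream targets) can be sketched as a small Python utility. This is illustrative only; the `reconcile` helper, its key-based comparison, and the record shapes are hypothetical, not part of any actual framework used in this role:

```python
def reconcile(source_rows, target_rows, key):
    """Compare source and target datasets on a key column.

    Reports keys missing from the target, keys that appear in the
    target but not the source, and the row counts on each side.
    """
    src_keys = [row[key] for row in source_rows]
    tgt_keys = [row[key] for row in target_rows]
    return {
        "missing_in_target": sorted(set(src_keys) - set(tgt_keys)),
        "unexpected_in_target": sorted(set(tgt_keys) - set(src_keys)),
        "source_count": len(src_keys),
        "target_count": len(tgt_keys),
    }

# Example: row 2 never landed downstream, and row 4 appeared unexpectedly.
source = [{"id": 1}, {"id": 2}, {"id": 3}]
target = [{"id": 1}, {"id": 3}, {"id": 4}]
report = reconcile(source, target, "id")
```

In a real pipeline the same comparison would typically run against query results or Spark DataFrames rather than in-memory lists, but the logic (key diff plus count check) is the same.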

Requirements

  • 7+ years of total experience in QA, SDET, or Data Quality Engineering

  • Minimum 4-6 years of hands-on experience working with data platforms, data pipelines, or data engineering ecosystems

  • 3+ years of hands-on experience with Databricks and Apache Spark

  • Strong SQL skills for data validation, reconciliation, and complex analysis

  • Proficiency in Python for automation and data validation

  • Experience testing ETL/ELT pipelines (batch and streaming)

  • Hands-on experience with Kafka or similar streaming platforms

  • Strong understanding of AWS data services (S3, Glue, Lambda, Redshift, Athena)

  • Experience working with large-scale distributed data systems

  • Strong debugging, analytical, and problem-solving skills

Nice to Have

  • Experience with data quality or observability tools such as Great Expectations or Monte Carlo

  • Knowledge of schema registry and data contracts

  • Experience with CI/CD tools such as GitHub Actions or Jenkins

Flexible work from home options available.
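
The automated data tests mentioned under Automation & Frameworks might look like the following minimal sketch. The records, business rules, and `test_*` function names are hypothetical; in practice such checks would run under a test runner (for example pytest) as part of a CI/CD pipeline:

```python
# Hypothetical batch of records to validate (in practice, query results).
RECORDS = [
    {"order_id": 1, "amount": 19.99, "currency": "USD"},
    {"order_id": 2, "amount": 5.00, "currency": "USD"},
]

def test_no_null_keys():
    # Completeness: every record must carry a non-null primary key.
    assert all(r["order_id"] is not None for r in RECORDS)

def test_amounts_positive():
    # Business rule: order amounts must be strictly positive.
    assert all(r["amount"] > 0 for r in RECORDS)
```

A CI job would collect and run these tests on every pipeline change, failing the build (and alerting the team) when a data quality rule is violated.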

Apply for this position