Databricks Data Engineer

Cays Inc
2 days ago

Role details

Contract type
Temporary to permanent
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Remote

Tech stack

Big Data
Data Validation
Data Mining
Performance Tuning
SQL Databases
Parquet
Data Logging
File Transfer Protocol (FTP)
Data Lake
PySpark
Data Inconsistencies
Data Pipelines
Databricks
Data Generation

Job description

We are looking to immediately onboard a Senior Databricks Data Engineer to support a high-priority initiative delivering data to IPSOS for MMM/MTA modeling. We are building data pipelines to extract and deliver curated datasets (approx. 150GB historical plus weekly increments) from Databricks bronze/silver layers to an external analytics partner (IPSOS). The data will be used for MMM/MTA modeling, so accuracy, consistency, and reliability are critical.

Data Extraction & Engineering

Build scalable extraction pipelines from Databricks (bronze/silver layers)
Prepare datasets for external consumption (column selection, renaming, formatting, normalization)
Work across ~10-20 fact and dimension tables spanning the media and sales domains

Incremental Pipeline Development

Design and implement incremental logic using timestamps or CDC patterns
Optimize for ongoing weekly loads (~2GB) while supporting large historical extracts

File Generation & Optimization

Generate export-ready datasets in CSV/Parquet formats
Implement partitioning strategies for performance (e.g., by date/source)
Apply compression and optimize file sizes for transfer

Data Validation & Quality

Implement validation checks (schema, row counts, completeness)
Troubleshoot data inconsistencies across multiple sources

Secure Delivery

Support secure file delivery (e.g., SFTP, encryption)
Implement monitoring, logging, retry logic, and failure notifications

Collaboration

Work closely with internal teams and IPSOS on data validation and issue resolution
Support onboarding and early-stage troubleshooting
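For illustration only, the sketch below shows the kind of incremental extract-and-export step described above: reading a silver Delta table, filtering on a timestamp watermark, shaping columns for the partner, and writing partitioned, compressed Parquet. All table names, column names, and paths are hypothetical, not part of this role's actual environment.

    # Minimal sketch of an incremental extract from a silver Delta table to
    # partitioned Parquet. Table, column, and path names are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("ipsos_weekly_extract").getOrCreate()

    # Watermark persisted from the previous successful run (e.g., in a control table).
    last_watermark = "2024-01-01T00:00:00"

    silver = spark.read.table("silver.sales_weekly")  # hypothetical silver table

    # Incremental slice: only rows updated since the last export.
    increment = (
        silver
        .filter(F.col("updated_at") > F.lit(last_watermark))
        .select(  # column selection / renaming / formatting for external consumption
            F.col("sale_id"),
            F.col("sale_date").alias("date"),
            F.col("channel").alias("media_source"),
            F.col("net_revenue").cast("decimal(18,2)").alias("revenue"),
        )
    )

    # Export-ready Parquet, partitioned by date and source, compressed for transfer.
    (
        increment
        .repartition("date", "media_source")
        .write
        .mode("overwrite")
        .option("compression", "snappy")
        .partitionBy("date", "media_source")
        .parquet("/mnt/exports/ipsos/sales_weekly/")
    )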

Requirements

Strong hands-on experience with Databricks (Delta Lake, notebooks, jobs)
Proficiency in PySpark and SQL for large-scale data processing
Experience with incremental pipelines (CDC, watermarking)
Solid understanding of data modeling (fact/dimension, grain alignment)
Experience handling large datasets (100GB+) and performance tuning
Familiarity with file-based delivery (CSV/Parquet) and secure transfer (SFTP, encryption)

Nice to Have

Experience with MMM/MTA or marketing datasets (Google, Meta, Amazon, etc.)
Experience working with external analytics partners (e.g., IPSOS, Nielsen)
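As a further illustration, this is a minimal sketch of the pre-delivery validation checks (schema, row counts, completeness) mentioned in the responsibilities; the expected columns and export path are assumptions, not the actual contract with the partner.

    # Minimal sketch of pre-delivery validation checks on an export.
    # Expected columns and paths are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("ipsos_export_validation").getOrCreate()

    export = spark.read.parquet("/mnt/exports/ipsos/sales_weekly/")  # hypothetical export path

    # Schema check: every column promised to the partner must be present.
    expected_columns = {"sale_id", "date", "media_source", "revenue"}
    missing = expected_columns - set(export.columns)
    assert not missing, f"Export is missing expected columns: {missing}"

    # Row-count and completeness checks: non-empty export, no null keys.
    row_count = export.count()
    null_keys = export.filter(F.col("sale_id").isNull()).count()
    assert row_count > 0, "Export produced zero rows"
    assert null_keys == 0, f"{null_keys} rows have a null sale_id"
    print(f"Validation passed: {row_count} rows exported")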

