Databricks Data Engineer
ATEM Corp
Deerfield, United States of America
6 days ago
Role details
Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Job location
Deerfield, United States of America
Tech stack
Apache Iceberg
Azure
Continuous Integration
Data Validation
Data Cleansing
Data Engineering
GitHub
JUnit
Python
Log Analysis
Cloud Services
Standard SQL
SonarQube
Data Logging
Postman
Spark
Git
PySpark
Integration Tests
Kafka
Cosmos DB
Spark Streaming
Data Pipelines
Docker
Key Vault
Databricks
Job description
- Responsible for building data products in Databricks using Scala/Spark
- Responsible for Ops work: managing the data products developed and deployed to production
- Responsible for testing data products against product specifications and for end-to-end validation
- Set up monitoring, logging, and alerting for Spark jobs and data pipelines using Azure Monitor/Log Analytics or similar tools
- Coordinate with offshore team in India
Requirements
Must have: Azure Databricks; Scala is a MUST
- Mandatory skills: Databricks, Spark, Azure, Python, Cosmos DB, Azure DevOps (GitHub, CI/CD pipelines, Boards, etc.), Docker & Azure Kubernetes Service, Grafana, JUnit, Postman, SonarQube
- Additional skills: 3-6 years of experience in data engineering on cloud data platforms. Hands-on experience building Spark jobs with Scala/Spark and/or PySpark on Databricks. Experience ingesting data from batch and streaming sources into ADLS Gen2 using Delta or Apache Iceberg tables. Good SQL skills for joins, aggregations, and data quality checks. Understanding of core Azure data services (Event Hubs/Kafka, Data Factory/Databricks Workflows, Key Vault). Experience working with Git-based workflows and CI/CD in Azure DevOps or GitHub.
- Good-to-have skills: Exposure to Spark Structured Streaming for near-real-time use cases. Experience with data quality tools or frameworks and writing unit/integration tests for data pipelines. Familiarity with data modeling and performance considerations in Lakehouse environments.