Databricks Data Engineer
ATEM Corp
Deerfield, United States of America
6 days ago
Role details
Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Job location
Deerfield, United States of America
Tech stack
Apache Iceberg
Azure
Continuous Integration
Data Validation
Data Cleansing
Data Engineering
GitHub
JUnit
Python
Log Analysis
Cloud Services
Standard SQL
SonarQube
Data Logging
Postman
Spark
Git
PySpark
Integration Tests
Kafka
Cosmos DB
Spark Streaming
Data Pipelines
Docker
Key Vault
Databricks
Job description
- Responsible for building data products in Databricks using Scala/Spark
- Responsible for Ops work: managing the data products developed and deployed to production
- Responsible for testing data products against product specifications and for end-to-end validation
- Set up monitoring, logging, and alerting for Spark jobs and data pipelines using Azure Monitor/Log Analytics or similar tools
- Coordinate with offshore team in India
Requirements
Must have: Azure Databricks; Scala is a MUST
- Mandatory skills: Databricks, Spark, Azure, Python, Cosmos DB, Azure DevOps (GitHub, CI/CD pipelines, Boards, etc.), Docker & Azure Kubernetes Service, Grafana, JUnit, Postman, SonarQube
- Additional skills: 3-6 years of experience in data engineering on cloud data platforms. Hands-on experience building Spark jobs with Scala/Spark and/or PySpark on Databricks. Experience ingesting data from batch and streaming sources into ADLS Gen2 using Delta or Apache Iceberg tables. Good SQL skills for joins, aggregations, and data quality checks. Understanding of core Azure data services (Event Hubs/Kafka, Data Factory/Databricks Workflows, Key Vault). Experience working with Git-based workflows and CI/CD in Azure DevOps or GitHub.
- Good-to-have skills: Exposure to Spark Structured Streaming for near-real-time use cases. Experience with data quality tools or frameworks and writing unit/integration tests for data pipelines. Familiarity with data modeling and performance considerations in Lakehouse environments.