Databricks Data Engineer

Centraprise Corp

Everett, United States of America

3 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Job location

Everett, United States of America

Tech stack

Agile Methodologies

Data analysis

Azure

Bash

Big Data

Program Optimization

Computer Programming

Continuous Integration

Data Governance

Data Integration

ETL

Data Security

Data Systems

IBM InfoSphere DataStage

Job Scheduling

Python

Log Analysis

Metadata

Performance Tuning

Scrum

Release Management

SQL Databases

Web Applications

Scripting (Bash/Python/Go/Ruby)

Autoscaling

Software Troubleshooting

Gitlab

PySpark

Google Cloud Functions

Data Management

Terraform

Software Version Control

Data Pipelines

Key Vault

Databricks

Job description

Design, develop, and maintain end-to-end data pipelines and ETL/ELT workflows using PySpark and Python.
Lead the review of legacy IBM DataStage code against the migrated Databricks code to ensure functionality has not deviated.
Implement, optimize, and monitor large-scale data processing workloads in Azure Databricks, including cluster configuration, autoscaling, and governance.
Build and maintain data integration and orchestration solutions using Azure services to meet performance, availability, and security requirements.
Collaborate with data consumers, thread authors/owners, and stakeholders to gather business requirements, prioritize needs, and translate analytical objectives into technical designs.
Implement secure data access patterns using Azure Active Directory, Managed Identities, and service principals.
Author Infrastructure-as-Code for Azure resources (ARM templates) and deploy consistent, repeatable environments.
Configure and operate Azure components including Storage Account, Synapse, Key Vault, VMSS, Function Apps, Web Apps, Log Analytics Workspace, Azure Container Apps / container instances, and related services.
Collaborate with networking and security teams to design and implement Azure networking for data solutions.
Implement monitoring, alerting, and cost optimization for data workloads (Log Analytics, metrics, and dashboards).
Use GitLab and Azure DevOps for source control, CI/CD pipelines, and release management.
Follow Agile/Scrum practices and participate in sprint planning, standups, and retrospectives.
Ensure solutions meet data governance, lineage, and compliance requirements.
Provide operations and on-call support for production issues and deployments.

Requirements

Familiarity with the IBM DataStage ETL/ELT data integration tool, sufficient to understand existing code.
Ability to develop, test, deploy, optimize, and monitor large-scale ETL data processing workloads in Azure Databricks.
Ability to lead the review of legacy DataStage code against the migrated Databricks code to ensure functionality has not deviated.
Strong programming skills in Python and PySpark.
Advanced proficiency writing SQL for analytics and ETL processes.
Proven experience building and optimizing complex data pipelines in Azure.
Hands-on experience with Azure Databricks: cluster management, job scheduling, workspace governance.
Strong working knowledge of core Azure services: Storage Account, Synapse, Key Vault, VMSS, Function Apps, Web Apps, Log Analytics Workspace, service principals, and managed identities.
Experience with container services (Azure Container Apps, container instances) and containerized data workloads.
Familiarity with Azure networking concepts and secure network integration for data platforms.
Experience creating Azure infrastructure using ARM templates.
Proficient with GitLab and Azure DevOps for CI/CD and source control workflows.
Strong analytical, problem-solving, and communication skills; proven ability to work cross-functionally.
Experience working in Agile teams and understanding of data governance frameworks.
Hands-on experience provisioning Databricks resources with Terraform; ability to author and maintain Terraform templates and modules.
Demonstrated experience implementing cluster autoscaling and autoscaling policies through Terraform.
Experience creating reusable Terraform modules and implementing infrastructure-as-code best practices (module structure, state management, remote backends).
Proven experience working on Databricks platform operations, including cluster configuration, job orchestration, and platform optimization.
Experience configuring high-availability Databricks deployments and operating across multiple availability zones/regions.
Familiarity with Metastore/Unity Catalog configuration and metadata governance in Databricks.
Hands-on experience building data pipelines and ingestion workflows into medallion-layer architectures (bronze/silver/gold).
Strong scripting skills (Python, Bash, or similar) and familiarity with CI/CD for Terraform and Databricks deployments.
Strong troubleshooting, performance tuning, and cost optimization skills.
