Databricks Data Engineer

Centraprise Corp
Everett, United States of America
3 days ago

Role details

Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English

Job location

Everett, United States of America

Tech stack

Agile Methodologies
Data analysis
Azure
Bash
Big Data
Program Optimization
Computer Programming
Continuous Integration
Data Governance
Data Integration
ETL
Data Security
Data Systems
IBM InfoSphere DataStage
Job Scheduling
Python
Log Analysis
Metadata
Performance Tuning
Scrum
Release Management
SQL Databases
Web Applications
Scripting (Bash/Python/Go/Ruby)
Autoscaling
Software Troubleshooting
Gitlab
PySpark
Google Cloud Functions
Data Management
Terraform
Software Version Control
Data Pipelines
Azure Key Vault
Databricks

Job description

  • Design, develop, and maintain end-to-end data pipelines and ETL/ELT workflows using PySpark and Python.

  • Lead the review of legacy DataStage code and the migrated Databricks code to ensure that functionality has not deviated.

  • Implement, optimize, and monitor large-scale data processing workloads in Azure Databricks, including cluster configuration, autoscaling, and governance.

  • Build and maintain data integration and orchestration solutions using Azure services to meet performance, availability, and security requirements.

  • Collaborate with data consumers, thread authors/owners, and stakeholders to gather business requirements, prioritize needs, and translate analytical objectives into technical designs.

  • Implement secure data access patterns using Azure Active Directory, Managed Identities, and service principals.

  • Author Infrastructure-as-Code for Azure resources (ARM templates) and deploy consistent, repeatable environments.

  • Configure and operate Azure components including Storage Account, Synapse, Key Vault, VMSS, Function Apps, Web Apps, Log Analytics Workspace, Azure Container Apps / container instances, and related services.

  • Collaborate with networking and security teams to design and implement Azure networking for data solutions.

  • Implement monitoring, alerting, and cost optimization for data workloads (Log Analytics, metrics, and dashboards).

  • Use GitLab and Azure DevOps for source control, CI/CD pipelines, and release management.

  • Follow Agile/Scrum practices and participate in sprint planning, standups, and retrospectives.

  • Ensure solutions meet data governance, lineage, and compliance requirements.

  • Provide operations and on-call support for production issues and deployments.

Requirements

  • Familiarity with the IBM DataStage ETL/ELT data integration tool, sufficient to understand existing code.

  • Experience developing, testing, deploying, optimizing, and monitoring large-scale ETL data processing workloads in Azure Databricks.

  • Ability to lead the review of legacy DataStage code and migrated Databricks code to ensure that functionality has not deviated.

  • Strong programming skills in Python and PySpark.

  • Advanced proficiency writing SQL for analytics and ETL processes.

  • Proven experience building and optimizing complex data pipelines in Azure.

  • Hands-on experience with Azure Databricks: cluster management, job scheduling, workspace governance.

  • Strong working knowledge of core Azure services: Storage Account, Synapse, Key Vault, VMSS, Function Apps, Web Apps, Log Analytics Workspace, service principals, and managed identities.

  • Experience with container services (Azure Container Apps, Azure Container Instances) and containerized data workloads.

  • Familiarity with Azure networking concepts and secure network integration for data platforms.

  • Experience creating Azure infrastructure using ARM templates.

  • Proficient with GitLab and Azure DevOps for CI/CD and source control workflows.

  • Strong analytical, problem-solving, and communication skills; proven ability to work cross-functionally.

  • Experience working in Agile teams and understanding of data governance frameworks.

  • Hands-on experience provisioning Databricks resources with Terraform; ability to author and maintain Terraform templates and modules.

  • Demonstrated experience implementing cluster autoscaling and autoscaling policies through Terraform.

  • Experience creating reusable Terraform modules and implementing infrastructure-as-code best practices (module structure, state management, remote backends).

  • Proven experience working on Databricks platform operations, including cluster configuration, job orchestration, and platform optimization.

  • Experience configuring high-availability Databricks deployments and operating across multiple availability zones/regions.

  • Familiarity with Metastore/Unity Catalog configuration and metadata governance in Databricks.

  • Hands-on experience building data pipelines and ingestion workflows into medallion-layer architectures (bronze/silver/gold).

  • Strong scripting skills (Python, Bash, or similar) and familiarity with CI/CD for Terraform and Databricks deployments.

  • Strong troubleshooting, performance tuning, and cost optimization skills.
