Data Engineer with Cloud Data Integration & Transformation
Job description
We are seeking a hands-on Data Engineer to develop and maintain scalable data pipelines and transformation routines within a modern Azure + Databricks environment. The role focuses on ingesting, cleansing, standardizing, matching, merging, and enriching complex legacy datasets into a governed data lakehouse architecture.
Pipeline Development & Maintenance
Build and maintain reusable data pipelines using Databricks, PySpark, and SQL.
Implement full and incremental loads from sources including VSAM, Db2 (LUW and z/OS), SQL Server, and flat files.
Use Delta Lake on ADLS Gen2 to support ACID transactions, scalable upserts/merges, and time travel (see the merge sketch after this list).
Leverage Azure Data Factory for orchestration and triggering of Delta Live Tables and Databricks Jobs as part of nightly pipeline execution.
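A minimal sketch of the incremental upsert pattern referenced above, using Delta Lake's MERGE for ACID upserts. The table name, key column, and storage path are illustrative assumptions, not part of the actual pipeline:

```python
# Incremental upsert into a Delta table on ADLS Gen2 (illustrative names).
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Incremental batch staged by the ingestion step (e.g. an ADF-triggered copy).
updates = spark.read.format("delta").load(
    "abfss://bronze@examplelake.dfs.core.windows.net/customer_staging"  # assumed path
)

target = DeltaTable.forName(spark, "silver.customers")  # assumed table name

# MERGE gives ACID upsert semantics: update matched keys, insert new ones.
(
    target.alias("t")
    .merge(updates.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```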
Data Cleansing & Transformation
Apply cleansing logic for deduplication, parsing, standardization, and enrichment based on business rule definitions.
Use the spark-cobol (Cobrix) library to parse EBCDIC/COBOL-formatted VSAM files into structured DataFrames (see the parsing sketch after this list).
Maintain bronze → silver → gold structured layers and ensure quality during data transformations.
Support classification and mapping logic in collaboration with analysts and architects.
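A sketch of the VSAM parsing step mentioned above, using the spark-cobol (Cobrix) data source. The copybook path, data path, and record format are illustrative assumptions for a fixed-length extract:

```python
# Parse an EBCDIC VSAM extract into a DataFrame using the copybook layout.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

customers = (
    spark.read.format("cobol")
    .option("copybook", "/mnt/copybooks/CUSTOMER.cpy")  # assumed copybook location
    .option("record_format", "F")                       # assuming fixed-length records
    .option("encoding", "ebcdic")
    .load("abfss://landing@examplelake.dfs.core.windows.net/vsam/customer")
)
customers.printSchema()  # fields are derived from the copybook PIC clauses
```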
Observability, Testing & Validation
Integrate robust logging and exception handling to enable observability and pipeline traceability.
Monitor job performance and cost with Azure Monitor and Log Analytics.
Support validation and testing using frameworks like Great Expectations or dbt tests to enforce expectations on nulls, ranges, and referential integrity.
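A hand-rolled sketch of the kinds of checks that Great Expectations or dbt tests would formalize: nulls, ranges, and referential integrity. All table and column names here are illustrative assumptions:

```python
# Basic data-quality gates expressed directly in PySpark (illustrative names).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
orders = spark.table("silver.orders")
customers = spark.table("silver.customers")

# Null check: business keys must always be populated.
null_keys = orders.filter(F.col("order_id").isNull()).count()

# Range check: amounts must be non-negative.
bad_amounts = orders.filter(F.col("amount") < 0).count()

# Referential integrity: every order must reference a known customer.
orphans = orders.join(customers, "customer_id", "left_anti").count()

assert null_keys == 0, f"{null_keys} orders with null order_id"
assert bad_amounts == 0, f"{bad_amounts} orders with negative amount"
assert orphans == 0, f"{orphans} orders with unknown customer_id"
```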
Security, DevOps & Deployment
Store and manage credentials securely using Azure Key Vault during pipeline execution.
Maintain pipeline code using Azure DevOps Repos and participate in peer reviews and promotion workflows via Azure DevOps Pipelines.
Deploy notebooks, configurations, and transformations using CI/CD best practices in repeatable environments.
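A short sketch of runtime credential retrieval from a Key Vault-backed Databricks secret scope; the scope and key names are hypothetical:

```python
# Fetch a credential at runtime; dbutils is available in Databricks notebooks.
# The scope "kv-data-platform" is assumed to be backed by Azure Key Vault.
jdbc_password = dbutils.secrets.get(scope="kv-data-platform", key="db2-password")

# The secret never appears in code or notebook output; Databricks redacts it.
```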
Collaboration & Profiling
Collaborate with architects to ensure alignment with data platform standards and governance models.
Work with analysts and SMEs to profile data, refine cleansing logic, and conduct variance analysis using Databricks Notebooks and Databricks SQL Warehouse.
Support metric publication and lineage registration using Microsoft Purview and Unity Catalog and contribute to profiling datasets for Power BI consumption.
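A quick profiling sketch of the kind used in cleansing sessions with analysts: per-column null counts and distinct counts. The table name is an illustrative assumption:

```python
# Lightweight column profiling in a Databricks notebook (illustrative table).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.table("bronze.customer_raw")

# Null counts per column: count rows where each column is null.
df.select(
    [F.count(F.when(F.col(c).isNull(), c)).alias(f"{c}_nulls") for c in df.columns]
).show(truncate=False)

# Distinct counts help spot candidate match/merge keys.
df.select([F.countDistinct(c).alias(f"{c}_distinct") for c in df.columns]).show()
```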
Requirements
Note: 14+ years of experience required, with good communication skills. The candidate must have experience migrating and integrating legacy systems (COBOL, Db2) to Azure Databricks.
The ideal candidate brings deep experience with Spark (PySpark), Delta Lake, Azure Data Factory, and data wrangling techniques, and is comfortable working in a structured, code-managed, team-based delivery environment.
5+ years of experience in data engineering or ETL development roles.
Proficiency in:
- Databricks, PySpark, SQL
- Delta Lake and Azure Data Lake Storage Gen2
- Azure Data Factory for orchestration and event-driven workflows
Experience with:
- Cleansing, deduplication, parsing, and merging of high-volume datasets
- Parsing EBCDIC/COBOL-formatted VSAM files using Spark-Cobol Library
- Connecting to Db2 databases using JDBC drivers for ingestion
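A sketch of the Db2-over-JDBC ingestion read mentioned above. Host, port, database, schema, and secret names are illustrative; the driver class is IBM's com.ibm.db2.jcc.DB2Driver, which must be installed on the cluster:

```python
# Read a Db2 table over JDBC (illustrative connection details).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

db2_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:db2://db2host.example.com:50000/SAMPLEDB")
    .option("driver", "com.ibm.db2.jcc.DB2Driver")
    .option("dbtable", "SCHEMA1.CUSTOMER")
    .option("user", "etl_user")  # assumed service account
    # dbutils is available in Databricks notebooks; scope/key are hypothetical.
    .option("password", dbutils.secrets.get("kv-data-platform", "db2-password"))
    .load()
)
```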
Familiarity with:
- Git, Azure DevOps Repos & Pipelines
- Great Expectations or dbt for validation
- Azure Monitor + Log Analytics for job tracking and alerting
- Azure Key Vault for secrets and credentials
- Microsoft Purview and Unity Catalog for metadata and lineage registration