Data migration (Spark)

Inherent Technologies
San Francisco, United States of America
1 month ago

Role details

Contract type
Temporary to permanent
Employment type
Part-time / full-time
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Remote
San Francisco, United States of America

Tech stack

Amazon Web Services (AWS)
Azure
Cloud Computing
Cloud Storage
Data Migration
Hadoop
Hadoop Distributed File System
MapReduce
HBase
Hive
Java Virtual Machine (JVM)
Python
Apache Oozie
Performance Tuning
Regression Testing
SQL Databases
Sqoop
Scripting (Bash/Python/Go/Ruby)
Cloud Platform System
Performance Testing
Data Ingestion
Spark
PySpark
Kubernetes
Kafka
Apache Nifi
Data Management
Code Restructuring
Data Pipelines
Legacy Systems
Databricks

Job description

A Spark job migration specialist migrates data pipelines, JAR tasks, and analytics workloads from legacy systems (such as Hadoop/CDH or AWS EMR) to modern ACOS platforms. This involves refactoring code (e.g., Hive to PySpark), performance testing, and upgrading workloads from Spark 2.x to 3.x. Typical responsibilities, with illustrative sketches after this list:

Workload Migration: Migrate JVM workloads and spark-submit tasks to Databricks JAR tasks or Notebook tasks.
Pipeline Re-engineering: Convert existing HiveQL scripts and Oozie workflows into optimized Spark SQL or PySpark applications.
Refactoring: Adapt data pipelines from Azure Synapse to any cloud platform, including updating library dependencies and notebook references.
Performance Optimization: Enable Adaptive Query Execution (AQE) in Spark 3 to improve shuffle performance and mitigate skewed joins.
Testing & Validation: Perform regression testing with validation scripts to ensure output consistency between the old and new systems.
Job Customization: Use spark.sparkContext.setJobDescription() to label, monitor, and troubleshoot specific Spark jobs in the Spark UI.
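Below is a minimal sketch of the Hive-to-PySpark conversion work, assuming a hypothetical Hive table sales.orders with region, amount, and year columns:

from pyspark.sql import SparkSession, functions as F

# HiveQL being replaced (hypothetical):
#   SELECT region, SUM(amount) AS total
#   FROM sales.orders WHERE year = 2024 GROUP BY region;
spark = SparkSession.builder.appName("hive-to-pyspark").enableHiveSupport().getOrCreate()

totals = (
    spark.table("sales.orders")
    .where(F.col("year") == 2024)
    .groupBy("region")
    .agg(F.sum("amount").alias("total"))
)
totals.write.mode("overwrite").saveAsTable("sales.region_totals")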
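For the AQE point, a sketch of the standard Spark 3 settings, reusing the session above; the joined DataFrames (orders_df, customers_df) and their key are hypothetical:

# Let Spark re-optimize the physical plan at runtime from shuffle statistics.
spark.conf.set("spark.sql.adaptive.enabled", "true")
# Split oversized (skewed) shuffle partitions into smaller tasks during joins.
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
# Merge many small shuffle partitions to reduce scheduling overhead.
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")

# A join on a skewed key no longer needs a manual salting workaround;
# AQE splits the hot partitions automatically.
result = orders_df.join(customers_df, "customer_id")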
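For regression testing, a sketch of a row-level parity check between the legacy output and the migrated output; the table names are hypothetical:

old_df = spark.table("legacy.region_totals")  # written by the old Hive job
new_df = spark.table("sales.region_totals")   # written by the migrated job

# Symmetric difference: both sides must be empty for the outputs to match.
# subtract() is set difference; duplicate-sensitive checks need exceptAll().
assert old_df.subtract(new_df).count() == 0, "rows lost in migration"
assert new_df.subtract(old_df).count() == 0, "rows invented by migration"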
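And for job customization, a sketch of labelling stages so they are easy to find in the Spark UI, continuing from the sketches above; the labels and output path are hypothetical:

# Every action run after setJobDescription() carries this label in the UI,
# which makes individual migration stages easy to monitor and troubleshoot.
spark.sparkContext.setJobDescription("migration: backfill region totals")
totals.write.mode("overwrite").parquet("/migration/output/region_totals")

spark.sparkContext.setJobDescription("migration: row-count sanity check")
print(totals.count())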

Requirements

Mandatory Skills: Spark job migration and Kubernetes.
Experience: 5+ years of experience with Apache Spark (PySpark/Scala) and cloud platforms (Azure/AWS).
Requirements:
Strong experience with HDFS and the Hadoop ecosystem (Hive, Spark, HBase, MapReduce).
Experience in data migration to cloud/enterprise data platforms.
Knowledge of data ingestion tools (Sqoop, Kafka, NiFi, etc.), cloud storage (ADLS, S3, Blob Storage), and distributed processing frameworks.
SQL and performance tuning expertise.
Experience in scripting (Python, Shell, Scala).

Key Migration Focus Areas
Data Pipelines: Ensuring schema evolution, data correctness, and testing with golden datasets (compare the regression check sketched above).
Job Definitions: Reconfiguring job properties, cluster settings, and Spark configurations, as in the session sketch below.
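As one sketch of the job-definitions work, legacy job properties might be carried into a migrated Spark 3 session like this; the application name and values are illustrative assumptions, not tuned recommendations:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("migrated-orders-pipeline")  # hypothetical job name
    # Carried over from the legacy job definition:
    .config("spark.sql.shuffle.partitions", "400")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    # New on the Spark 3 target platform:
    .config("spark.sql.adaptive.enabled", "true")
    .getOrCreate()
)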

