Data/Scala/Spark Engineering Specialist

Anagha Techno Soft
New York, United States of America
2 days ago

Role details

Contract type
Temporary contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

New York, United States of America

Tech stack

API
Airflow
Unit Testing
CA Workload Automation AE
Azure
Continuous Integration
Information Engineering
ETL
GitHub
Python
Networking Basics
Performance Tuning
Cloud Services
SQL Databases
Teradata
Snowflake
Spark
Star Schema
Serverless Computing
Databricks
Artifactory

Job description

We're migrating complex on-prem regulatory reporting pipelines from a legacy ETL + Autosys + SQL + Teradata stack to a modern Databricks + Snowflake platform on Azure. The role is hands-on: design, implement, test, and reconcile production pipelines feeding regulatory reports under strict parity requirements.

Requirements

Scala / Spark: production experience writing Spark applications in Scala (not just notebooks); comfortable with the DataFrame API, joins, window functions, partitioning, and performance tuning
Databricks: Serverless compute, Unity Catalog, Asset Bundles, Databricks CLI
SQL fluency: comfortable writing, analyzing, and extracting requirements from complex SQL scripts
Snowflake: schema design, performance, Spark-Snowflake connector
Azure: ADLS, networking basics, secrets/identity (Entra ID / managed identities)
Orchestration: Airflow (DAG authoring, sensors, retries, SLAs)
CI/CD: Artifactory, GitHub Actions pipelines (build, sharded test matrices, artifact promotion through dev → QA → UAT → prod)
Testing: experience in TDD, writing unit tests (ScalaTest, AnyFlatSpec), and BDD (Concordion or equivalent)
Data quality & reconciliation: building automated parity checks against legacy outputs, drift detection, row-level reconciliation tooling
Large-scale migrations: proven track record migrating legacy ETL (Autosys/Informatica/etc.) to cloud data platforms, including dependency mapping and cutover planning
Modern data engineering practices: medallion architecture (Bronze/Silver/Gold), idempotent pipelines, schema evolution, lineage, observability

Nice-to-have

Financial services / regulatory reporting domain
Python (Databricks utilities, tooling)
Spec-driven development workflows (specs → plans → tasks → implementation)
