Data/Scala/Spark Engineering Specialist
Job description
We're migrating complex on-prem regulatory reporting pipelines from a legacy ETL + Autosys + SQL + Teradata stack to a modern Databricks + Snowflake platform on Azure. The role is hands-on: design, implement, test, and reconcile production pipelines feeding regulatory reports under strict parity requirements.
Requirements
Scala / Spark: production experience writing Spark applications in Scala (not just notebooks); comfortable with the DataFrame API, joins, window functions, partitioning, and performance tuning
Databricks: Serverless compute, Unity Catalog, Asset Bundles, Databricks CLI
SQL fluency: comfortable writing and analyzing complex SQL scripts and extracting requirements from them
Snowflake: schema design, performance, Spark-Snowflake connector
Azure: ADLS, networking basics, secrets/identity (Entra ID / managed identities)
Orchestration: Airflow (DAG authoring, sensors, retries, SLAs)
CI/CD: Artifactory, GitHub Actions pipelines (build, sharded test matrices, artifact promotion through dev, QA, UAT, and prod)
Testing: experience in TDD, writing unit tests (ScalaTest, AnyFlatSpec) and BDD (Concordion or equivalent)
Data quality & reconciliation: building automated parity checks against legacy outputs, drift detection, row-level reconciliation tooling
Large-scale migrations: proven track record migrating legacy ETL (Autosys/Informatica/etc.) to cloud data platforms, including dependency mapping and cutover planning
Modern data engineering practices: medallion architecture (Bronze/Silver/Gold), idempotent pipelines, schema evolution, lineage, observability
Nice-to-have
Financial services / regulatory reporting domain
Python (Databricks utilities, tooling)
Spec-driven development workflows (specs, plans, tasks, implementation)