Data Engineer

Incedo Inc
San Francisco, United States of America
5 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

San Francisco, United States of America

Tech stack

Unity Catalog
API
Amazon Web Services (AWS)
Confluence
JIRA
Audit Trail
Azure
Clinical Data Repository
Software Quality
Information Systems
Continuous Integration
Information Engineering
Data Governance
Data Infrastructure
Dimensional Modeling
GitHub
Python
Operational Databases
DataOps
Search Technologies
SQL Databases
Cloud Platform System
Spark
Build Management
Data Lake
PySpark
Information Technology
Data Lineage
Machine Learning Operations
Veeva
Data Pipelines
Databricks

Job description

We are seeking an experienced Senior Data Engineer with deep expertise in Databricks, PySpark, and cloud data platform delivery. The role involves owning the design and build of production data pipelines on Databricks, leading technical decisions on lakehouse architecture, and working directly with client stakeholders to translate business requirements into working data products.

This is a hands-on, client-facing position based in the US (SSF). You will serve as the technical lead for a life sciences data platform build, guiding architecture, mentoring offshore engineers, and owning delivery quality across the engagement.

Role and Responsibilities

  • Design and build production data pipelines on Databricks using PySpark, Python, and SQL across Bronze, Silver, and Gold medallion layers.
  • Define lakehouse architecture patterns including Delta Lake table design, partitioning strategy, and compute optimization using Spark/Photon.
  • Configure and manage Unity Catalog for data governance, access control, lineage tracking, and audit logging.
  • Build and maintain ingestion frameworks using Databricks Auto Loader, Lakeflow Connect, and batch/API connectors.
  • Implement data quality checks, validation rules, and monitoring as part of every pipeline deployment.
  • Set up CI/CD pipelines for Databricks notebooks and jobs using GitHub, Databricks Repos, and Lakeflow Jobs.
  • Own technical communication with client stakeholders: architecture walkthroughs, design reviews, sprint demos.
  • Mentor offshore data engineers on Databricks best practices, code quality, and pipeline design standards.

Requirements

  • 7+ years of hands-on data engineering experience.
  • 2+ years working on Databricks (notebooks, workflows, Delta Lake, Unity Catalog).
  • 2+ years working with PySpark or Python in a pipeline development context.
  • Experience designing data models for analytical and BI workloads (dimensional models, SCD patterns, medallion architecture).
  • Solid understanding of CI/CD practices for data pipeline deployments (GitHub, Databricks Repos).
  • Experience operating in client-facing, consulting, or services delivery environments.
  • Strong communication skills, including comfort presenting architecture decisions to both technical and business stakeholders.
  • Hands-on experience with Jira and Confluence.
  • Strong understanding of Agile / Scrum methodologies.

Good to Have

  • Life sciences, pharma, or biotech domain experience.
  • Experience with Veeva systems, clinical data, or regulatory data environments.
  • Exposure to data observability tools (Monte Carlo, Databricks Lakehouse Monitoring).
  • Experience with Vector Search, ML Runtime, or MLflow on Databricks.
  • AWS or Azure cloud platform experience alongside Databricks.

A bachelor's degree in Computer Science, Information Systems, Engineering, or a related field. A master's degree may be preferred but is not required.
