Data Engineer

Incedo Inc

San Francisco, United States of America

5 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

San Francisco, United States of America

Tech stack

Unity Catalog

API

Amazon Web Services (AWS)

Confluence

JIRA

Audit Trail

Azure

Clinical Data Repository

Software Quality

Information Systems

Continuous Integration

Information Engineering

Data Governance

Data Infrastructure

Dimensional Modeling

Github

Python

Operational Databases

DataOps

Search Technologies

SQL Databases

Cloud Platform System

Spark

Build Management

Data Lake

PySpark

Information Technology

Data Lineage

Machine Learning Operations

Veeva

Data Pipelines

Databricks

Job description

We are seeking an experienced Senior Data Engineer with deep expertise in Databricks, PySpark, and cloud data platform delivery. The role involves owning the design and build of production data pipelines on Databricks, leading technical decisions on lakehouse architecture, and working directly with client stakeholders to translate business requirements into working data products.

This is a hands-on, client-facing position based in the US (SSF). You will serve as the technical lead for a life sciences data platform build, guiding architecture, mentoring offshore engineers, and owning delivery quality across the engagement.

Role and Responsibilities

Design and build production data pipelines on Databricks using PySpark, Python, and SQL across Bronze, Silver, and Gold medallion layers.
Define lakehouse architecture patterns including Delta Lake table design, partitioning strategy, and compute optimization using Spark/Photon.
Configure and manage Unity Catalog for data governance, access control, lineage tracking, and audit logging.
Build and maintain ingestion frameworks using Databricks Auto Loader, Lakeflow Connect, and batch/API connectors.
Implement data quality checks, validation rules, and monitoring as part of every pipeline deployment.
Set up CI/CD pipelines for Databricks notebooks and jobs using GitHub, Databricks Repos, and Lakeflow Jobs.
Own technical communication with client stakeholders: architecture walkthroughs, design reviews, sprint demos.
Mentor offshore data engineers on Databricks best practices, code quality, and pipeline design standards.

Requirements

7+ years of hands-on data engineering experience.
2+ years working on Databricks (notebooks, workflows, Delta Lake, Unity Catalog).
2+ years working with PySpark or Python in a pipeline development context.
Experience designing data models for analytical and BI workloads (dimensional models, SCD patterns, medallion architecture).
Solid understanding of CI/CD practices for data pipeline deployments (GitHub, Databricks Repos).
Experience operating in client-facing, consulting, or services delivery environments.
Strong communication skills, including comfort presenting architecture decisions to both technical and business stakeholders.
Hands-on experience with Jira and Confluence.
Strong understanding of Agile / Scrum methodologies.

Good to Have

Life sciences, pharma, or biotech domain experience.
Experience with Veeva systems, clinical data, or regulatory data environments.
Exposure to data observability tools (Monte Carlo, Databricks Lakehouse Monitoring).
Experience with Vector Search, ML Runtime, or MLflow on Databricks.
AWS or Azure cloud platform experience alongside Databricks.
A bachelor's degree in Computer Science, Information Systems, Engineering, or a related field. A master's degree may be preferred but is not required.
