Databricks Data Engineer

W. R. Berkley Corporation

Manassas, United States of America

3 months ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Intermediate

Job location

Manassas, United States of America

Tech stack

Artificial Intelligence

Azure

Continuous Integration

Information Engineering

Data Governance

ETL

Data Security

DevOps

GitHub

Python

Machine Learning

Metadata Management

Raw Data

Power BI

TensorFlow

Software Engineering

SQL Stored Procedures

SQL Databases

SQL Server Reporting Services

SQL Server Integration Services

Management of Software Versions

Model-Driven Development

PyTorch

Spark

Data Layers

Pandas

Data Lake

PySpark

Information Technology

Machine Learning Operations

Data Pipelines

Databricks

Job description

This position requires on-site work Monday through Thursday at either our Manassas, VA or Chesterfield, MO location.

The Databricks Data Engineer will help design, build, deploy, and maintain scalable, production-grade data pipelines in modern cloud environments, enabling analytics, AI, ML, and decision advantage at scale. This role will work with cutting-edge tools such as Databricks, Delta Lake, PySpark, and AI/BI Genie to transform raw data into actionable insights. As a hands-on engineer with deep expertise in Azure Databricks and MLOps, the person in this role will have the opportunity to migrate and translate legacy SSIS ETL logic into scalable, cloud-native data pipelines in Databricks. This role will also partner with data engineers, data scientists, and product managers to design features, train and evaluate models, and deploy them to production using MLflow, Databricks, and Databricks Workflows, with rigorous observability, governance (Unity Catalog), and CI/CD automation.

Design, build, and maintain high-performance, scalable ETL/ELT pipelines using Azure Databricks, Delta Lake, and PySpark.
Convert and modernize existing SSIS package logic into cloud-native Databricks pipelines using PySpark notebooks, Delta Live Tables (DLT), and Databricks Workflows.
Implement reliable batch and streaming pipelines with robust data quality and validation frameworks.
Optimize pipeline performance using Photon, efficient file formats, partitioning, Z-ordering, and caching strategies.

Lakehouse Platform Development

Develop and manage datasets within Delta Lake, ensuring ACID reliability, schema evolution, versioning, and time travel.
Architect feature-rich data layers including:

Bronze (raw ingestion)
Silver (validated, conformed)
Gold (analytics-ready and ML-ready)

Implement data governance using Unity Catalog for fine-grained access control, lineage, auditability, and metadata management.

MLOps & ML-Enabled Data Pipelines

Partner with data scientists and data engineers to create feature pipelines, model training pipelines, and production scoring pipelines.
Deploy and operationalize models using MLflow, Databricks Model Registry, and Databricks Workflows.
Use Databricks built-in AI SQL functions such as ai_query, ai_forecast, and ai_analyze_sentiment to generate actionable insights from large volumes of structured or unstructured raw data.
Implement monitoring for:

Pipeline failures
Data/feature drift
Model performance degradation
Operational SLAs/SLIs/SLOs

Build automated CI/CD workflows using GitHub Actions or Azure DevOps for notebook deployment, pipeline testing, and environment promotion.

Data Platform, Data Security & Data Governance

Collaborate with data engineers to design reliable data products on Delta Lake; leverage Delta Live Tables (DLT) for declarative pipelines when applicable.
Enforce Unity Catalog for lineage, permissions, and audit; manage secrets, tokens, and keys securely (e.g., Databricks secrets, Key Vault/Secrets Manager).

Collaboration & Leadership

Work closely with cross-functional teams: data engineering, data science, product management, and business stakeholders.
Serve as a Databricks subject matter expert (SME), championing best practices, coding standards, governance, and reusable frameworks.
Document architecture, workflows, data models, runbooks, and operational procedures.

Requirements

Minimum of 3 years of experience in Databricks, PySpark notebooks, Python, DevOps, software development, and data engineering.
Databricks Certified Data Engineer Associate or Professional certification is a plus.

Skills & Competencies

Proficient in designing, building, deploying, and maintaining high-performance, scalable ETL/ELT pipelines using Azure Databricks, Delta Lake, and PySpark notebooks.
Proficient in building, deploying, and operating production ML models (supervised, unsupervised, and anomaly detection), including techniques for handling imbalanced datasets.
Proficient with ML engineering and MLOps, including model versioning, CI/CD for ML, monitoring, drift detection, and automated retraining.
Proficient in Python, including pandas and PySpark DataFrames.
Expert-level SQL skills, including stored procedures; experience with SSIS, SSRS, and Power BI is a plus.
Proficient with cloud data engineering platforms and tools such as Azure, Databricks, Spark, and SQL, and with both batch and streaming pipelines.
Familiar with Databricks built-in AI functions such as ai_query, ai_gen, ai_classify, ai_forecast, and ai_analyze_sentiment, and able to use them to extract actionable insights from large volumes of structured or unstructured raw data.
Experience with Python and ML frameworks such as PyTorch or TensorFlow.
Experience improving data quality, lineage, and observability in enterprise data environments, and operationalizing rules and model-driven scoring for prioritization, routing, or case selection.
Experience with predictive analytics, machine learning, and artificial intelligence is desired.

Education

A bachelor's degree (4-year) in Computer Science, Management Information Systems, Engineering, Math, Physics, or a related quantitative field is required; a master's degree is preferred.
Experience in the commercial insurance industry is a plus.
Ability to travel locally and nationally up to 5% of the time.

Benefits & conditions

The company offers a competitive compensation plan and a robust benefits package for full-time regular employees. Compensation includes a base salary, and benefits include health, dental, vision, life, disability, wellness, paid time off, 401(k), and profit-sharing plans.

About the company

We started in early 2019 as a small group of technologists with a passion for making insurance better. Today we work with a team of industry experts who run five different insurance brands that collectively control $1 billion in annual premiums. We believe in a meritocracy of ideas and execution: a place where the best ideas win and the people who deliver the most value get the most opportunities. As we grow our team, we are looking for inquisitive, entrepreneurial people who are excited to reimagine the insurance industry.