Databricks Data Engineer

CYBERDASH CRYPTOMETRICS LLC
Aldie, United States of America
7 days ago

Role details

Contract type
Permanent contract
Employment type
Part-time / full-time
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$180K

Job location

Aldie, United States of America

Tech stack

Artificial Intelligence
Amazon Web Services (AWS)
Application Frameworks
Audit Trail
Azure
Cloud Database
Cloud Engineering
Continuous Integration
Information Engineering
Data Governance
Data Infrastructure
ETL
Data Security
DevOps
Digital Architecture
GitHub
Python
Machine Learning
Metadata Management
Raw Data
Power BI
Cloud Services
TensorFlow
Standard SQL
Runbook
Software Engineering
SQL Stored Procedures
SQL Databases
SQL Server Reporting Services
SQL Server Integration Services
Management of Software Versions
Model-Driven Development
PyTorch
Spark
Model Validation
Caching
Data Layers
Pandas
Data Lake
PySpark
Information Technology
Data Management
Machine Learning Operations
Software Version Control
Data Pipelines
Key Vault
Databricks

Job description

The Core AI & Data practice helps organizations modernize data platforms, strengthen enterprise data foundations, and scale analytics and artificial intelligence capabilities across the business. The team works with clients to architect, engineer, and deploy cloud-based data solutions that improve decision-making, enable innovation, and support large-scale transformation.

This is a hands-on Databricks Data Engineer role requiring deep expertise in Azure/AWS Databricks (with AKS/EKS as a backbone) and MLOps. The engineer will migrate and translate legacy SSIS ETL logic into scalable, cloud-native data pipelines in Databricks, and will partner with data engineers, data scientists, and product managers to design features, train and evaluate models, and deploy them to production using MLflow, Databricks, and Workflows, with rigorous observability, governance (Unity Catalog), and CI/CD automation.

  • Design, build, and maintain high-performance, scalable ETL/ELT pipelines using Azure Databricks, Delta Lake, and PySpark.
  • Convert and modernize existing SSIS package logic into cloud-native Databricks pipelines using PySpark notebooks, Delta Live Tables (DLT), and Databricks Workflows.
  • Implement reliable batch and streaming pipelines with robust data quality and validation frameworks.
  • Optimize pipeline performance using Photon, efficient file formats, partitioning, Z-ordering, and caching strategies.

Lakehouse Platform Development

  • Develop and manage datasets within Delta Lake, ensuring ACID reliability, schema evolution, versioning, and time travel.
  • Architect feature-rich data layers, including:
      • Bronze (raw ingestion)
      • Silver (validated, conformed)
      • Gold (analytics-ready and ML-ready)
  • Implement data governance using Unity Catalog for fine-grained access control, lineage, auditability, and metadata management.
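For readers less familiar with the Bronze/Silver/Gold layering named above, the following is a minimal sketch in plain Python rather than PySpark or Delta Lake. The record fields (`order_id`, `region`, `amount`) and the helper names `to_silver` and `to_gold` are hypothetical and chosen only for illustration; a real pipeline would apply the same steps to Delta tables.

```python
from collections import defaultdict

# Hypothetical raw records, as they might land untouched in a Bronze table.
BRONZE = [
    {"order_id": "1", "region": "east", "amount": "10.50"},
    {"order_id": "2", "region": "EAST", "amount": "4.50"},
    {"order_id": "3", "region": "west", "amount": "not-a-number"},
    {"order_id": "2", "region": "EAST", "amount": "4.50"},  # duplicate row
]


def to_silver(bronze_rows):
    """Validate and conform: cast types, normalize region, drop bad rows and duplicates."""
    seen = set()
    silver = []
    for row in bronze_rows:
        try:
            order_id = int(row["order_id"])
            amount = float(row["amount"])
        except ValueError:
            continue  # row fails validation, so it never reaches Silver
        if order_id in seen:
            continue  # keep only the first occurrence of each order
        seen.add(order_id)
        silver.append(
            {"order_id": order_id, "region": row["region"].lower(), "amount": amount}
        )
    return silver


def to_gold(silver_rows):
    """Aggregate validated rows into an analytics-ready total per region."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(BRONZE)
gold = to_gold(silver)
print(gold)
```

Each layer reads only from the one before it, so the raw Bronze data stays available for reprocessing if the validation rules change.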

MLOps & ML-Enabled Data Pipelines

  • Partner with data scientists and data engineers to create feature pipelines, model training pipelines, and production scoring pipelines.
  • Deploy and operationalize models using MLflow, Databricks Model Registry, and Databricks Workflows.
  • Use Databricks built-in AI SQL functions such as ai_query, ai_forecast, and ai_analyze_sentiment to generate actionable insights from large amounts of unstructured or structured raw data.
  • Implement monitoring for:
      • Pipeline failures
      • Data/feature drift
      • Model performance degradation
      • Operational SLAs/SLIs/SLOs
  • Build automated CI/CD workflows using GitHub Actions or Azure DevOps for notebook deployment, pipeline testing, and environment promotion.
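As one illustration of the data/feature drift monitoring listed above, the sketch below computes a Population Stability Index (PSI) between a baseline sample and a new sample of a single numeric feature. This is only one common drift measure and is not necessarily the one this team uses. The function name `psi`, the bin count, and the 0.2 alert threshold (a widely quoted rule of thumb) are assumptions made for the example.

```python
import math


def psi(expected, actual, bins=4):
    """Population Stability Index of `actual` against the `expected` baseline.

    Bins are equal-width over the baseline's range; values outside that
    range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant features

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # Floor each share at a small value so log() is always defined.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = list(range(100))            # hypothetical training-time feature values
shifted = [v + 50 for v in baseline]   # hypothetical production values after a shift

no_drift = psi(baseline, baseline)
drift = psi(baseline, shifted)
print(no_drift, drift > 0.2)  # 0.2 is an assumed alert threshold
```

In a scheduled job, a check like this would run per feature against a stored baseline and raise an alert or trigger retraining when the threshold is exceeded.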

Data Platform, Data Security & Data Governance

  • Collaborate with data engineers to design reliable data products on Delta Lake; leverage Delta Live Tables (DLT) for declarative pipelines when applicable.
  • Enforce Unity Catalog for lineage, permissions, and audit; manage secrets, tokens, and keys securely (e.g., Databricks secrets, Key Vault/Secrets Manager).

Collaboration & Leadership

  • Work closely with cross-functional teams: data engineers, data scientists, product managers, and business stakeholders.
  • Serve as a Databricks SME, championing best practices, code standards, governance, and reusable frameworks.
  • Document architecture, workflows, data models, runbooks, and operational procedures.

Requirements

  • Minimum of 5 years of experience in Databricks, PySpark notebooks, Python, DevOps, software development, and data engineering.
  • Certified Databricks Data Engineer Associate or Professional is a plus.

Skills & Competencies

  • Proficient in designing, building, deploying, and maintaining high-performance, scalable ETL/ELT pipelines using Azure Databricks, Delta Lake, and PySpark notebooks.
  • Proficient in building, deploying, and operating production ML models (supervised, unsupervised, and anomaly detection), including techniques for imbalanced datasets.
  • Experience working with EKS/AKS clusters and containerized platforms.
  • Proficient with ML engineering and MLOps, including model versioning, CI/CD for ML, monitoring, drift detection, and automated retraining
  • Proficiency in Python, including Pandas and PySpark DataFrames.
  • Expert-level SQL skills, including stored procedures; experience with SSIS, SSRS, and Power BI is a plus.
  • Proficient with cloud data engineering platforms, such as Azure, Databricks, Spark, or SQL, and batch and streaming pipelines
  • Familiar with Databricks built-in AI functions such as ai_query, ai_gen, ai_classify, ai_forecast, and ai_analyze_sentiment, and able to use them to extract actionable insights from large amounts of unstructured or structured raw data.
  • Experience with Python and ML frameworks, such as PyTorch or TensorFlow
  • Experience in improving data quality, lineage, and observability in enterprise data environments and operationalizing rules and model-driven scoring for prioritization, routing, or case selection
  • Experience with predictive analytics, machine learning, and artificial intelligence desired.
  • Bachelor's degree in Computer Science, Management Information Systems, Engineering, or a related field.

Benefits & conditions

  • 401(k)
  • Health insurance
  • Vision insurance
  • Dental insurance
