Databricks Data Engineer
Job description
Genesis10 is currently seeking a Databricks Data Engineer for a Remote position with a Large Healthcare Client located in Minnetonka, MN. This is a 6+ month contract opportunity.
- Support the design, development, testing, and deployment of data analytics programs and processes supporting various operational data stores using SAS and relational databases.
- Collect, interpret, and aggregate data from traditional and non-traditional data sources to support programs and applications for various data analytics purposes.
- Understand data requirements and business needs to develop data tools such as dashboards and data visualizations.
- Use business intelligence, data visualization, query, analytics, and statistical software to build solutions, perform analysis, and interpret data.
- Solve moderately complex problems and translate concepts into practice.
- Recognize problems and make recommendations for solutions.
- Work under minimal guidance and within tight deadlines for deliverables.
- Bring an exploratory mindset.
- Effectively interact with business users for new projects, enhancement projects, and issue resolution, including addressing issues reported regarding data and existing applications.
- Adopt a structured approach focused on understanding business user needs, documenting requirements, and clarifying expectations.
- Use active listening and various techniques to gather information while ensuring clear communication to address uncertainties or inconsistencies.
- Create and document high-level design, detailed design, implementation guides, and standard operating procedure guides.
- Perform production support tasks, including job monitoring, addressing production failures, performing data analysis, root cause analysis, and issue resolution.
- Design, develop, and maintain scalable ETL/ELT pipelines using ADF, Python, Apache Spark, and PySpark in Databricks.
- Develop batch and incremental data processing pipelines handling large-scale structured, semi-structured, and unstructured datasets.
- Implement optimized data transformation logic using Spark SQL and DataFrame APIs.
- Ensure pipelines follow enterprise data engineering best practices for performance, scalability, and maintainability.
- Implement reusable ingestion patterns and transformation templates aligned with enterprise architecture standards.
- Ensure compliance with enterprise metadata management, monitoring, and operational standards.
- Design and manage datasets stored in Apache Iceberg and Delta Lake.
- Implement schema evolution, partitioning strategies, and version control for large datasets.
- Optimize data lake storage structures in Azure Data Lake Storage or AWS S3.
- Develop scalable pipelines using Databricks notebooks, jobs, and clusters.
- Manage dataset governance and access controls using Unity Catalog.
- Optimize Spark performance through partitioning, caching, and cluster tuning.
- Develop and schedule ETL pipelines using Apache Airflow.
- Implement dependency management, monitoring, alerting, and failure recovery mechanisms.
- Build pipelines that integrate with Snowflake data warehouse.
- Optimize transformations and data loading using Snowflake SQL and staging techniques.
- Design efficient data models for analytics and reporting.
- Support migration of legacy SAS pipelines to modern Spark-based frameworks and Databricks where applicable.
- Use Unix/Linux commands for common tasks and shell scripting to automate data engineering workflows.
- Support CI/CD deployment processes for ETL pipelines.
- Implement logging, auditing, and monitoring for production pipelines.
- Work with data architects, analysts, and business stakeholders to gather requirements and deliver data solutions.
- Participate in design reviews, architecture discussions, and code reviews.
- Mentor junior data engineers and provide technical guidance.
- Serve as the SME for DBX and provide knowledge training for the team.
This role focuses on building and optimizing large-scale data pipelines using ADF, Apache Iceberg, Delta Lake, cloud data lakes such as ADLS or S3, and workflow orchestration tools like Airflow. The analyst will work closely with data architects and platform teams to build reliable and governed data solutions aligned with enterprise standards.
Requirements
The ideal candidate will have strong experience working with Databricks, including Lakehouse, Delta Lake, Workflows, Medallion Architecture, Apache Spark, Unity Catalog, Delta Sharing, Notebooks, SQL, and Git. The candidate should also have strong experience with PySpark, Python, Snowflake, and ADF frameworks.
- Bachelor's degree in Computer Science, Computer Applications, Analytics, Data Science, or Information Technology.
- 8+ years of experience in ETL/Data Engineering.
- 8+ years of experience programming with Python.
- 8+ years of experience working in Unix/Linux environments.
- 8+ years of experience writing shell scripts.
- 6+ years of experience with the Databricks ecosystem, including Lakehouse, Delta Lake, Workflows, Medallion Architecture, Apache Spark, PySpark, Unity Catalog, Delta Sharing, Notebooks, SQL, and Git.
- 6+ years of experience with ADF.
- 6+ years of experience working with large enterprise datasets.
- 4+ years of experience with Snowflake.
- Strong analytical and troubleshooting skills.
- Excellent communication and collaboration abilities.
- Ability to work independently and mentor junior analysts.
- Strong documentation and design skills.
- Strong SQL skills.
- Experience implementing governance using Unity Catalog.
- Experience working with Apache Iceberg or other open table formats.
- Experience working with Azure Data Lake Storage or AWS S3.
- Understanding of cloud data lake architecture.
- Hands-on experience with Apache Airflow.
- Experience developing pipelines for Snowflake.
Preferred Skills:
- Experience migrating SAS ETL pipelines to Spark and Databricks.
- Knowledge of data governance frameworks.
- Healthcare experience.
- Strong understanding of SAS programming, SAS Data Step, SAS Macros, and PROC SQL.
Only candidates available and ready to work directly as Genesis10 employees will be considered for this position.
Benefits & conditions
Compensation: $50 - $55 per hour W2, depending on skill and experience level.