Databricks Data Engineer

Covetus, LLC
Fort Worth, United States of America
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Fort Worth, United States of America

Tech stack

Artificial Intelligence
Amazon Web Services (AWS)
Big Data
Data Infrastructure
ETL
Data Systems
Distributed Computing Environment
Distributed Systems
Hadoop
Python
Machine Learning
Performance Tuning
Cloudera
SAS (Software)
Data Storage Management
Large Language Models
Spark
Data Lake
PySpark
Data Management
Machine Learning Operations
Data Pipelines
Databricks
Data Generation

Job description

As a Databricks Data Engineer, you will be responsible for designing, developing, and maintaining data solutions for data generation, collection, and processing in a Big Data environment, predominantly using PySpark and Python. Your typical day will involve creating data pipelines, ensuring data quality, and implementing ETL processes to migrate and deploy data across systems using PySpark.

Roles & Responsibilities:

  • Collaborate closely with data scientists, data engineers, and business stakeholders to gather requirements and understand the business objectives driving data pipeline development.
  • Design, develop, and maintain robust, scalable, high-performance data pipelines using Databricks.
  • Leverage Databricks features such as Lakehouse and Delta Lake for efficient data storage, and Spark for distributed processing.
  • Develop ETL/ELT pipelines using Databricks.
  • Monitor pipeline health and troubleshoot data issues.
  • Migrate on-premises PySpark and SAS data pipelines and ML models to Databricks.
  • Define and implement best practices in Databricks.
  • Evaluate new Databricks features and tools, helping the organization stay at the forefront of innovation in AI and Big Data.
  • Collaborate with cross-functional teams to identify and resolve data-related issues.

Requirements

  • Proven expertise in implementing Lakehouse and Delta Lake using Databricks.
  • Strong PySpark and Python experience.
  • Databricks Certified Data Engineer Professional certification.
  • Familiarity with MLOps/LLMOps and distributed systems.
  • Experience with Big Data platforms like Cloudera Hadoop and cloud platforms like AWS and GCP.
  • Solid understanding of system design patterns, scalability, observability, and performance tuning.
  • Strong analytical and problem-solving skills.
  • Passion for exploring and building with emerging technologies.

Good to Have Skills:

  • Experience with AWS EKS, Docker, and containers.
