Databricks Data Engineer

Tata Consultancy Services Limited

Irving, United States of America

2 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Compensation

$140K

Job location

Irving, United States of America

Tech stack

Unity

Airflow

Amazon Web Services (AWS)

Apache HTTP Server

Azure

Big Data

Cloud Computing

Code Review

Continuous Integration

Data Governance

ETL

Data Masking

Data Transformation

Data Systems

Data Warehousing

Database Queries

DevOps

Distributed Systems

GitHub

Hive

Python

Machine Learning

Performance Tuning

Shell Script

Software Engineering

SQL Databases

Data Streaming

Workflow Management Systems

Data Processing

Scripting (Bash/Python/Go/Ruby)

Google Cloud Platform

Data Storage Technologies

Cloud Platform System

Spark

GitLab

Git

Containerization

Data Lake

PySpark

Kubernetes

Information Technology

Machine Learning Operations

Software Version Control

Data Pipelines

Databricks

Programming Languages

Job description

We are seeking a highly skilled and motivated Databricks Certified Engineer to design, build, and optimize scalable data pipelines and ETL workflows using the Databricks Data Intelligence Platform. The ideal candidate will be responsible for writing robust Python and Spark code, ensuring data quality, and implementing data governance across cloud environments (AWS, Azure, or GCP). This role requires expertise in large-scale data processing, data warehousing principles, and cloud-native solutions.

Roles & Responsibilities:

Pipeline Development: Design, build, and maintain scalable ETL/ELT data pipelines using PySpark, Delta Lake, Auto Loader, and Databricks Workflows.
Data Transformation & Processing: Design and process batch and streaming data to support the Medallion Architecture (Bronze, Silver, Gold layers).
Data Governance & Security: Implement access controls and data masking policies using Unity Catalog to secure Personally Identifiable Information (PII) and ensure compliance.
Performance Tuning: Optimize Spark jobs, troubleshoot memory bottlenecks, and adjust cluster configurations for cost and compute efficiency.
Proactive Risk Identification: Identify and address underlying data complexities, hidden challenges, and potential risks in data pipelines and the Databricks ecosystem to keep data solutions robust, secure, and efficient.
Cross-Functional Collaboration: Partner with Data Scientists and Analysts to curate datasets, support machine learning models (MLflow), and provide integrated reporting.
Develop and maintain comprehensive documentation for data pipelines, data models, and ETL processes.
Participate in code reviews to maintain high-quality code standards.
Troubleshoot and resolve issues in data pipelines and Databricks clusters.

Requirements

Do you have experience in Spark?
Do you have a Bachelor's degree?

Primary Skill Set:

Databricks Platform Expertise: In-depth knowledge of the Databricks Data Intelligence Platform, including notebooks, Delta Lake, MLflow, Unity Catalog, Auto Loader, and Databricks Workflows.
Databricks Certification: Relevant Databricks certification (Associate or Professional level) validating foundational or advanced skills in the platform.

Secondary Skill Set:

PySpark: Strong proficiency in developing complex data transformations and analytics using PySpark.
Apache Iceberg: Experience with Apache Iceberg for open table format management.

Programming Languages:

Python: Expert-level proficiency in Python for data manipulation, scripting, and application development.
SQL: Advanced proficiency in SQL for data querying and manipulation.
Shell Scripting: Experience with shell scripting for automation and job orchestration.

Cloud Platforms: Hands-on experience with Databricks deployed on major cloud providers such as Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform (GCP).
Big Data Concepts: Deep understanding of distributed computing, data warehousing principles, ETL/ELT processes, and data modeling.

Good to Have Skills

DevOps Basics: Familiarity with CI/CD tools (e.g., Databricks Asset Bundles, GitHub Actions, GitLab) and orchestration tools like Apache Airflow.
Data Warehousing: Knowledge of Hive for data storage and querying.
Container Orchestration: Familiarity with Kubernetes for deploying and managing containerized applications.
Version Control: Experience with Git or other version control systems.

Databricks Certification Levels

Depending on seniority, candidates may possess different levels of Databricks credentials:

Associate Level: Validates foundational skills in writing Spark code, building SQL queries, and utilizing the Databricks workspace.
Professional Level: Validates advanced skills for production environments, focusing on complex streaming workloads, CI/CD, data governance (Unity Catalog), and high-level performance optimization.

Qualifications: Bachelor of Computer Science
