Databricks Engineer

CogniSoft Technologies
4 days ago

Role details

Contract type
Temporary to permanent
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Remote

Tech stack

Artificial Intelligence
Amazon Web Services (AWS)
Cloud Computing
Continuous Integration
Information Engineering
ETL
Data Masking
DevOps
Python
Metadata Management
Performance Tuning
Role-Based Access Control
Standard SQL
SQL Databases
Data Streaming
Data Processing
Data Storage Technologies
Data Ingestion
Open-source DevOps Tools
Spark
Software Troubleshooting
Data Strategy
Git
CloudFormation
Data Lake
PySpark
Data Lineage
Kafka
Data Lakehouse
Terraform
Stream Processing
Data Pipelines
Databricks

Job description

We are looking for a hands-on Databricks Engineer with strong AWS experience to design, build, and optimize scalable data pipelines and lakehouse solutions. The role focuses on implementing robust batch and streaming data solutions using Databricks, Delta Lake, and AWS cloud-native services, ensuring high performance, scalability, and security.

Key Responsibilities

  • Build and maintain end-to-end data pipelines using Databricks, Delta Lake, and AWS services

  • Develop batch, real-time, and streaming data processing workflows

  • Implement data ingestion, transformation, curation, and storage pipelines

  • Build and optimize large-scale PySpark and SQL-based jobs in Databricks

  • Enable real-time data processing using Kafka, AWS Kinesis, or similar streaming tools

Data Lakehouse Implementation

  • Work on Databricks-based lakehouse architecture using Delta Lake

  • Implement scalable and optimized data storage and processing frameworks

  • Ensure data quality, consistency, and reliability across pipelines

  • Support metadata management, data lineage, and governance implementation

Cloud & Platform Engineering (AWS)

  • Work with AWS services such as S3, Glue, Lambda, Kinesis, and Redshift

  • Ensure pipelines are scalable, secure, and cost-optimized in AWS environments

  • Implement security controls including RBAC, encryption, and data masking

Optimization & Best Practices

  • Tune Spark jobs for performance and cost efficiency

  • Monitor and troubleshoot data pipeline issues in production

  • Follow CI/CD and DevOps practices for deploying data engineering solutions

  • Ensure adherence to data engineering standards and best practices

Collaboration

  • Work closely with BI teams and business stakeholders

  • Support analytics and AI/ML data requirements through curated datasets

  • Collaborate with architects to ensure alignment with AWS-based data strategy

Technical Leadership & Architecture

  • Lead the design and implementation of scalable, end-to-end data pipelines and lakehouse solutions
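To give a concrete (if simplified) sense of the ingest → transform → curate pipeline work described above, here is a minimal plain-Python sketch. In practice these steps would run as PySpark jobs writing Delta tables on Databricks; all record and field names here are hypothetical illustrations, not the employer's actual schema:

```python
from dataclasses import dataclass

@dataclass
class OrderEvent:
    order_id: str
    amount: float
    currency: str

def ingest(raw_rows):
    """Parse raw dicts into typed records, dropping malformed rows."""
    events = []
    for row in raw_rows:
        try:
            events.append(OrderEvent(str(row["order_id"]),
                                     float(row["amount"]),
                                     str(row.get("currency", "USD"))))
        except (KeyError, ValueError, TypeError):
            continue  # a real pipeline would route these to a quarantine table
    return events

def curate(events):
    """Aggregate order totals per currency -- the 'curated' layer."""
    totals = {}
    for e in events:
        totals[e.currency] = totals.get(e.currency, 0.0) + e.amount
    return totals

raw = [{"order_id": 1, "amount": "19.99"},
       {"order_id": 2, "amount": 5.0, "currency": "EUR"},
       {"amount": "bad"}]
print(curate(ingest(raw)))  # {'USD': 19.99, 'EUR': 5.0}
```

The same shape (parse, validate, aggregate) carries over directly to PySpark DataFrame transformations; the per-row error handling would typically become a schema-on-read step with a quarantine path.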

Requirements

  • Strong hands-on experience with Databricks

  • Proficiency in Python, PySpark, and SQL

  • Strong experience in AWS cloud services (S3, Glue, Lambda, Kinesis, Redshift)

  • Experience building ETL/ELT data pipelines

  • Strong understanding of Delta Lake and lakehouse concepts

  • Experience with streaming and batch data processing

  • Knowledge of CI/CD tools and Git

  • Strong troubleshooting and performance tuning skills

  • Databricks Certified Data Engineer Professional certification is mandatory

Preferred / Nice to Have

  • IaC (Terraform/CloudFormation)

  • Data quality & observability frameworks

  • Deeper Databricks-specific features (DLT, Unity Catalog, Workflows)

  • Security & compliance depth

  • DevOps tooling specifics

  • Leadership/co

  • Communication expectations
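On the data-masking and security points above, a minimal sketch of deterministic hash-based masking, one common approach for hiding PII while keeping columns joinable. The column set, salt handling, and truncation length are simplifying assumptions for illustration, not the employer's actual scheme (a real deployment would use a managed secret for the salt and apply this inside a PySpark UDF or Databricks column mask):

```python
import hashlib

PII_COLUMNS = {"email", "phone"}  # hypothetical column set

def mask_value(value: str, salt: str = "demo-salt") -> str:
    """Deterministically mask a value: equal inputs stay joinable, raw value is hidden."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]

def mask_row(row: dict) -> dict:
    """Mask only the PII columns of a record; pass everything else through."""
    return {k: (mask_value(v) if k in PII_COLUMNS else v) for k, v in row.items()}

row = {"user_id": 42, "email": "jane@example.com"}
masked = mask_row(row)
print(masked["user_id"])        # 42 -- non-PII column unchanged
print(masked["email"] != row["email"])  # True -- PII column masked
```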

Apply for this position