Streaming Data

OpenKyber LLC

6 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Remote

Tech stack

Java

Airflow

Amazon Web Services (AWS)

Business Analytics Applications

Apache HTTP Server

Azure

Big Data

Cloud Computing

Computer Programming

Data Architecture

Information Engineering

Data Governance

Data Systems

Hive

Python

Machine Learning

Performance Tuning

Query Optimization

Cloud Services

SQL Databases

Data Streaming

Workflow Management Systems

Google Cloud Platform

Snowflake

Spark

Indexer

Data Lake

Infrastructure Automation Frameworks

Collibra

Kafka

Spark Streaming

Data Lakehouse

Data Pipelines

Databricks

Job description

We are seeking a highly skilled Senior Lead Data Engineer with strong experience in modern data platforms, including Snowflake, Databricks, Apache Iceberg, and Apache Spark. The ideal candidate will lead the design, development, and optimization of scalable data pipelines and analytics platforms while ensuring high performance for large-scale SQL workloads. This role requires strong expertise in data architecture, performance tuning, and big data technologies to support enterprise-level analytics and data-driven decision-making.

Design and implement scalable data pipelines and data lakehouse architectures using Snowflake, Databricks, and Apache Iceberg.
Lead the development and optimization of Spark-based ETL/ELT pipelines for large-scale data processing.
Optimize complex SQL workloads for performance, cost efficiency, and scalability.
Build and maintain high-performance data models supporting analytics, reporting, and machine learning workloads.
Implement data governance, security, and data quality frameworks.
Collaborate with data scientists, analysts, and business stakeholders to deliver reliable data solutions.
Tune the performance of distributed processing frameworks such as Spark and Databricks.
Guide engineering teams on best practices for data architecture, pipeline orchestration, and cloud data platforms.
Monitor and troubleshoot data pipeline performance and reliability issues.
Mentor junior data engineers and lead technical design discussions.

Requirements

10+ years of experience in Data Engineering or Big Data Engineering.
Strong expertise with Snowflake and the Databricks Lakehouse platform.
Hands-on experience with Apache Spark (PySpark / Spark SQL).
Experience working with Apache Iceberg or modern table formats.
Advanced knowledge of SQL performance tuning and query optimization.
Experience designing data lake / lakehouse architectures.
Strong programming experience in Python, Scala, or Java.
Experience with workflow orchestration tools (Airflow, Prefect, or similar).
Knowledge of cloud platforms such as Amazon Web Services, Microsoft Azure, or Google Cloud.
Strong understanding of data modeling, partitioning, indexing, and storage optimization.

Preferred Qualifications

Experience with data lakehouse architecture and open table formats.
Knowledge of streaming data pipelines using Kafka or Spark Streaming.
Experience with CI/CD pipelines and infrastructure-as-code tools.
Strong leadership and mentoring experience.
Experience supporting enterprise-scale analytics platforms.

Nice to Have

Experience with data governance tools.
Knowledge of machine learning data pipelines.
Certifications in cloud platforms or data engineering technologies.
