Streaming Data

OpenKyber LLC
6 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Remote

Tech stack

Java
Airflow
Amazon Web Services (AWS)
Business Analytics Applications
Apache HTTP Server
Azure
Big Data
Cloud Computing
Computer Programming
Data Architecture
Information Engineering
Data Governance
Data Systems
Hive
Python
Machine Learning
Performance Tuning
Query Optimization
Cloud Services
SQL Databases
Data Streaming
Workflow Management Systems
Google Cloud Platform
Snowflake
Spark
Indexer
Data Lake
Infrastructure Automation Frameworks
Collibra
Kafka
Spark Streaming
Data Lakehouse
Data Pipelines
Databricks

Job description

We are seeking a highly skilled Senior Lead Data Engineer with strong experience in modern data platforms including Snowflake, Databricks, Apache Iceberg, and Apache Spark. The ideal candidate will lead the design, development, and optimization of scalable data pipelines and analytics platforms while ensuring high performance for large-scale SQL workloads. This role requires strong expertise in data architecture, performance tuning, and big data technologies to support enterprise-level analytics and data-driven decision-making.

  • Design and implement scalable data pipelines and data lakehouse architectures using Snowflake, Databricks, and Apache Iceberg.
  • Lead the development and optimization of Spark-based ETL/ELT pipelines for large-scale data processing.
  • Optimize complex SQL workloads for performance, cost efficiency, and scalability.
  • Build and maintain high-performance data models supporting analytics, reporting, and machine learning workloads.
  • Implement data governance, security, and data quality frameworks.
  • Collaborate with data scientists, analysts, and business stakeholders to deliver reliable data solutions.
  • Perform performance tuning for distributed processing frameworks such as Spark and Databricks.
  • Guide engineering teams on best practices for data architecture, pipeline orchestration, and cloud data platforms.
  • Monitor and troubleshoot data pipeline performance and reliability issues.
  • Mentor junior data engineers and lead technical design discussions.

Requirements

  • 10+ years of experience in Data Engineering or Big Data Engineering.
  • Strong expertise with Snowflake and the Databricks Lakehouse Platform.
  • Hands-on experience with Apache Spark (PySpark / Spark SQL).
  • Experience working with Apache Iceberg or other modern table formats.
  • Advanced knowledge of SQL performance tuning and query optimization.
  • Experience designing data lake and lakehouse architectures.
  • Strong programming experience in Python, Scala, or Java.
  • Experience with workflow orchestration tools (Airflow, Prefect, or similar).
  • Knowledge of cloud platforms such as Amazon Web Services, Microsoft Azure, or Google Cloud.
  • Strong understanding of data modeling, partitioning, indexing, and storage optimization.

Preferred Qualifications

  • Experience with data lakehouse architecture and open table formats.
  • Knowledge of streaming data pipelines using Kafka or Spark Streaming.
  • Experience with CI/CD pipelines and infrastructure-as-code tools.
  • Strong leadership and mentoring experience.
  • Experience supporting enterprise-scale analytics platforms.

Nice to Have

  • Experience with data governance tools.
  • Knowledge of machine learning data pipelines.
  • Certifications in cloud platforms or data engineering technologies.
