Big Data / Real-Time Data Engineer
NEXXORA INC
Westlake, United States of America
5 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Job location: Westlake, United States of America
Tech stack
Java
Airflow
Amazon Web Services (AWS)
Azure
Big Data
Data as a Service
Data Architecture
Data Governance
Data Integration
ETL
Data Systems
Distributed Systems
Hadoop
Hadoop Distributed File System
Hive
Python
NoSQL
Performance Tuning
SQL Databases
Data Processing
Google Cloud Platform
Cloud Platform System
Spark
Containerization
Data Lake
Kubernetes
Apache Flink
Data Analytics
Kafka
Spark Streaming
Machine Learning Operations
Video Streaming
Stream Processing
Data Pipelines
Docker
Job description
We are seeking a highly skilled Senior Big Data Developer with strong expertise in building real-time data pipelines, advanced analytics, and data science platforms. The ideal candidate will have deep experience with big data ecosystems (including Hadoop-based frameworks) and modern streaming technologies, enabling scalable, high-performance data solutions.
Responsibilities
- Design, develop, and maintain real-time and batch data pipelines
- Work with large-scale distributed systems using Big Data frameworks (e.g., Hadoop ecosystem)
- Build and optimize data processing solutions using streaming technologies such as Kafka, Spark Streaming, or Flink
- Collaborate with data scientists and analysts to enable advanced analytics and machine learning workflows
- Develop and maintain data models, ETL/ELT pipelines, and data integration solutions
- Ensure data quality, performance optimization, and scalability
- Work with cloud-based data platforms (AWS, Azure, or Google Cloud Platform)
- Implement best practices for data governance, security, and compliance
Requirements
- Strong experience with Big Data frameworks (Hadoop, Spark, Hive, HDFS)
- Hands-on expertise in real-time data processing (Kafka, Spark Streaming, Flink, etc.)
- Proficiency in programming languages such as Python, Java, or Scala
- Experience with data analytics and data science platforms
- Solid understanding of distributed computing and data architecture
- Experience with SQL and NoSQL databases
- Familiarity with ETL tools and data pipeline orchestration (Airflow, etc.)
- Experience with cloud platforms (AWS, Azure, Google Cloud Platform) and their data services
- Knowledge of machine learning workflows and MLOps
- Experience with containerization (Docker, Kubernetes)
- Exposure to Data Lake / Lakehouse architectures
- Strong problem-solving and communication skills