Data Engineer - Hadoop OzoneCH
Mphasis
Fanwood, United States of America
2 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Job location: Fanwood, United States of America
Tech stack
Java
Apache HTTP Server
Big Data
Business Process Modeling
Cloud Computing
Data Validation
Information Engineering
ETL
Data Transformation
Data Security
Data Systems
Distributed Computing Environment
Fault Tolerance
Hadoop
Hadoop Distributed File System
MapReduce
HBase
Hive
Python
Shell Script
Software Engineering
SQL Databases
Data Streaming
Workflow Management Systems
Enterprise Software Applications
Apache Yarn
Spark
Documentation System
Containerization
Kubernetes
Information Technology
Apache Flink
Real Time Data
Kafka
Docker
Job description
We are seeking a highly skilled Big Data Engineer with strong experience in Apache Spark, the Hadoop ecosystem, and Apache Ozone. The ideal candidate will design, develop, and optimize large-scale data processing systems, ensuring high performance, scalability, and reliability for enterprise-level applications.
- Design and implement distributed data processing solutions using Apache Spark, Hadoop, and Flink.
- Develop and maintain Spark applications for data transformation, aggregation, and ETL processes using Scala, Java, or Python.
- Utilize Apache Ozone for storing large-scale datasets, ensuring efficient data access and management in a distributed environment.
- Manage and optimize HDFS, Apache Ozone, and Kafka for scalable and fault-tolerant storage.
- Develop ETL pipelines for batch and real-time data ingestion and transformation.
- Implement and ensure data validation, data security, integrity, and compliance across big data platforms.
- Monitor and troubleshoot performance issues in large-scale clusters.
- Collaborate with data scientists, analysts, and application teams to deliver high-quality data solutions.
- Automate workflows and improve operational efficiency using scripting and orchestration tools.
Requirements
- Strong expertise in Apache Spark (Core, SQL, Streaming).
- Hands-on experience with the Hadoop ecosystem (HDFS, YARN, MapReduce).
- Proficiency in Apache Ozone for object storage and integration with Hadoop.
- Solid programming skills in Java, Scala, or Python.
- Experience with Hive, HBase, and Kafka is a plus.
- Knowledge of cluster management and resource optimization.
- Familiarity with Linux/Unix environments and shell scripting.
- Understanding of data security, governance, and compliance standards.
- Experience with cloud-based big data platforms.
- Exposure to containerization (Docker, Kubernetes) for big data workloads.
- Knowledge of CI/CD pipelines for data engineering projects.
Behavioral Skills:
- Good communication skills
- 5 days per week working from the office in Berkeley Heights, NJ
- Team player
- Ability to work in a changing environment
- Strong problem-solving and analytical skills
- Ability to work independently or within a team
- Manage day-to-day challenges and communicate developmental risks with the technical team
Qualifications:
- Bachelor's degree in Computer Science, Software Engineering, or a related field.
- Proficiency in business process modeling and documentation tools.
- Product implementation experience is preferred.