Big Data Engineer

Allruva Technology Services Incorporated

Irving, United States of America

6 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Job location

Irving, United States of America

Tech stack

Big Data

Computer Programming

System Configuration

Data Architecture

Data Governance

ETL

Data Security

Hadoop

Hadoop Distributed File System

MapReduce

HBase

Hive

Performance Tuning

Cloudera

SQL Databases

Data Streaming

Unstructured Data

Data Processing

Scripting (Bash/Python/Go/Ruby)

Data Ingestion

System Availability

Spark

PySpark

Information Technology

Kafka

Apache NiFi

Spark Streaming

Data Management

Stream Processing

Data Pipelines

Databricks

Job description

Big Data Architecture and Design:

Design, implement, and maintain scalable and efficient big data solutions using Hadoop ecosystem components.
Work closely with architects and data scientists to define data architecture and ensure alignment with business requirements.

Data Ingestion and Processing:

Implement data ingestion pipelines for large-scale data processing using tools such as Apache NiFi or custom scripts.
Develop and optimize PySpark jobs for data processing and transformation.

Data Modeling and ETL:

Create and maintain data models to support efficient querying and reporting.
Design and implement Extract, Transform, Load (ETL) processes for structured and unstructured data.

SQL Database Management:

Manage and optimize SQL databases for efficient storage and retrieval of structured data.
Develop and maintain SQL scripts for data manipulation, querying, and reporting.

Performance Optimization:

Optimize PySpark and SQL queries for performance and scalability.
Identify and resolve performance bottlenecks in big data processing.

Data Security:

Implement security measures for data at rest and in transit within the Hadoop ecosystem.
Manage access controls and permissions for data stored in Hadoop and SQL databases.

Hadoop Cluster Administration:

Administer and maintain Hadoop clusters, ensuring high availability and reliability.
Monitor and troubleshoot cluster performance and resource utilization.

Streaming Data Processing:

Implement and optimize real-time data processing using tools such as Apache Kafka and Spark Streaming.
Develop and maintain streaming data pipelines for continuous data ingestion and analysis.

Data Governance and Quality:

Implement data governance policies and procedures to ensure data quality and integrity.
Collaborate with data stewards to enforce data quality standards.

Documentation:

Create and maintain comprehensive documentation for data architecture, data flows, and system configurations.
Document code, scripts, and processes for knowledge sharing and future reference.

Requirements

Screening questions:

Do you have experience in Spark implementation?
Do you have a Master's degree?

Education:

Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.

Technical Skills:

Proficiency in Hadoop ecosystem technologies, including HDFS, MapReduce, Hive, and HBase.
Strong programming skills in PySpark for data processing and analysis.
Advanced skills in SQL for data manipulation and querying.

Experience:

Several years of hands-on experience in big data engineering and analytics.
Experience with end-to-end implementation of big data solutions.

Certifications:

Relevant certifications in Hadoop and Spark technologies (e.g., Cloudera Certified Data Engineer, Databricks Certified Developer).

Communication and Collaboration:

Strong communication skills to interact effectively with technical and non-technical stakeholders.
Ability to work collaboratively in a team environment.

Problem-Solving:

Strong analytical and problem-solving skills.
Ability to troubleshoot and optimize complex big data processes.
