Big Data Engineer
Allruva Technology Services Incorporated
Irving, United States of America
6 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Job location: Irving, United States of America
Tech stack
Big Data
Computer Programming
System Configuration
Data Architecture
Data Governance
ETL
Data Security
Hadoop
Hadoop Distributed File System
MapReduce
HBase
Hive
Performance Tuning
Cloudera
SQL Databases
Data Streaming
Unstructured Data
Data Processing
Scripting (Bash/Python/Go/Ruby)
Data Ingestion
System Availability
Spark
PySpark
Information Technology
Kafka
Apache NiFi
Spark Streaming
Data Management
Stream Processing
Data Pipelines
Databricks
Job description
Big Data Architecture and Design:
- Design, implement, and maintain scalable and efficient big data solutions using Hadoop ecosystem components.
- Work closely with architects and data scientists to define data architecture and ensure alignment with business requirements.
Data Ingestion and Processing:
- Implement data ingestion pipelines for large-scale data processing using tools such as Apache NiFi or custom scripts.
- Develop and optimize PySpark jobs for data processing and transformation.
Data Modeling and ETL:
- Create and maintain data models to support efficient querying and reporting.
- Design and implement Extract, Transform, Load (ETL) processes for structured and unstructured data.
SQL Database Management:
- Manage and optimize SQL databases for efficient storage and retrieval of structured data.
- Develop and maintain SQL scripts for data manipulation, querying, and reporting.
Performance Optimization:
- Optimize PySpark and SQL queries for performance and scalability.
- Identify and resolve performance bottlenecks in big data processing.
Data Security:
- Implement security measures for data at rest and in transit within the Hadoop ecosystem.
- Manage access controls and permissions for data stored in Hadoop and SQL databases.
Hadoop Cluster Administration:
- Administer and maintain Hadoop clusters, ensuring high availability and reliability.
- Monitor and troubleshoot cluster performance and resource utilization.
Streaming Data Processing:
- Implement and optimize real-time data processing using tools such as Apache Kafka and Spark Streaming.
- Develop and maintain streaming data pipelines for continuous data ingestion and analysis.
Data Governance and Quality:
- Implement data governance policies and procedures to ensure data quality and integrity.
- Collaborate with data stewards to enforce data quality standards.
Documentation:
- Create and maintain comprehensive documentation for data architecture, data flows, and system configurations.
- Document code, scripts, and processes for knowledge sharing and future reference.
Requirements
Do you have experience in Spark implementation?
Do you have a Master's degree?
- Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
- Proficiency in Hadoop ecosystem technologies, including HDFS, MapReduce, Hive, and HBase.
- Strong programming skills in PySpark for data processing and analysis.
- Advanced skills in SQL for data manipulation and querying.
Experience:
- Several years of hands-on experience in big data engineering and analytics.
- Experience with end-to-end implementation of big data solutions.
Certifications:
- Relevant certifications in Hadoop and Spark technologies (e.g., Cloudera Certified Data Engineer, Databricks Certified Developer).
Communication and Collaboration:
- Strong communication skills to interact effectively with technical and non-technical stakeholders.
- Ability to work collaboratively in a team environment.
Problem-Solving:
- Strong analytical and problem-solving skills.
- Ability to troubleshoot and optimize complex big data processes.