Data Engineer
Role details
Job location
Tech stack
Job description
- A Senior Data Engineer focusing on Python and Hadoop is responsible for designing, building, and maintaining robust data pipelines and infrastructure using the Hadoop ecosystem and advanced Python programming.
- This role involves leading technical projects, ensuring data quality and scalability, and collaborating with cross-functional teams.
- Data Pipeline Development: Design, build, and maintain scalable ETL/ELT processes and data pipelines using Python, SQL, and big data technologies (Hadoop, Spark, Hive, Kafka).
- Big Data Management: Work within the Hadoop technology stack, including HDFS, Hive, YARN, Impala, and HBase, to manage and store large datasets.
- Performance Optimization & Automation: Troubleshoot, tune, and optimize data processing jobs and database performance, while identifying opportunities for automation in testing and deployment processes (CI/CD).
- Architecture & Design: Lead the development of data solutions and the design of data service infrastructure, contributing to overall data architecture decisions.
- Collaboration & Mentorship: Collaborate with data scientists, analysts, and business stakeholders to understand data requirements, and provide technical guidance and mentorship to junior team members.
- Data Quality & Governance: Ensure data accuracy, integrity, and security by implementing validation checks and adhering to data governance standards.
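The data-quality work described above often starts with simple, automated validation checks applied to each batch before it is loaded downstream. A minimal sketch in plain Python (the column names and rules here are hypothetical, not part of this posting):

```python
# Minimal data-quality validation sketch: check a batch of records
# against simple rules before loading them downstream.
# The fields ("user_id", "amount") and the rules are illustrative only.

def validate_records(records):
    """Return (valid, errors) for a list of record dicts."""
    valid, errors = [], []
    for i, rec in enumerate(records):
        problems = []
        # Required-field check: user_id must be present and non-empty.
        if rec.get("user_id") in (None, ""):
            problems.append("missing user_id")
        # Type/range check: amount must be a non-negative number.
        amount = rec.get("amount")
        if not isinstance(amount, (int, float)) or amount < 0:
            problems.append("amount must be a non-negative number")
        if problems:
            errors.append((i, problems))
        else:
            valid.append(rec)
    return valid, errors


batch = [
    {"user_id": "u1", "amount": 19.99},
    {"user_id": "", "amount": 5.00},
    {"user_id": "u3", "amount": -1},
]
valid, errors = validate_records(batch)
print(len(valid), len(errors))  # 1 valid record, 2 rejected
```

In a production pipeline, checks like these would typically run inside the ETL framework itself (for example, as part of a Spark job) rather than in-process, and rejected records would be routed to a quarantine table for review.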
Requirements
- Typically requires 5 years of experience in data engineering or a related role, with a proven track record of deploying and managing large-scale distributed systems.
- Programming Languages: Strong proficiency in Python (including PySpark) and SQL is essential, with additional experience in Java or Scala being a plus.
- Big Data Technologies: Expertise in the Hadoop ecosystem and its components, along with distributed computing frameworks such as Apache Spark and Kafka, is crucial.
- Databases & Cloud Platforms: Experience with relational (e.g., PostgreSQL, MySQL) and NoSQL databases, and familiarity with cloud services (AWS, Google Cloud Platform, or Azure).
- Problem-Solving: Strong analytical and problem-solving skills to resolve complex technical data issues.