Data Engineer

Conch Technologies
Pittsburgh, United States of America
1 month ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Pittsburgh, United States of America

Tech stack

Java
Amazon Web Services (AWS)
Azure
Big Data
Cloud Computing
Cloudera Impala
Databases
Continuous Integration
Data Architecture
Information Engineering
Data Governance
ETL
Data Systems
Distributed Systems
Hadoop
Hadoop Distributed File System
HBase
Hive
Python
PostgreSQL
MySQL
NoSQL
Performance Tuning
Cloud Services
SQL Databases
Technical Data Management Systems
Data Processing
Google Cloud Platform
Apache YARN
Spark
Database Performance
Kafka
Data Pipelines

Job description

  • A Senior Data Engineer focusing on Python and Hadoop is responsible for designing, building, and maintaining robust data pipelines and infrastructure using the Hadoop ecosystem and advanced Python programming.

  • This role involves leading technical projects, ensuring data quality and scalability, and collaborating with cross-functional teams.

  • Data Pipeline Development: Design, build, and maintain scalable ETL/ELT processes and data pipelines using Python, SQL, and big data technologies (Hadoop, Spark, Hive, Kafka).

  • Big Data Management: Work within the Hadoop technology stack, including HDFS, Hive, YARN, Impala, and HBase, to manage and store large datasets.

  • Performance Optimization & Automation: Troubleshoot, tune, and optimize data processing jobs and database performance, while identifying opportunities for automation in testing and deployment processes (CI/CD).

  • Architecture & Design: Lead the development of data solutions and the design of data service infrastructure, contributing to overall data architecture decisions.

  • Collaborate with data scientists, analysts, and business stakeholders to understand data requirements, and provide technical guidance and mentorship to junior team members.

  • Data Quality & Governance: Ensure data accuracy, integrity, and security by implementing validation checks and adhering to data governance standards.

Requirements

  • Must Have: Strong proficiency in Python (including PySpark) and SQL is essential, with additional experience in Java or Scala being a plus.

  • Typically requires 5 years of experience in data engineering or a related role, with a proven track record of deploying and managing large-scale distributed systems.


  • Big Data Technologies: Expertise in the Hadoop ecosystem and its components, along with distributed computing frameworks such as Apache Spark and Kafka, is crucial.
  • Databases & Cloud Platforms: Experience with relational (e.g., PostgreSQL, MySQL) and NoSQL databases, and familiarity with cloud services (AWS, Google Cloud Platform, or Azure).
  • Problem-Solving: Strong analytical and problem-solving skills to resolve complex technical data issues.
