Data Engineer

Conch Technologies
Pittsburgh, United States of America
1 month ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Pittsburgh, United States of America

Tech stack

Java
Amazon Web Services (AWS)
Azure
Big Data
Cloud Computing
Cloudera Impala
Databases
Continuous Integration
Data Architecture
Information Engineering
Data Governance
ETL
Data Systems
Distributed Systems
Hadoop
Hadoop Distributed File System
HBase
Hive
Python
PostgreSQL
MySQL
NoSQL
Performance Tuning
Cloud Services
SQL Databases
Technical Data Management Systems
Data Processing
Google Cloud Platform
Apache YARN
Spark
Database Performance
Kafka
Data Pipelines

Job description

  • A Senior Data Engineer focusing on Python and Hadoop is responsible for designing, building, and maintaining robust data pipelines and infrastructure using the Hadoop ecosystem and advanced Python programming.

  • This role involves leading technical projects, ensuring data quality and scalability, and collaborating with cross-functional teams.

  • Data Pipeline Development: Design, build, and maintain scalable ETL/ELT processes and data pipelines using Python, SQL, and big data technologies (Hadoop, Spark, Hive, Kafka).

  • Big Data Management: Work within the Hadoop technology stack, including HDFS, Hive, YARN, Impala, and HBase, to manage and store large datasets.

  • Performance Optimization & Automation: Troubleshoot, tune, and optimize data processing jobs and database performance, while identifying opportunities for automation in testing and deployment processes (CI/CD).

  • Architecture & Design: Lead the development of data solutions and the design of data service infrastructure, contributing to overall data architecture decisions.

  • Collaborate with data scientists, analysts, and business stakeholders to understand data requirements, and provide technical guidance and mentorship to junior team members.

  • Data Quality & Governance: Ensure data accuracy, integrity, and security by implementing validation checks and adhering to data governance standards.

Requirements

  • Must Have: Strong proficiency in Python (including PySpark) and SQL is essential, with additional experience in Java or Scala being a plus.

  • Typically requires 5 years of experience in data engineering or a related role, with a proven track record of deploying and managing large-scale distributed systems.


  • Big Data Technologies: Expertise in the Hadoop ecosystem and its components, along with distributed computing frameworks such as Apache Spark and Kafka, is crucial.
  • Databases & Cloud Platforms: Experience with relational (e.g., PostgreSQL, MySQL) and NoSQL databases, and familiarity with cloud services (AWS, Google Cloud Platform, or Azure).
  • Problem-Solving: Strong analytical and problem-solving skills to resolve complex technical data issues.
