Data Engineer (Python/PySpark/AWS)
Job description
We are seeking a highly skilled and motivated Data Engineer to join our dynamic team. As a Data Engineer, you will play a crucial role in designing, developing, and maintaining our clients' data infrastructure. Your expertise in Python, PySpark, ETL processes, and CI/CD (Jenkins or GitHub Actions), along with experience in both streaming and batch workflows, will be essential to ensuring the efficient flow and processing of data in support of our clients.
Responsibilities:
Data Architecture and Design:
- Collaborate with cross-functional teams to understand data requirements and design robust data architecture solutions.
- Develop data models and schema designs to optimize data storage and retrieval.
ETL Development:
- Implement ETL processes to extract, transform, and load data from various sources.
- Ensure data quality, integrity, and consistency throughout the ETL pipeline.
Python and PySpark Development:
- Utilize your expertise in Python and PySpark to develop efficient data processing and analysis scripts.
- Optimize code for performance and scalability, keeping up to date with the latest industry best practices.
Data Integration:
- Integrate data from different systems and sources to provide a unified view for analytical purposes.
- Collaborate with data scientists and analysts to implement solutions that meet their data integration needs.
Streaming and Batch Workflows:
- Design and implement streaming workflows using Spark Structured Streaming (via PySpark) or other relevant technologies.
- Develop batch processing workflows for large-scale data processing and analysis.
CI/CD Implementation:
- Implement and maintain continuous integration and continuous deployment (CI/CD) pipelines using Jenkins or GitHub Actions.
- Automate testing, code deployment, and monitoring processes to ensure the reliability of data pipelines.
Requirements:
- Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
- Proven experience as a Data Engineer or in a similar role.
- Strong programming skills in Python and expertise in PySpark for both batch and streaming data processing.
- Hands-on experience with ETL tools and processes.
- Familiarity with CI/CD tools such as Jenkins or GitHub Actions.
- Solid understanding of data modeling, database design, and data warehousing concepts.
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration skills.
Preferred Skills:
- Knowledge of cloud platforms such as AWS, Azure, or Google Cloud.
- Experience with version control systems (e.g., Git).
- Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes).
- Understanding of data security and privacy best practices.
Applicants for employment in the U.S. must possess work authorization which does not require sponsorship by the employer for a visa. Infinitive is an Equal Opportunity Employer.