TELECOMMUTE PySpark / Java Developer (Data Engineer)
Innovative IT Solutions Inc
yesterday
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Intermediate
Job location: Remote
Tech stack
Java
Unit Testing
Big Data
Health Informatics
Cloudera Impala
Software Quality
Databases
Data Validation
ETL
Data Transformation
Relational Databases
Database Schema
Software Debugging
Hadoop
Hadoop Distributed File System
Hive
Python
Microsoft SQL Server
Apache Oozie
Performance Tuning
Query Optimization
Software Construction
SQL Stored Procedures
SQL Databases
Sqoop
Unstructured Data
Data Processing
Apache YARN
Database Optimization
Spark
PySpark
Integration Tests
Kafka
Data Pipelines
SQL Tuning
Job description
- Design, develop, and maintain scalable ETL pipelines and data processing applications
- Build and optimize data workflows using PySpark, Java, and Hadoop ecosystem tools
- Analyze business and technical requirements to produce detailed implementation designs
- Perform unit testing, integration testing, and debugging of applications
- Troubleshoot and resolve performance issues related to high-volume data processing
- Develop and maintain SQL queries, stored procedures, and database objects
- Work with structured and unstructured datasets for healthcare analytics
- Generate statistical reports and support data validation processes
- Collaborate with cross-functional teams to ensure end-to-end data pipeline efficiency
- Follow software engineering best practices and maintain code quality standards
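The ETL responsibilities above typically reduce to an extract → transform → validate → load loop. A toy sketch in plain Python of that shape (at scale this would use PySpark DataFrames; every field name and validation rule here is an illustrative assumption, not part of the role):

```python
# Toy ETL sketch: extract -> transform -> validate -> load.
# Plain-Python stand-in for what would be PySpark DataFrame code at scale;
# the record fields and data-quality rules are illustrative assumptions.

def extract(rows):
    """Extract: yield raw records (in PySpark, e.g. spark.read.parquet)."""
    yield from rows

def transform(records):
    """Transform: normalize field names and coerce types."""
    for r in records:
        yield {"patient_id": str(r["id"]).strip(),
               "visits": int(r.get("visits", 0))}

def validate(records):
    """Validate: drop records that fail basic data-quality checks."""
    return [r for r in records if r["patient_id"] and r["visits"] >= 0]

def load(records, sink):
    """Load: append to the target store (a list here, a table in practice)."""
    sink.extend(records)
    return len(records)

raw = [{"id": " 101 ", "visits": "3"}, {"id": "", "visits": "1"}]
sink = []
loaded = load(validate(transform(extract(raw))), sink)
print(loaded)  # one record passes validation; the empty-id record is dropped
```

The same four stages map directly onto a PySpark job: a `spark.read` source, DataFrame transformations, filter-based validation, and a `DataFrame.write` sink.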
Requirements
- Strong experience in ETL development, data processing, and database technologies
- 5+ years of experience with Microsoft SQL Server and relational databases
- Expertise in SQL performance tuning, indexing strategies, and query optimization
- 2+ years of experience with Hadoop ecosystem tools (HDFS, Hive, Impala, Spark, Kafka, Oozie, Yarn, Sqoop, Hue)
- Hands-on experience with PySpark, Python, and/or Java
- Experience working with large-scale data processing frameworks
- Strong understanding of data transformation and data movement technologies
- Ability to handle high-volume structured and unstructured datasets
- Good understanding of end-to-end application/data pipeline lifecycle
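The SQL tuning and indexing expectation can be illustrated with a minimal sketch using SQLite from Python's standard library as a stand-in (SQL Server's planner and syntax differ; the table and index names are made up): the same query moves from a full table scan to an index lookup once a covering index on the filter column exists.

```python
import sqlite3

# Minimal indexing/tuning illustration using stdlib SQLite as a stand-in
# for SQL Server; table and index names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE claims (claim_id INTEGER PRIMARY KEY, "
    "member_id INTEGER, amount REAL)"
)
cur.executemany(
    "INSERT INTO claims (member_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

def plan(sql):
    """Return SQLite's query plan as text (EXPLAIN QUERY PLAN)."""
    return " ".join(row[3] for row in cur.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT SUM(amount) FROM claims WHERE member_id = 42"
before = plan(query)  # full table scan: no usable index yet
cur.execute("CREATE INDEX idx_claims_member ON claims(member_id)")
after = plan(query)   # index search via idx_claims_member
print(before)
print(after)
```

On SQL Server the same investigation would go through execution plans and DMVs rather than `EXPLAIN QUERY PLAN`, but the before/after reasoning is the same.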