TELECOMMUTE PySpark / Java Developer (Data Engineer)
Innovative IT Solutions Inc
yesterday
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Intermediate
Job location: Remote
Tech stack
Java
Unit Testing
Big Data
Health Informatics
Cloudera Impala
Software Quality
Databases
Data Validation
ETL
Data Transformation
Relational Databases
Database Schema
Software Debugging
Hadoop
Hadoop Distributed File System
Hive
Python
Microsoft SQL Server
Apache Oozie
Performance Tuning
Query Optimization
Software Construction
SQL Stored Procedures
SQL Databases
Sqoop
Unstructured Data
Data Processing
Apache YARN
Database Optimization
Spark
PySpark
Integration Tests
Kafka
Data Pipelines
SQL Tuning
Job description
- Design, develop, and maintain scalable ETL pipelines and data processing applications
- Build and optimize data workflows using PySpark, Java, and Hadoop ecosystem tools
- Analyze business and technical requirements to produce detailed implementation designs
- Perform unit testing, integration testing, and debugging of applications
- Troubleshoot and resolve performance issues related to high-volume data processing
- Develop and maintain SQL queries, stored procedures, and database objects
- Work with structured and unstructured datasets for healthcare analytics
- Generate statistical reports and support data validation processes
- Collaborate with cross-functional teams to ensure end-to-end data pipeline efficiency
- Follow software engineering best practices and maintain code quality standards
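The ETL responsibilities above typically reduce to an extract → transform → validate → load loop. A toy sketch in plain Python of that shape (at scale this would use PySpark DataFrames; every field name and validation rule here is an illustrative assumption, not part of the role):

```python
# Toy ETL sketch: extract -> transform -> validate -> load.
# Plain-Python stand-in for what would be PySpark DataFrame code at scale;
# the record fields and data-quality rules are illustrative assumptions.

def extract(rows):
    """Extract: yield raw records (in PySpark, e.g. spark.read.parquet)."""
    yield from rows

def transform(records):
    """Transform: normalize field names and coerce types."""
    for r in records:
        yield {"patient_id": str(r["id"]).strip(),
               "visits": int(r.get("visits", 0))}

def validate(records):
    """Validate: drop records that fail basic data-quality checks."""
    return [r for r in records if r["patient_id"] and r["visits"] >= 0]

def load(records, sink):
    """Load: append to the target store (a list here, a table in practice)."""
    sink.extend(records)
    return len(records)

raw = [{"id": " 101 ", "visits": "3"}, {"id": "", "visits": "1"}]
sink = []
loaded = load(validate(transform(extract(raw))), sink)
print(loaded)  # one record passes validation; the empty-id record is dropped
```

The same four stages map directly onto a PySpark job: a `spark.read` source, DataFrame transformations, filter-based validation, and a `DataFrame.write` sink.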
Requirements
- Strong experience in ETL development, data processing, and database technologies
- 5+ years of experience with Microsoft SQL Server and relational databases
- Expertise in SQL performance tuning, indexing strategies, and query optimization
- 2+ years of experience with Hadoop ecosystem tools (HDFS, Hive, Impala, Spark, Kafka, Oozie, Yarn, Sqoop, Hue)
- Hands-on experience with PySpark, Python, and/or Java
- Experience working with large-scale data processing frameworks
- Strong understanding of data transformation and data movement technologies
- Ability to handle high-volume structured and unstructured datasets
- Good understanding of end-to-end application/data pipeline lifecycle
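The SQL tuning and indexing expectation can be illustrated with a minimal sketch using SQLite from Python's standard library as a stand-in (SQL Server's planner and syntax differ; the table and index names are made up): the same query moves from a full table scan to an index lookup once a covering index on the filter column exists.

```python
import sqlite3

# Minimal indexing/tuning illustration using stdlib SQLite as a stand-in
# for SQL Server; table and index names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE claims (claim_id INTEGER PRIMARY KEY, "
    "member_id INTEGER, amount REAL)"
)
cur.executemany(
    "INSERT INTO claims (member_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

def plan(sql):
    """Return SQLite's query plan as text (EXPLAIN QUERY PLAN)."""
    return " ".join(row[3] for row in cur.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT SUM(amount) FROM claims WHERE member_id = 42"
before = plan(query)  # full table scan: no usable index yet
cur.execute("CREATE INDEX idx_claims_member ON claims(member_id)")
after = plan(query)   # index search via idx_claims_member
print(before)
print(after)
```

On SQL Server the same investigation would go through execution plans and DMVs rather than `EXPLAIN QUERY PLAN`, but the before/after reasoning is the same.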