TELECOMMUTE PySpark / Java Developer (Data Engineer)

Innovative IT Solutions Inc
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate

Job location

Remote

Tech stack

Java
Unit Testing
Big Data
Health Informatics
Cloudera Impala
Software Quality
Databases
Data Validation
ETL
Data Transformation
Relational Databases
Database Schema
Software Debugging
Hadoop
Hadoop Distributed File System
Hive
Python
Microsoft SQL Server
Apache Oozie
Performance Tuning
Query Optimization
Software Construction
SQL Stored Procedures
SQL Databases
Sqoop
Unstructured Data
Data Processing
Apache YARN
Database Optimization
Spark
PySpark
Integration Tests
Kafka
Data Pipelines
SQL Tuning

Job description

  • Design, develop, and maintain scalable ETL pipelines and data processing applications
  • Build and optimize data workflows using PySpark, Java, and Hadoop ecosystem tools
  • Analyze business and technical requirements to produce detailed implementation designs
  • Perform unit testing, integration testing, and debugging of applications
  • Troubleshoot and resolve performance issues related to high-volume data processing
  • Develop and maintain SQL queries, stored procedures, and database objects
  • Work with structured and unstructured datasets for healthcare analytics
  • Generate statistical reports and support data validation processes
  • Collaborate with cross-functional teams to ensure end-to-end data pipeline efficiency
  • Follow software engineering best practices and maintain code quality standards

Requirements

  • Strong experience in ETL development, data processing, and database technologies
  • 5+ years of experience with Microsoft SQL Server and relational databases
  • Expertise in SQL performance tuning, indexing strategies, and query optimization
  • 2+ years of experience with Hadoop ecosystem tools (HDFS, Hive, Impala, Spark, Kafka, Oozie, Yarn, Sqoop, Hue)
  • Hands-on experience with PySpark, Python, and/or Java
  • Experience working with large-scale data processing frameworks
  • Strong understanding of data transformation and data movement technologies
  • Ability to handle high-volume structured and unstructured datasets
  • Good understanding of end-to-end application/data pipeline lifecycle
