PySpark Scala Developer

Rose International

Tampa, United States of America

3 days ago

Role details

Contract type

Temporary contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

$ 112K

Job location

Remote

Tampa, United States of America

Tech stack

Airflow

Amazon Web Services (AWS)

Automation of Tests

Big Data

Software Quality

Information Engineering

Data Integrity

Data Systems

Distributed Computing Environment

Distributed Systems

Fault Tolerance

Hadoop

Hadoop Distributed File System

MapReduce

Python

Apache Oozie

Performance Tuning

Scala

Software Engineering

Workflow Management Systems

Apache Zookeeper

Data Processing

Data Storage Technologies

Snowflake

Spark

PySpark

Apache Flink

Kafka

Data Management

Video Streaming

Data Pipelines

Programming Languages

Job description

In this role, you will be responsible for developing scalable data solutions using PySpark and Scala, while maintaining a strong foundational knowledge of legacy and modern distributed architectures.
You will work closely with cross-functional teams to design, build, and optimize data pipelines that handle massive volumes of critical financial data.

Job Duties

Pipeline Development:

o Design, develop, and deploy highly scalable Big Data pipelines using PySpark and Scala to process large-scale datasets.

Hadoop Ecosystem Management:

o Utilize in-depth understanding of HDFS architecture, data storage, and fault-tolerance mechanisms to optimize data reliability and accessibility.

System Administration & Coordination:

o Execute HDFS commands for administration and leverage ZooKeeper for distributed coordination services and cluster management.

MapReduce Integration:

o Apply a fundamental understanding of the MapReduce programming paradigm to optimize workloads and integrate seamlessly with primary development in Spark/Flink.

Code Quality & Optimization:

o Write clean, efficient, and well-documented code. o Conduct performance tuning and troubleshoot bottlenecks in distributed data processing jobs.

Collaboration:

o Work alongside data architects, business analysts, and downstream consumers to ensure data solutions meet strict business requirements and regulatory compliance standards.

#CT1

Only those lawfully authorized to work in the designated country associated with the position will be considered.
Please note that all Position start dates and duration are estimates and may be reduced or lengthened based upon a client's business needs and requirements.

Requirements

Do you have experience in ZooKeeper?, Do you have a Bachelor's degree?, Must Have Skills/Attributes: Banking/Financial, Big Data, Flink, Hadoop, PySpark, Scala, Spark, Zookeeper Experience Desired: HDFS architecture, data storage, and fault tolerance mechanisms experience (5-8+ yrs); Experience with HDFS commands and administration (5-8+ yrs) Preferred Education: Bachelor's Degree C2C is not available, * Bachelor's Degree, * Experience Level: 5 to 8+ years of hands-on data engineering and software development experience, preferably within a large enterprise or financial services environment.

Programming Languages: Strong proficiency in Scala and Python (PySpark) for data processing.
Hadoop/HDFS Expertise:

o In-depth understanding of HDFS architecture, data storage, and fault tolerance mechanisms. o Experience with HDFS commands and administration. o In-depth knowledge of HDFS architecture, fault tolerance, and hands-on experience with HDFS administration commands.

Distributed Systems:

o Fundamental understanding of MapReduce programming paradigm, even if primary development is in Spark/Flink. o Foundational understanding of the MapReduce programming paradigm and experience with cluster coordination tools, specifically ZooKeeper. o Knowledge of Zookeeper for distributed coordination services.

Spark/Flink:

o Proven track record of deploying production-level Apache Spark (or Flink) applications.

Onsite Requirement:

o Must be willing and able to work onsite in the Tampa, FL office 3 days a week., * Experience with workflow orchestration tools (e.g., Apache Airflow, Oozie).

Familiarity with streaming technologies such as Apache Kafka.
Knowledge of modern cloud data platforms (AWS, GCP, or Snowflake) as enterprise environments modernize.
Understanding of CI/CD pipelines and automated testing in a Big Data environment., * The ideal candidate will have deep hands-on experience in the Hadoop ecosystem, specializing in distributed systems, data storage, and high-performance processing pipelines.

Benefits & conditions

3.83.8 out of 5 stars Tampa, FL 33610 Hybrid work $50 - $54 an hour - Temp-to-hire

Role details

Job location

Tech stack

Job description

Requirements

Benefits & conditions

Apply for this position

Good distractions

Moments

Videos View all