PySpark Scala Developer

Rose International
Tampa, United States of America
3 days ago

Role details

Contract type
Temporary contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$ 112K

Job location

Hybrid (onsite 3 days a week)
Tampa, United States of America

Tech stack

Airflow
Amazon Web Services (AWS)
Automation of Tests
Big Data
Software Quality
Information Engineering
Data Integrity
Data Systems
Distributed Computing Environment
Distributed Systems
Fault Tolerance
Hadoop
Hadoop Distributed File System
MapReduce
Python
Apache Oozie
Performance Tuning
Scala
Software Engineering
Workflow Management Systems
Apache ZooKeeper
Data Processing
Data Storage Technologies
Snowflake
Spark
PySpark
Apache Flink
Kafka
Data Management
Video Streaming
Data Pipelines
Programming Languages

Job description

  • In this role, you will be responsible for developing scalable data solutions using PySpark and Scala, while maintaining a strong foundational knowledge of legacy and modern distributed architectures.
  • You will work closely with cross-functional teams to design, build, and optimize data pipelines that handle massive volumes of critical financial data.

Job Duties

  • Pipeline Development:

o Design, develop, and deploy highly scalable Big Data pipelines using PySpark and Scala to process large-scale datasets.

  • Hadoop Ecosystem Management:

o Use an in-depth understanding of HDFS architecture, data storage, and fault-tolerance mechanisms to optimize data reliability and accessibility.

  • System Administration & Coordination:

o Execute HDFS commands for administration and leverage ZooKeeper for distributed coordination services and cluster management.

  • MapReduce Integration:

o Apply a fundamental understanding of the MapReduce programming paradigm to optimize workloads and integrate them with primary development in Spark/Flink.

  • Code Quality & Optimization:

o Write clean, efficient, and well-documented code.
o Conduct performance tuning and troubleshoot bottlenecks in distributed data processing jobs.

  • Collaboration:

o Work alongside data architects, business analysts, and downstream consumers to ensure data solutions meet strict business requirements and regulatory compliance standards.


  • Only those lawfully authorized to work in the designated country associated with the position will be considered.

  • Please note that all position start dates and durations are estimates and may be shortened or extended based on the client's business needs and requirements.

Requirements

  • Screening questions: Do you have experience with ZooKeeper? Do you have a Bachelor's degree?
  • Must-have skills/attributes: Banking/Financial, Big Data, Flink, Hadoop, PySpark, Scala, Spark, ZooKeeper.
  • Experience desired: HDFS architecture, data storage, and fault tolerance mechanisms (5-8+ yrs); HDFS commands and administration (5-8+ yrs).
  • Preferred education: Bachelor's degree.
  • C2C is not available.
  • Experience level: 5 to 8+ years of hands-on data engineering and software development experience, preferably within a large enterprise or financial services environment.

  • Programming Languages: Strong proficiency in Scala and Python (PySpark) for data processing.
  • Hadoop/HDFS Expertise:

o In-depth understanding of HDFS architecture, data storage, and fault tolerance mechanisms.
o Hands-on experience with HDFS commands and administration.

  • Distributed Systems:

o Fundamental understanding of the MapReduce programming paradigm, even if primary development is in Spark/Flink.
o Experience with cluster coordination tools, specifically ZooKeeper for distributed coordination services.

  • Spark/Flink:

o Proven track record of deploying production-level Apache Spark (or Flink) applications.

  • Onsite Requirement:

o Must be willing and able to work onsite in the Tampa, FL office 3 days a week.

  • Experience with workflow orchestration tools (e.g., Apache Airflow, Oozie).
  • Familiarity with streaming technologies such as Apache Kafka.
  • Knowledge of modern cloud data platforms (AWS, GCP, or Snowflake) as enterprise environments modernize.
  • Understanding of CI/CD pipelines and automated testing in a Big Data environment.

The ideal candidate will have deep hands-on experience in the Hadoop ecosystem, specializing in distributed systems, data storage, and high-performance processing pipelines.

Benefits & conditions

3.8 out of 5 stars
Tampa, FL 33610
Hybrid work
$50 - $54 an hour
Temp-to-hire
