PySpark Scala Developer
Job description
- In this role, you will be responsible for developing scalable data solutions using PySpark and Scala, while maintaining a strong foundational knowledge of legacy and modern distributed architectures.
- You will work closely with cross-functional teams to design, build, and optimize data pipelines that handle massive volumes of critical financial data.
Job Duties
- Pipeline Development:
o Design, develop, and deploy highly scalable Big Data pipelines using PySpark and Scala to process large-scale datasets.
- Hadoop Ecosystem Management:
o Utilize in-depth understanding of HDFS architecture, data storage, and fault-tolerance mechanisms to optimize data reliability and accessibility.
- System Administration & Coordination:
o Execute HDFS commands for administration and leverage ZooKeeper for distributed coordination services and cluster management.
- MapReduce Integration:
o Apply a fundamental understanding of the MapReduce programming paradigm to optimize workloads and integrate seamlessly with primary development in Spark/Flink.
- Code Quality & Optimization:
o Write clean, efficient, and well-documented code.
o Conduct performance tuning and troubleshoot bottlenecks in distributed data processing jobs.
- Collaboration:
o Work alongside data architects, business analysts, and downstream consumers to ensure data solutions meet strict business requirements and regulatory compliance standards.
Only those lawfully authorized to work in the designated country associated with the position will be considered.
Please note that all position start dates and durations are estimates and may be reduced or lengthened based upon a client's business needs and requirements.
Requirements
- Screening questions: Do you have experience in ZooKeeper? Do you have a Bachelor's degree?
- Must Have Skills/Attributes: Banking/Financial, Big Data, Flink, Hadoop, PySpark, Scala, Spark, ZooKeeper
- Experience Desired:
o HDFS architecture, data storage, and fault tolerance mechanisms (5-8+ yrs)
o HDFS commands and administration (5-8+ yrs)
- Preferred Education: Bachelor's Degree
- C2C is not available.
- Experience Level: 5 to 8+ years of hands-on data engineering and software development experience, preferably within a large enterprise or financial services environment.
- Programming Languages: Strong proficiency in Scala and Python (PySpark) for data processing.
- Hadoop/HDFS Expertise:
o In-depth understanding of HDFS architecture, data storage, and fault tolerance mechanisms.
o Hands-on experience with HDFS commands and administration.
- Distributed Systems:
o Fundamental understanding of the MapReduce programming paradigm, even if primary development is in Spark/Flink.
o Experience with cluster coordination tools, specifically ZooKeeper, for distributed coordination services.
- Spark/Flink:
o Proven track record of deploying production-level Apache Spark (or Flink) applications.
- Onsite Requirement:
o Must be willing and able to work onsite in the Tampa, FL office 3 days a week.
- Experience with workflow orchestration tools (e.g., Apache Airflow, Oozie).
- Familiarity with streaming technologies such as Apache Kafka.
- Knowledge of modern cloud data platforms (AWS, GCP, or Snowflake) as enterprise environments modernize.
- Understanding of CI/CD pipelines and automated testing in a Big Data environment.
- The ideal candidate will have deep hands-on experience in the Hadoop ecosystem, specializing in distributed systems, data storage, and high-performance processing pipelines.
Benefits & conditions
- Location: Tampa, FL 33610 (hybrid work)
- Pay: $50 - $54 an hour
- Job type: Temp-to-hire