PySpark Developer
Job description
Screening questions:
5. Have you developed data engineering pipelines for real-world problems (not just toy projects)?
6. Have you implemented advanced SQL queries?
7. Have you developed complex logic in PySpark 3?
8. Are you confident you can learn PySpark 3 MLlib within two weeks? https://spark.apache.org/docs/latest/api/python/reference/pyspark.ml.html (we shall guide but won't spoon-feed)

We are looking for engineers with a real passion for distributed computing and actual hands-on experience developing data applications on PySpark. You would be required to work with our data science team on the development of several data applications.
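As a rough illustration of item 8 above, here is a minimal pyspark.ml pipeline sketch; the input path, feature columns, and label column are hypothetical placeholders, not names from this posting:

```python
# Minimal pyspark.ml pipeline sketch; "features.parquet", "f1", "f2", and "label"
# are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

df = spark.read.parquet("features.parquet")

# Assemble raw columns into a feature vector, then fit a simple regressor.
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
lr = LinearRegression(featuresCol="features", labelCol="label")
model = Pipeline(stages=[assembler, lr]).fit(df)

model.transform(df).select("label", "prediction").show(5)
```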
Requirements
Do you have experience in Spark implementation? This is an immediate requirement, and we shall have an accelerated interview process for fast closure; you would be required to be proactive and responsive. We are looking for a developer with a real passion for data science pipelines. This is a specialist, individual contributor role. Product development experience, preferably at a startup or on a lean team, is desired.
- Must be able to fetch data from data sources (databases, APIs, flat files, etc.)
- Must know functional programming in Python inside and out, with a strong flair for data structures, linear algebra, and algorithm implementation
- Must be able to convert, break up, and distribute existing Python code into functional programming syntax
- Must have worked on at least one real-world project in production on PySpark
- Must have implemented complex mathematical logic through PySpark at scale on parallel/distributed clusters
- Must be able to recognize code that is more parallel and less memory-constrained, and apply best practices to avoid runtime issues and performance bottlenecks
- Must have done extensive performance tuning, optimization, configuration, and scheduling in PySpark
- Must have integrated APIs, streams, databases, and files (JSON, XML, CSV, etc.) through PySpark (see the ingestion sketch after this list)
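As a rough illustration of the ingestion and distribution items above, a minimal sketch, assuming hypothetical file paths and column names ("orders.csv", "users.json", "user_id", "amount"):

```python
# Sketch: ingest CSV and JSON sources, transform with pure column expressions,
# and write the result back out. All paths and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ingest-sketch").getOrCreate()

orders = spark.read.option("header", "true").csv("orders.csv")
users = spark.read.json("users.json")

# Column-level, side-effect-free transformations distribute cleanly across the cluster.
totals = (
    orders
    .withColumn("amount", F.col("amount").cast("double"))
    .filter(F.col("amount") > 0)
    .join(users, on="user_id", how="inner")
    .groupBy("user_id")
    .agg(F.sum("amount").alias("total_amount"))
)

totals.write.mode("overwrite").parquet("user_totals.parquet")
```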
Preferred
- Good to have working knowledge of first-class, higher-order, and pure functions, recursion, lazy evaluation, and immutable data structures
- A firm understanding of the underlying mathematics is needed to adapt modelling techniques to the problem space with large data (1M+ records)
- Good to have worked on PySpark MLlib and PySpark ML
- Configured checkpointing and Directed Acyclic Graphs (DAGs) on a PySpark cluster (see the sketch after this list)
- Worked on the development of a data platform
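For the checkpointing item above, a minimal sketch of truncating a long DataFrame lineage (DAG); the checkpoint directory and iteration count are arbitrary placeholders:

```python
# Sketch: cut a long lineage (DAG) with checkpointing. The checkpoint directory
# and the number of iterations are arbitrary placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("checkpoint-sketch").getOrCreate()
spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")

df = spark.range(1_000_000).withColumn("value", F.col("id").cast("double"))

# Iterative transformations keep growing the plan; checkpointing materializes the
# intermediate result so later stages replan from it instead of the full lineage.
for _ in range(10):
    df = df.withColumn("value", F.col("value") * 1.01 + 1.0)
df = df.checkpoint()

print(df.agg(F.sum("value")).first()[0])
```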
Benefits & conditions
- Flexible schedule
- Competitive compensation
- You shall be working on our revolutionary products, which are pioneers in their respective categories.
We try really hard to hire fun-loving, crazy folks who are driven by more than a paycheque. You shall be working with the creamiest talent on extremely challenging problems at the most happening workplace.