Lead Big Data Software Engineer
Job description
The Rapid7 Data Platform is a unified, integrated platform powered by Rapid7's product suite, providing our customers with enhanced visibility into their attack surface, operational efficiency, risk management, and decision-making capabilities. Our teams are responsible for consolidating data from all Rapid7 products, transforming it for optimised retrieval, and ensuring high-performance, seamless access for our customers. This role is crucial to the platform's success: it focuses on building a highly scalable and reliable data mesh that powers cross-product use cases through a distributed query engine for big data analytics.
About the Role
We are seeking an innovative, self-motivated Data and Performance Engineer who will act as a technical leader, collaborating with our product teams to optimise their data pipelines and retrieval processes for performance and efficiency. You will work with the Data Platform teams to implement monitoring and testing strategies that ensure the performance of the data and its queries, and to identify optimisations.
Technologies you will work with:
- Trino
- Iceberg
- Parquet
- Spark
- Airflow
- Kafka
- AWS services such as Glue, S3, and EKS
In this role, you will:
- Analyse and optimise distributed SQL queries to improve performance
- Suggest optimisations to our data pipelines
- Provide recommendations for efficient partitioning strategies and schema designs
- Conduct performance tuning for the data pipelines and queries
- Develop performance monitoring strategies and tools
Requirements
- 5+ years of hands-on software engineering experience, with a specific focus on database query optimisation
- Strong database systems expertise in query execution planning, query optimisation, performance tuning, parallel computing, and schema design
- Experience continuously monitoring and optimising data pipelines for performance and cost-effectiveness
- Ability to design, develop, implement, and operate highly reliable, large-scale data lake systems in cooperation with product teams
- Ability to analyse and performance-test the data mesh for performance and scalability, identify bottlenecks, and recommend and develop improvements
- Experience mentoring and guiding junior engineers, providing technical leadership and fostering a culture of continuous improvement and innovation
- Excellent verbal and written communication skills
- Strong, creative problem-solving ability
Nice to haves:
- Trino/Presto data mesh experience
- AWS, Terraform, Kubernetes
- Java