Java Data Engineer (AWS)
Job description
We are looking for a Senior Java Data Engineer with strong experience building and supporting large-scale batch data processing systems on AWS. This role focuses on ETL-style data pipelines, data warehousing concepts, and high-volume analytics platforms. The ideal candidate is a hands-on engineer who can independently debug complex issues and work closely with analytics, risk, and platform teams.
- Design, develop, and maintain Java-based batch data processing applications
- Build and support ETL pipelines operating at scale in AWS
- Work extensively with AWS EMR for distributed data processing
- Develop and maintain integrations using:
  - Amazon S3
  - AWS Glue (Glue Tables)
  - AWS Athena
  - AWS Lambda (where applicable)
- Debug and resolve complex production issues independently
- Support data warehouse-like environments and analytical workloads
- Collaborate with analytics, fraud risk, and platform teams to ensure data accuracy and reliability
- Improve operational efficiency through automation
- Leverage AI productivity tools (e.g., GitHub Copilot) responsibly to improve development quality and reduce risk
- Follow engineering best practices for reliability, scalability, and compliance
Requirements
- Strong hands-on Java development experience
- Solid experience working with AWS, specifically:
  - Amazon S3
  - EMR (primary and heavily used)
  - AWS Glue (Glue Tables)
  - AWS Athena
  - AWS Lambda
- Proven experience with batch processing systems (non-real-time)
- Strong debugging and troubleshooting skills with the ability to work independently
- Experience building ETL-style data pipelines
- Background working in data warehouse or analytics platforms
Preferred Qualifications (Nice to Haves)
- Experience with Apache Spark, particularly its Java API
- Exposure to Snowflake (basic understanding required; deep expertise not expected)
- Experience with automation frameworks or tooling
- Exposure to AI-assisted development tools (e.g., GitHub Copilot or similar)
- Experience with Terraform (currently used, though being phased out)
- Background supporting fraud risk, financial data, or analytics platforms
- Awareness or prior exposure to PySpark (acceptable, though role is Java-focused)