AWS Data Engineer (Real-Time/Streaming)
The Modern
Atlanta, United States of America
yesterday
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Job location: Atlanta, United States of America
Tech stack
Amazon Web Services (AWS)
Cloud Computing
Continuous Integration
Data Integrity
ETL
Fraud Prevention and Detection
Monitoring of Systems
Python
Enterprise Messaging Systems
Data Streaming
Systems Architecture
Systems Integration
Amazon Connect
Parquet
Data Storage Technologies
Delivery Pipeline
Servicebus
Event Driven Architecture
Data Lake
PySpark
Kafka
Terraform
Stream Processing
Data Pipelines
Job description
- Develop a comprehensive plan for migrating near real-time fraud detection campaigns from on-premises systems to AWS.
- Design and implement event-driven architectures to process inbound dialer data (fraud events) using services such as Amazon EventBridge, Kafka, Kinesis Data Streams, and Kinesis Firehose.
- Build and manage scalable data pipelines using AWS Glue (ETL jobs, Crawlers), PySpark, and Python for data ingestion, transformation, and processing.
- Configure and manage Glue Crawlers to automatically discover schemas and update the Data Catalog.
- Store data in Parquet format and enable efficient analytical querying through Amazon Athena.
- Develop integrations between Customer Profiles and messaging platforms to automatically trigger profile updates and downstream processes.
- Implement automation to trigger fraud-related outbound calls based on updates in customer profiles.
- Design and orchestrate workflows using AWS Step Functions to manage complex processing pipelines.
- Provision and manage cloud infrastructure using Terraform (Infrastructure as Code).
- Optimize system architecture for scalability, reliability, and cost-efficiency while ensuring data integrity and security.
- Conduct end-to-end testing of the entire framework to validate functionality, performance, and reliability.
- Deploy, automate, and manage resources using CI/CD pipelines.
- Continuously monitor system performance and implement optimizations post-deployment.
- Maintain detailed documentation of architecture, workflows, and operational processes.
Technical Skills
- Strong expertise in AWS services including:
  - Lambda
  - S3
  - EventBridge
  - Kinesis (Data Streams & Firehose)
  - Glue (ETL + Crawlers)
  - Step Functions
  - Amazon Connect
  - Athena
  - Macie
- Proficient in:
  - Python
  - PySpark
Requirements
- Glue Crawlers for schema discovery and cataloging
- Parquet-based data storage
- Building scalable data pipelines
- Strong understanding of event-driven architectures (Pub/Sub model)
- Hands-on experience with Terraform (Infrastructure as Code)
- Familiarity with CI/CD tools and automation pipelines
Preferred Qualifications
- AWS Certifications:
  - AWS Certified Developer
  - AWS Solutions Architect
- Experience with:
  - Messaging platforms like Kafka and Amazon EventBridge
  - Designing real-time data processing systems
  - Using Glue Crawlers + Athena for data lake architectures