Production Engineer (Java & AWS Cloud Infrastructure)
Job description
- Production Operations & Reliability: Own end-to-end production environments. Lead incident response, conduct Root Cause Analysis (RCA), and optimize systems to meet strict SLA/SLO and MTTR targets.
- Infrastructure as Code (IaC): Treat infrastructure as software by writing clean, reusable Terraform or CloudFormation modules to automate cloud provisioning and eliminate manual drift.
- Scalable Systems Architecture: Partner with dev teams to architect fault-tolerant, cloud-native microservices utilizing automated failover, autoscaling, and traffic routing.
- Continuous Delivery Automation: Build, scale, and maintain robust CI/CD pipelines (Jenkins, GitLab CI, or AWS CodePipeline) to streamline automated testing and deployments.
- Observability & Performance Tuning: Design and manage centralized monitoring and distributed tracing stacks using Prometheus, Grafana, AWS CloudWatch, and Jaeger/X-Ray to catch issues before they impact users.
- Production Security: Implement and enforce enterprise-grade security controls, including AWS IAM roles, OAuth2, JWT, and data encryption.
Requirements
Citizens and Green Card holders preferred.
We are seeking a highly skilled Production Engineer to bridge the gap between application development and system operations. In this role, you will use your software engineering background to ensure our core platforms are highly available, scalable, and resilient.
You won't just monitor servers: you will dive directly into application code written in Java and Spring Boot to debug bottlenecks, automate infrastructure deployment on AWS, and optimize production performance. If you approach operational challenges as software problems, we want you on our team.
- Experience: 3-5 years of dedicated experience in Production Engineering, Site Reliability Engineering (SRE), or DevOps.
- Backend Engineering: Strong proficiency in Java and Spring Boot with the ability to read, trace, and debug complex microservice applications.
- AWS & Containerization: Hands-on experience with core AWS services and container tooling, specifically Docker, Kubernetes (EKS) or ECS, Lambda, SQS, SNS, and Application Load Balancers (ALB).
- Automation: Practical experience automating cloud infrastructure with Terraform, along with general scripting skills.
- Telemetry Stack: Deep practical knowledge of Prometheus and Grafana or AWS CloudWatch for real-time visibility.
- Environment: Comfortable working in fast-paced Agile/Scrum environments and participating in production on-call rotations.
What Will Make You Stand Out
- Proven track record of migrating legacy monoliths into cloud-native microservices.
- Experience running cost-optimization and cloud-resource rightsizing initiatives.
- A metric-driven mindset focused on improving system uptime and reducing operational overhead.