Production Engineer (Java & AWS Cloud Infrastructure)

Kaygen Inc.
Plano, United States of America
3 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate

Job location

Plano, United States of America

Tech stack

Java
Amazon Web Services (AWS)
Automation of Tests
Cloud Computing
Continuous Delivery
Software Debugging
DevOps
Failover
Fault Tolerance
Identity and Access Management
OAuth
Performance Tuning
Scrum
Reliability Engineering
Prometheus
JSON Web Token
Software Engineering
Scripting (Bash/Python/Go/Ruby)
Load Balancing
Autoscaling
Delivery Pipeline
Grafana
Spring Boot
MTTR
Infrastructure as Code (IaC)
CloudFormation
Containerization
GitLab CI
Kubernetes
Infrastructure Automation Frameworks
CloudWatch
Terraform
Dynatrace
Docker
Jenkins
Microservices

Job description

  • Production Operations & Reliability: Own end-to-end production environments. Lead incident response, conduct Root Cause Analysis (RCA), and optimize systems to meet strict SLA/SLO and MTTR targets.
  • Infrastructure as Code (IaC): Treat infrastructure as software by writing clean, reusable Terraform or CloudFormation modules to automate cloud provisioning and eliminate manual drift.
  • Scalable Systems Architecture: Partner with dev teams to architect fault-tolerant, cloud-native microservices utilizing automated failover, autoscaling, and traffic routing.
  • Continuous Delivery Automation: Build, scale, and maintain robust CI/CD pipelines (Jenkins, GitLab CI, or AWS CodePipeline) to streamline automated testing and deployments.
  • Observability & Performance Tuning: Design and manage centralized monitoring and distributed tracing stacks using Prometheus, Grafana, AWS CloudWatch, and Jaeger/X-Ray to catch issues before they impact users.
  • Production Security: Implement and enforce enterprise-grade security controls, including AWS IAM roles, OAuth2, JWT, and data encryption.

Requirements

U.S. citizens and Green Card holders preferred.

We are seeking a highly skilled Production Engineer to bridge the gap between application development and system operations. In this role, you will use your software engineering background to ensure our core platforms are highly available, scalable, and resilient.

You won't just monitor servers: you will dive directly into application code written in Java and Spring Boot to debug bottlenecks, automate infrastructure deployment on AWS, and optimize production performance. If you approach operational challenges as software problems, we want you on our team.

  • Experience: 3-5 years of dedicated experience in Production Engineering, Site Reliability Engineering (SRE), or DevOps.
  • Backend Engineering: Strong proficiency in Java and Spring Boot with the ability to read, trace, and debug complex microservice applications.
  • AWS & Containerization: Hands-on experience with core cloud infrastructure, specifically Docker, Kubernetes (EKS/ECS), Lambda, SQS, SNS, and Application Load Balancers (ALB).
  • Automation: Practical experience using Terraform for cloud infrastructure automation and scripting.
  • Telemetry Stack: Deep practical knowledge of Prometheus and Grafana or AWS CloudWatch for real-time visibility.
  • Environment: Comfortable working in fast-paced Agile/Scrum environments and participating in production on-call rotations.

What Will Make You Stand Out

  • Proven track record of migrating legacy monoliths into cloud-native microservices.
  • Experience running cost-optimization and cloud-resource rightsizing initiatives.
  • A metric-driven mindset focused on improving system uptime and reducing operational overhead.
