SOW - Senior Cloud Support Engineer - Plano, TX
Job description
Overview: We are seeking a highly skilled Cloud Engineer / Platform Support Specialist to join our team. This role involves providing advanced-level support for a cloud platform within a large enterprise environment, hosting thousands of applications on AWS. The successful candidate will act as the first point of contact for application developers encountering technical issues, using a ticketing system to manage incidents. This position requires a strong foundation in coding, software, infrastructure, and cloud technologies, and operates within a follow-the-sun support model. It also requires strong communication skills, including the ability to clearly articulate problem statements and solutions.
- Deliver incident management and advanced-level support for the AWS platform, which hosts a large volume of applications.
- Serve as the initial point of contact for application developers via a ticketing system.
- Communicate effectively with users at various organizational levels.
- Implement and utilize automation to support the scalability of the environment.
- Optimize operational processes to enhance efficiency, reliability, and security.
- Train users to self-diagnose and troubleshoot issues for expedited resolution.
- Conduct thorough investigations into issues to identify root causes and document strategies to prevent recurrence.
- Provide support for public cloud environments, particularly AWS.
- Manage events and incidents efficiently.
- Develop and implement scalable automation processes to handle tasks in a large-scale environment.
- Analyze and debug incidents, then follow up to gather feedback and prevent future issues.
- Support different development environments, including Unix, Linux, Mainframe, and Windows.
Requirements
* Amazon Elastic Kubernetes Service (EKS): Experience deploying, managing, and troubleshooting Kubernetes clusters on AWS EKS.
* Kubernetes Administration: Strong understanding of Kubernetes architecture, including pods, deployments, services, and networking.
* Helm & Kubernetes Operators: Familiarity with Helm charts for package management and Kubernetes Operators for automation.
* Cluster Security & RBAC: Knowledge of Kubernetes Role-Based Access Control (RBAC), security policies, and best practices.
* Scaling & Performance Optimization: Experience with autoscaling, load balancing, and optimizing Kubernetes workloads.
* Monitoring & Logging: Hands-on experience with tools like Prometheus, Grafana, Fluentd, or AWS CloudWatch for monitoring Kubernetes clusters.
* Containerization & Orchestration: Strong experience with Docker and AWS container services (ECS and AWS Fargate).
* Terraform: Strong experience in writing, managing, troubleshooting, and optimizing Terraform configurations for AWS infrastructure.
* Infrastructure as Code (IaC) Expertise: Deep understanding of IaC principles, including automation, version control, and modularization.
* AWS Cloud Services: Hands-on experience with AWS services such as EC2, S3, Lambda, VPC, IAM, and CloudFormation.
* Security Best Practices: Knowledge of AWS security policies, identity and access management (IAM), and compliance standards.
* CI/CD Integration: Experience integrating Terraform with CI/CD pipelines for automated deployments. Commitment to automating processes for continuous improvement.
* SDLC Proficiency: Proficiency in the software development life cycle (SDLC), with the ability to read code (Java and Python).
* Troubleshooting & Optimization: Ability to diagnose and resolve infrastructure issues, optimize performance, and ensure scalability. Strong troubleshooting and diagnostic skills for security and access issues in a large enterprise environment.
* Excellent communication skills: Ability to analyze details, understand incident causation, and implement preventive measures to ensure reliability and security.
Nice to have:
- Database management skills (Oracle DBA, Cassandra DBA, CockroachDB), including performance tuning, connectivity, backups, indexes, and monitoring alarms.
- Middleware and messaging experience (Kafka, MQ).
- Experience with Tomcat.
- System engineering and administration skills (Unix/Linux).
- Java or Python development experience.