Senior DevOps Engineer
New York, Inc.
Sunnyvale, United States of America
1 month ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Job location: Sunnyvale, United States of America
Tech stack
Java
Amazon Web Services (AWS)
Apache HTTP Server
Computer Programming
Computer Networks
Databases
Continuous Integration
Linux
DevOps
Python
Reliability Engineering
Ansible
Scripting (Bash/Python/Go/Ruby)
Snowflake
MTTR
Reliability of Systems
Git
CloudFormation
Containerization
Kubernetes
Apache Flink
Vertica
Terraform
Splunk
Docker
Job description
- Ensure system reliability and availability - Monitor for system issues, create strategies to detect and address them, design automated troubleshooting systems, and write and review post-mortems.
- Mitigate operational risks - Collaborate with development teams and other stakeholders to identify potential risks, perform risk assessments, implement risk mitigation strategies, and continuously monitor and review their effectiveness.
- Monitor system health.
- Minimize emergency response time (MTTR).
- Maintain CI/CD pipelines.
- Drive continuous improvement by collaborating with various teams.
- Automate processes.
Requirements
Must have/Required Experience and Skills:
- 8+ years of experience in DevOps and Site Reliability Engineering.
- Hands-on with containerization and orchestration: Docker, Kubernetes/EKS.
- Proficiency in infrastructure as code tools: Terraform, Ansible, or CloudFormation.
- Experience setting up and managing services running on Kubernetes.
- In-depth understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and automation.
- In-depth knowledge of monitoring and observability tools: Splunk.
- Knowledge of Linux operating system principles, networking fundamentals, and systems management.
- Demonstrable fluency in at least one of the following languages: Java or Python.
- Ability to identify and communicate technical and architectural problems, while working with partners and their team to iteratively find solutions.
- Building and managing CI/CD pipelines: gatekeeping production deployments, developing and implementing Git branching strategies, branch protection rules, and network policies, and scaling load up and down on AWS.
- Strong problem-solving and analytical skills.
- Ability to solve performance and scalability issues in the system.
Technical Skills:
- DevOps and SRE
- AWS Kubernetes/EKS, Docker
- Terraform, Ansible, or CloudFormation
- Splunk, Apache Flink
- Programming/Scripting using Java or Python
- CI/CD
- Databases: Vertica, Snowflake
Behavioral Skills:
- Excellent communication and collaboration skills
- Ability to propose and implement improvements in the system
- Ability to work with cross-functional stakeholders
- Adaptability and a willingness to learn new technologies and techniques.
- Proactive approach to issues and the ability to provide prompt resolutions or workarounds.