Senior DevOps Engineer
Job description
We are seeking a Senior DevOps Engineer / Site Reliability Engineer (SRE) to join our North America team and collaborate closely with our global Cloud Engineer in China to help us design, build, and scale a unified operations and platform engineering system. In this role, you will work with engineering and business teams to improve system reliability, automation, observability, and delivery efficiency, while promoting DevOps and SRE best practices across the organization.
This role is primarily remote, with occasional onsite collaboration.
What You'll Do
- Design, develop, and maintain a unified operations and platform management system covering resource management, monitoring and alerting, configuration management, and automated operations and maintenance.
- Build and operate observability platforms and CI/CD pipelines; develop self-healing systems and automated incident response workflows to enable intelligent operations.
- Define DevOps development standards and best practices; drive standardization of the DevOps toolchain, including technology selection and version management.
- Provide platform-level technical support to product and engineering teams, troubleshoot complex system issues, reduce technical debt, and lead major infrastructure and architecture upgrades.
- Promote SRE principles and engineering practices; organize technical sharing sessions and training, and help establish a reliability engineering framework.
- Conduct technical research and innovation; track industry trends in DevOps and cloud infrastructure, evaluate new technologies, and drive continuous improvement and modernization of the operations platform.
Requirements
- Fluent in Chinese (listening, speaking, reading, and writing).
- Legally authorized to work for any employer. We are not able to provide sponsorship at this time.
- Bachelor's degree in Computer Science or a related field.
- 4-6 years of hands-on experience in DevOps, SRE, or Platform Engineering roles.
- Strong experience with at least one major cloud provider (AWS, Azure, or GCP), with solid understanding of core services such as VPC, EC2, EKS/Kubernetes, RDS, and IAM.
- Deep knowledge of Linux systems, networking fundamentals, containers (Docker, Kubernetes), load balancing, and service governance.
- Proficiency with Infrastructure as Code (IaC) tools such as Terraform, Ansible, and Helm.
- Experience building and maintaining CI/CD pipelines using tools like Jenkins, Argo CD, CodeBuild, or similar.
- Hands-on experience with monitoring, logging, and tracing systems, including Prometheus, Grafana, ELK Stack, OpenTelemetry, or equivalent.
- Proficiency in at least one scripting or programming language such as Python, Shell, or Go.
- Excellent system design, analytical thinking, and complex troubleshooting skills.
- Strong cross-team communication skills; experience leading technical knowledge sharing or evangelism is a plus.
Benefits & conditions
- Health insurance
- 401(k) matching
- Paid time off
- Vision insurance
- Flexible schedule
- Pay: $120,000.00 - $160,000.00 per year