Site Reliability Engineer

Tekvivid Inc
San Jose, United States of America
4 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

San Jose, United States of America

Tech stack

Bash
Cloud Computing
Cloud Engineering
Cloud Storage
Continuous Integration
Query Languages
DevOps
Monitoring of Systems
Python
Performance Tuning
Reliability Engineering
Prometheus
SQL Databases
Systems Architecture
Data Logging
Scripting (Bash/Python/Go/Ruby)
Google Cloud Platform
Cloud Monitoring
System Availability
Grafana
Reliability of Systems
Build Server
Containerization
GitLab CI
Kubernetes
Infrastructure Automation Frameworks
Low Latency
Terraform
Docker
Jenkins

Job description

We are looking for an experienced Site Reliability Engineer (SRE) with strong expertise in Google Cloud Platform (GCP) and hands-on experience in TQL (Telemetry/Query Language or similar monitoring/query tools). The ideal candidate will be responsible for ensuring system reliability, scalability, and performance while maintaining high availability of critical applications.

  • Design, implement, and maintain highly reliable and scalable systems on Google Cloud Platform
  • Monitor system performance, availability, and latency using TQL or similar query/monitoring tools
  • Automate infrastructure provisioning using Infrastructure as Code (IaC) tools (Terraform, Deployment Manager)
  • Troubleshoot production issues and perform root cause analysis
  • Implement CI/CD pipelines for faster, more reliable deployments
  • Collaborate with development teams to improve system reliability and performance
  • Manage incident response, on-call support, and post-incident reviews
  • Optimize system performance, cost, and resource utilization
  • Ensure security, compliance, and best practices across cloud environments

Requirements

  • 8+ years of experience in Site Reliability Engineering / DevOps / Cloud Engineering
  • Strong hands-on experience with Google Cloud Platform (GCP) services (Compute Engine, GKE, Cloud Storage, etc.)
  • Google Cloud Platform Certification (Professional Cloud DevOps Engineer / Cloud Architect preferred)
  • Experience with TQL or similar query languages for monitoring/logging (e.g., PromQL, SQL-like tools)
  • Proficiency in scripting languages such as Python, Bash, or Go
  • Experience with containerization and orchestration tools (Docker, Kubernetes)
  • Strong understanding of CI/CD tools (Jenkins, GitLab CI, Cloud Build, etc.)
  • Knowledge of monitoring tools (Prometheus, Grafana, Stackdriver/Cloud Monitoring)
  • Experience with Infrastructure as Code (Terraform preferred)
  • Solid understanding of networking, security, and system architecture
