SRE Infrastructure Engineer

NextGen Staffing
South San Francisco, United States of America
6 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

South San Francisco, United States of America

Tech stack

Agile Methodologies
Artificial Intelligence
Systems Engineering
Computer Programming
Continuous Integration
Linux
DevOps
GitHub
Python
Octopus Deploy
Reliability Engineering
Prometheus
Data Logging
Google Cloud Platform
Cloud Monitoring
Grafana
Containerization
GitLab CI
Kubernetes
Low Latency
Deployment Automation
Terraform
DevSecOps
Jenkins
Artifactory

Job description

  • Google Cloud Platform Infrastructure Management: Design, deploy, and maintain robust infrastructure components, including VPCs, Compute Engine, GKE (Kubernetes), and storage solutions.
  • Automation & IaC: Use Terraform or Deployment Manager to manage cloud resources, and build CI/CD pipelines to automate deployments. Minimize manual, repetitive tasks by developing automation scripts and custom tools that streamline deployments and operations.
  • Observability: Develop monitoring, alerting, and logging systems (e.g., Cloud Monitoring, Prometheus, Grafana).
  • Incident Management: Act as primary on-call and first responder for system outages, troubleshoot production incidents, and conduct deep-dive root cause analysis (post-mortems) to prevent recurrence.
  • CI/CD Pipeline Management: Design and support automated deployment pipelines using Jenkins, ArgoCD, Artifactory, GitLab CI, or GitHub Actions, following DevSecOps practices.
  • Reliability Engineering: Define and maintain Service Level Indicators (SLIs) and Service Level Objectives (SLOs) covering latency, traffic, errors, and saturation.
  • Optimization & Security: Proactively optimize infrastructure for cost, performance, and security compliance.
  • Site Reliability Engineer, Google Compute Engine AI SRE at Google: Focus specifically on AI workload health and GCE visibility.

Requirements

We are seeking an SRE Infrastructure Engineer with 8+ years of professional experience ensuring the reliability, scalability, and performance of Google Cloud-based services through automation, monitoring, and proactive engineering. Key responsibilities include managing infrastructure as code (Terraform), optimizing GKE/Kubernetes, responding to incidents, and implementing SLIs/SLOs, with a focus on minimizing manual toil.

This role requires close collaboration with cross-functional teams, adherence to DevOps and Agile practices, and ownership of service quality and delivery.

  • Experience: 8+ years in SRE, DevOps, or systems engineering, specifically with Google Cloud Platform.
  • Technical Skills: Deep knowledge of Linux, Kubernetes (GKE), networking (VPCs, CDNs), and containerization.
  • Programming: Proficiency in scripting/programming languages like Python, Go, or Shell.
  • Methodologies: Strong understanding of GitOps, CI/CD pipelines, and SRE principles (error budgets, toil reduction).
  • Strong troubleshooting skills across the full stack (network, OS, application).
  • Ability to balance system stability with the need for rapid deployment.
  • Observability Tools: Experience implementing monitoring and logging stacks such as Prometheus, Grafana, or Google Cloud Operations Suite.
  • Excellent collaboration skills for working with development teams on service ownership.

Soft Skills

  • Strong problem-solving and analytical skills
  • Clear communication with technical and non-technical stakeholders
  • Ownership mindset and production-grade engineering discipline
  • Ability to work independently and within cross-functional teams

About the company

Next Gen Software Solutions is a trusted provider of IT staffing and consulting services dedicated to empowering businesses with cutting-edge technology solutions and exceptional talent. We specialize in delivering tailored IT consulting services, innovative software solutions, and connecting businesses with highly skilled IT professionals. Founded and led by a dedicated U.S. Army soldier, Next Gen Software Solutions is deeply rooted in the core values of integrity, discipline, commitment, and excellence, principles that guide every aspect of our operations.
