Digital Site Reliability Engineer (SRE) - Local...
Job description
This position requires on-site work in an office setting in Columbus, OH.
The Digital Site Reliability Engineer (SRE) - GCP Cloud Adoption Engineer is responsible for facilitating the migration, adoption, and optimization of Google Cloud Platform (GCP) services within the organization.
This role combines deep expertise in cloud technologies with a strong focus on reliability, scalability, and automation, ensuring that digital services are robust, efficient, and aligned with business objectives. The engineer will work cross-functionally with development, operations, and security teams to implement best practices and drive innovation in cloud infrastructure.
Your future duties and responsibilities:
Cloud Adoption Strategy: Collaborate with stakeholders to develop and execute strategies for adopting GCP services, including migration planning, architecture design, and implementation.
Reliability Engineering: Apply SRE principles to GCP environments, focusing on service reliability, availability, and scalability. Develop monitoring, alerting, and automation solutions to prevent outages and reduce manual intervention.
Cloud Infrastructure Management: Build, maintain, and optimize cloud infrastructure using Infrastructure as Code (IaC) tools such as Terraform or Deployment Manager.
Automation & CI/CD: Design and implement automated deployment pipelines and operational workflows to enable continuous integration and delivery of cloud-based applications.
Incident Management: Lead incident response for cloud-related issues, conduct root cause analysis, and implement corrective actions to improve system reliability.
Performance Optimization: Monitor system performance and proactively identify areas for improvement in cost, efficiency, and reliability.
Security & Compliance: Ensure cloud environments adhere to security best practices and compliance requirements. Collaborate with security teams to implement controls and monitor risk.
Documentation & Knowledge Sharing: Create and maintain technical documentation. Mentor and train team members on GCP adoption and SRE practices.
Requirements
Bachelor's degree in Computer Science, Engineering, or a related field.
3+ years of experience in cloud engineering, site reliability engineering, or DevOps, with hands-on expertise in GCP.
3+ years of experience with Infrastructure as Code (IaC) tools (e.g., Terraform, Deployment Manager).
3+ years of experience with monitoring, logging, and alerting tools (e.g., Prometheus, Grafana).
3+ years of experience designing and implementing CI/CD pipelines and automation workflows.
3+ years of experience troubleshooting and solving problems, especially in distributed systems and cloud environments.
3+ years of experience working with SRE principles, including error budgets, SLIs/SLOs, and incident management.
3+ years of experience with cloud security best practices and regulatory compliance requirements.
Preferred Qualifications
Strong communication and collaboration skills, with the ability to work effectively in cross-functional teams.
Ability to work independently and multitask within a collaborative work environment.
Willingness and aptitude for continuous improvement.
A "do the right thing" attitude while being a strong team player.
Strong focus on customer service.
GCP Professional certifications.
Experience migrating workloads from on-premises or other cloud platforms to GCP.
Familiarity with Kubernetes, Docker, and container orchestration in GCP.
Experience with agile methodologies and project management tools.
Skills:
- Cloud Computing