Site Reliability Engineer

Trimble Inc.
Newcastle upon Tyne, United Kingdom
19 days ago

Role details

Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior

Job location

Newcastle upon Tyne, United Kingdom

Tech stack

Amazon Web Services (AWS)
Azure
Cloud Computing
Computer Security
Computer Programming
Computer Engineering
GitHub
Python
Reliability Engineering
Newrelic
Prometheus
Datadog
Data Logging
Scripting (Bash/Python/Go/Ruby)
Cloud Platform System
Grafana
Containerization
Kubernetes
Machine Learning Operations
Sumo Logic
CloudWatch
Terraform
Splunk
New Relic (SaaS)
Jenkins

Job description

Reliability Engineer with hands-on experience supporting multiple connected Cloud-based products?

What You Will Do:

We are seeking a skilled and motivated Site Reliability Engineer to join our team in Trimble's Project Delivery Cloud Platform and take responsibility for the infrastructure of our cutting-edge reality capture solution running on Microsoft Azure. The ideal candidate will have a strong background in cloud platforms, infrastructure as code, and automation via programming/scripting languages. You will work with a distributed team to drive the reliability, scalability, and security of the service and infrastructure.

  • Develop and maintain infrastructure as code (IaC) using Terraform to ensure reliable and scalable cloud environments;
  • Implement and enhance observability solutions using tools like New Relic, Datadog, Sumo Logic, and Splunk for monitoring, logging, and alerting;
  • Perform code deployments and manage CI/CD pipelines using Jenkins, GitHub, and related tooling to ensure smooth and efficient delivery processes;
  • Automate routine tasks and workflows to increase operational efficiency and reduce manual intervention;
  • Evaluate system designs and architectures for reliability, performance, security, and efficiency, ensuring best practices are followed;
  • Lead incident response efforts, conduct root cause analysis, and implement long-term solutions for complex issues;
  • Develop and maintain comprehensive runbooks and procedures for incident response and operational tasks;
  • Collaborate with cross-functional teams to review and provide feedback on technical designs, ensuring alignment with SRE principles;
  • Participate in on-call rotations and handle critical incidents with confidence and expertise;
  • Continuously improve documentation for systems and services, contributing to a knowledge-sharing culture within the team.

At Trimble, our core values of Belong, Grow, and Innovate aren't just words; they're the foundation of our culture. We foster an environment where you are seen, heard, and valued (Belong); where you have an opportunity to build a career and drive our collective growth (Grow); and where your innovative ideas shape the future (Innovate). We believe in empowering local teams to create impactful strategies, ensuring our global vision resonates with every individual. Become part of a team where your contributions truly matter.

If you need assistance or would like to request an accommodation in connection with the application process, please contact AskPX@px.trimble.com.

Requirements

Do you have experience in Terraform?
Do you have a Master's degree?
Are you ready to take your skills to the next level as a self-motivated and enthusiastic Site

  • Bachelor's or Master's degree in Computer Engineering or a related field;
  • At least 5 years of technical experience with a proven ability to take ownership;
  • Strong collaboration skills, with experience leading cross-functional work;
  • Demonstrated success in managing infrastructure in production environments;
  • Expertise in capacity planning and cost optimisation for efficient operations;
  • Extensive experience with Cloud provider hosted infrastructure (Amazon Web Services & Azure);
  • Proficient in high-level scripting languages (Python) and Infrastructure as Code (IaC) tools (Terraform), along with containerisation;
  • Experience with Kubernetes or other containerisation technologies;
  • Familiarity with CI/CD pipelines and tools such as Azure DevOps, Jenkins, Argo CD, Helm, GitHub;
  • Experience with incident management processes and monitoring tools such as Prometheus, Grafana, New Relic, Datadog, Splunk, CloudWatch, and Sumo Logic;
  • Strong understanding of networking and security concepts.

Additional experience preferred in:

  • SRE observability experience with New Relic or Datadog;
  • OpenTelemetry;
  • AIOps/MLOps;
  • SecOps.
