Staff Site Reliability Engineer - Cloud
Job description
Are you ready to lead an OTel-first strategy and redefine reliability for a global industrial technology leader? Trimble is looking for a visionary Cloud Site Reliability Engineer to manage our massive-scale observability platform, ensuring our digital and physical solutions remain performant and resilient. This is your chance to use cutting-edge automation and OpenTelemetry to make a tangible impact on the world's most critical industries.
T&L: In the Transportation & Logistics segment, our solutions make it safer, simpler and more efficient to move freight, bringing together a global network of shippers, carriers, brokers and 3PLs.
Field Systems: The Trimble Field Systems segment provides solutions to increase precision and productivity in construction tasks by empowering stakeholders to collect accurate information and manage conditions with cutting-edge technology.
Corporate: Trimble empowers customers to drive productivity and progress with connected hardware and software solutions.
What Makes This Role Great: In this role, you will be the primary architect of our Observability Centre of Excellence, directly influencing the reliability and uptime of global platforms that keep world industries moving.
Key Exciting Responsibilities:
- Lead a global "OTel First" strategy, implementing OpenTelemetry at scale across a diverse technological landscape.
- Spearhead the development of automation scripts and Infrastructure as Code using Terraform to ensure seamless, reproducible platform delivery.
- Optimize platform performance and cost-efficiency, ensuring our observability tools scale economically as our data grows.
- Collaborate with engineering teams to embed reliability and security standards into new features from the ground up.
- Drive root cause analysis and problem management to proactively prevent incidents and improve the customer experience.
Requirements
Do you have experience in Terraform?
- Hands-on experience with the OpenTelemetry Collector, APIs, and SDKs.
- Extensive experience with observability tools like New Relic, Datadog, or Splunk.
- Strong proficiency in Infrastructure as Code (Terraform, Ansible) and cloud platforms (AWS, GCP, or Azure).
- Deep understanding of containerization and orchestration using Docker and Kubernetes.
- Advanced coding skills in Python, Go, or Java for building robust automation and monitoring tools.
Bonus Points For:
- Experience leveraging AI coding assistants like GitHub Copilot to accelerate development.