Senior Platform Engineer (Cloud Workloads)

Veeam Software Corporation
San Jose, United States of America
5 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$ 294K

Job location

San Jose, United States of America

Tech stack

Amazon Web Services (AWS)
JIRA
Azure
Bash
Software as a Service
Continuous Integration
Elasticsearch
Github
Python
Pattern Recognition
Powershell
Reliability Engineering
Cloud Services
Kusto Query Language
Salesforce
Pulumi
Scripting (Bash/Python/Go/Ruby)
Cloud Platform System
Infrastructure Automation Frameworks
Bicep
Cosmos DB
Kibana
Veeam
Terraform
Key Vault
ServiceNow

Job description

We are looking for a Senior Platform Engineer to join the Workload team within the Veeam R&D Department. You will own critical observability infrastructure, drive incident response maturity, and help scale proactive support capabilities as operational accountability.

  • Design, build, and maintain observability pipelines using the Elastic Stack (Elasticsearch, Kibana, Fleet) across Azure and AWS workloads
  • Develop and own SLO/SLI dashboards and error budget reporting for BaaS platform services
  • Respond to incidents and lead the response for distributed, multi-tenant cloud workloads; own runbook creation, maintenance, and continuous improvement
  • Build and refine proactive support tooling, including pattern analysis, tenant correlation dashboards, and baseline deviation alerting, to reduce reactive support burden
  • Manage and maintain Elastic Fleet agent policies, enrollment health, and log streaming pipelines across Azure and AWS worker fleets
  • Partner with SRE, R&D, and Proactive Support teams to close observability gaps, including tenant identification workflows and admin portal integrations

Technologies we work with

  • Elastic Stack - Elasticsearch, Kibana, Elastic Fleet, KQL, Query DSL
  • Azure Kubernetes Service (AKS), Azure Container Apps, VMs
  • Azure Security - Entra ID, Managed Identities (user/system assigned), App Registrations, Key Vault
  • Infrastructure as Code - Azure Bicep, Terraform, or Pulumi
  • CI/CD - Azure DevOps, GitHub Actions
  • ITSM tooling - ServiceNow, Salesforce, Jira, Incident.io (for tenant and incident workflows)

Requirements

  • 5+ years of experience in cloud platform engineering, SRE, or infrastructure roles supporting commercial SaaS products
  • Deep hands-on experience with the Elastic Stack: building dashboards, writing KQL/Query DSL, managing Fleet
  • Proven experience operating and troubleshooting distributed, multi-tenant workloads on Azure and/or AWS
  • Strong understanding of Azure cloud services: AKS, Entra ID, Key Vault, Service Bus, Cosmos DB, Private Endpoints, etc.
  • Experience with incident response in production cloud environments, including runbook development and post-incident review
  • Experience with IaC tools (Azure Bicep, Terraform) and CI/CD pipelines (Azure DevOps, GitHub Actions)
  • Strong scripting skills in Bash, Python, or PowerShell
  • Ability to work cross-functionally with SRE, product, and customer-facing support teams

About the company

Veeam is the Data and AI Trust Company, specializing in helping organizations ensure their data and AI are fully understood, secured, and resilient to enable the acceleration of safe AI at scale. As the market leader in both data resilience and data security posture management, Veeam is built for the convergence of identity, data, security, and AI risk. Headquartered in Seattle with offices in more than 30 countries, Veeam protects over 550,000 customers worldwide, who trust Veeam to keep their businesses running. Join us as we go fearlessly forward together, growing, learning, and making a real impact for some of the world's biggest brands.
