Member of Technical Staff, Microsoft Robotics (Software Systems)
Job description
- Design, build, and operate the observability and monitoring infrastructure for the Microsoft Robotics platform, including telemetry pipelines, distributed tracing, alerting, dashboards, and health models that span cloud services on Azure and edge/on-robot components running in partner environments.
- Instantiate the core incident response and reliability capabilities for production robotics workloads, including defining Service Level Indicators (SLIs)/Service Level Objectives (SLOs), building automated detection and remediation, conducting post-incident reviews, and driving systemic improvements that prevent recurrence across the fleet.
- Engineer production-grade deployment and release pipelines for robotics software, including safe rollout strategies for edge/on-robot updates, canary deployments, rollback automation, and stage-gated release processes that enforce safety and quality checks before software reaches physical systems.
- Build and maintain the secure-by-design infrastructure for cloud-to-edge communication, including certificate management, secure boot chains, encrypted telemetry channels, and access controls for remotely managed robotic systems.
- Partner with platform, autonomy, and simulation engineers to instrument new capabilities with production-quality logging, metrics, and tracing from day one, embedding operational readiness into the development lifecycle rather than retrofitting it.
- Develop capacity planning models and performance baselines for robotics workloads, identifying scaling bottlenecks in data ingestion, model inference, simulation execution, and real-time control loops before they impact partner deployments.
- Contribute to eventual on-call rotations and build the runbooks, escalation paths, and operational documentation that enable the broader team to support production systems confidently.
Requirements
- Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, or Python
- OR equivalent experience.
Other Requirements:
- Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
- Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, or Python
- OR Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, or Python
- OR equivalent experience.
- 3+ years technical experience working with large-scale cloud or distributed systems.
- 4+ years of hands-on experience operating and debugging distributed systems in production, including cloud-native services (Azure, Amazon Web Services (AWS), or Google Cloud Platform (GCP)), containerized workloads (Kubernetes, Docker), and continuous integration/continuous deployment (CI/CD) pipelines.
- Experience with edge computing, Internet of Things (IoT), or embedded systems in production - particularly systems where cloud services coordinate with on-device software running on constrained or physically deployed hardware.
- Proficiency in at least one systems-level language (Go, C++, Rust) and one scripting language (Python, Bash), with experience building monitoring, automation, and tooling for production environments.
- Experience defining and operating against Service Level Indicators (SLIs)/Service Level Objectives (SLOs), building alerting and dashboards (Prometheus, Grafana, Azure Monitor, or equivalent), and leading incident response processes in on-call environments.
- Demonstrated ability to work across the stack - from cloud infrastructure and networking to application-level telemetry and on-device diagnostics - to identify and resolve production issues under time pressure.
- Experience with fleet management at scale - including over-the-air (OTA) update systems, device lifecycle management, and remote diagnostics for distributed hardware deployments.
- Knowledge of security engineering for IoT/edge systems, including secure boot, device attestation, certificate rotation, and encrypted communication channels.
- Experience with Azure-specific services (Azure IoT Hub, Azure Arc, Azure Monitor, Azure Kubernetes Service (AKS)) and their application to hybrid cloud-edge architectures.
- Familiarity with robotics systems, Robot Operating System (ROS)/Robot Operating System 2 (ROS2), real-time operating systems, or autonomous vehicle infrastructure, including the unique reliability challenges of software controlling physical actuators.
- Prior work in industries with high-consequence software failures (robotics, autonomous vehicles, medical devices, aerospace) where reliability engineering directly impacts physical safety.
Software Engineering IC4 - The typical base pay range for this role across the U.S. is USD $119,800.00 - $234,700.00 per year. A different range applies to specific work locations within the San Francisco Bay Area and the New York City metropolitan area, where the base pay range for this role is USD $160,200.00 - $261,000.00 per year.
About the company
Microsoft is a global technology company headquartered in Redmond, Washington. Our mission is to empower every person and every organization on the planet to achieve more. We develop, license, and support a wide range of software products, services, and devices that help individuals and businesses realize their full potential.
Our flagship products include the Microsoft 365 productivity cloud, Windows operating system, Azure cloud platform, and Dynamics 365 business applications. We are also a leader in areas such as artificial intelligence, cybersecurity, developer tools, and gaming through Xbox and Game Pass.
With operations in more than 190 countries and over 220,000 employees worldwide, Microsoft is committed to responsible innovation, inclusive economic growth, and sustainability. We work closely with governments, industries, and communities to ensure that technology serves the public good and helps address some of the world’s most pressing challenges.
As we celebrate our 50th anniversary in 2025, we continue to look forward—investing in AI, cloud, and quantum computing to shape the future of work, education, and society at large scale.