Senior Software Engineer - Agentic Runtime Safety & Observability

Keysight Technologies

5 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Tech stack

API

Artificial Intelligence

Systems Engineering

C++

Distributed Systems

Fault Tolerance

Interoperability

Python

ZeroMQ (Concurrent Programming Libraries)

Multi-Agent Systems

gRPC

Job description

As a Senior Engineer - Agentic Runtime Safety, Stability & Observability, you will design and own the runtime safety and reliability layer of Keysight's agentic orchestration platform.

Your mission is to ensure that AI-driven orchestration remains aligned with human intent, observable, auditable, and recoverable. You will architect guardrails, rollback mechanisms, and observability pipelines that allow autonomous systems to act powerfully without sacrificing trust, control, or predictability.

This role bridges AI systems, runtime engineering, and safety-critical design, working closely with AI architects, ML engineers, and simulation teams.

Runtime Safety & Execution Control

Design runtime guardrails ensuring agent actions remain aligned with intent, policies, and system constraints.
Implement intent validation, semantic checks, and execution contracts before orchestration runs.
Define safety boundaries, escalation paths, and rollback conditions within agent workflows.

Fault Isolation, Rollback & Recovery

Architect deterministic rollback, checkpointing, and recovery mechanisms for multi-agent systems.
Design fault-isolation boundaries to prevent local failures from cascading system-wide.
Build sandboxed execution environments for validating AI-generated orchestration logic.

Observability & Diagnostics

Implement end-to-end observability capturing agent decisions, execution traces, and system health.
Develop anomaly detection and confidence-based safety gating for runtime behavior.
Build introspection APIs and dashboards exposing rationale, safety metrics, and performance signals.

Adaptive Governance

Establish feedback loops that adjust orchestration behavior based on performance and safety signals.
Contribute to continuous safety validation and runtime certification pipelines.
Collaborate across teams to embed transparency and traceability into every orchestration cycle.

Requirements

PhD or 5+ years of experience in systems engineering, runtime reliability, or safety-critical software.
Strong proficiency in Python and C/C++.
Proven experience designing fault-tolerant, observable, and recoverable systems.
Hands-on experience with agentic orchestration frameworks (e.g., LangGraph, LangChain, or similar).
Solid understanding of execution control, intent alignment, and policy enforcement in automated systems.
Experience building telemetry, monitoring, or diagnostics pipelines in complex runtimes.

Desired Qualifications

Background in safety-critical or regulated domains (e.g. aerospace, industrial systems, EDA, HPC).
Experience with semantic validation, policy modeling, or goal disambiguation.
Familiarity with rollback strategies, dynamic gating, or safety scoring in distributed systems.
Experience with Python/C++ interoperability (e.g. pybind11, gRPC, ZeroMQ).
Exposure to simulation-driven systems or hybrid AI-physics environments.

About the company

Keysight is at the forefront of technology innovation, delivering breakthroughs and trusted insights in electronic design, simulation, prototyping, test, manufacturing, and optimization. Our ~15, employees create world-class solutions in communications, 5G, automotive, energy, quantum, aerospace, defense, and semiconductor markets for customers in over countries. Our award-winning culture embraces a bold vision of where technology can take us and a passion for tackling challenging problems with industry-first solutions. We believe that when people feel a sense of belonging, they can be more creative, innovative, and thrive at all points in their careers.

About the Team

Keysight's Applied AI Autonomy Initiative is building a next-generation agentic orchestration framework that enables AI agents to reason, adapt, and coordinate across complex engineering workflows. The platform combines LLM-based reasoning, reinforcement-inspired feedback loops, and simulation-driven validation to automate and optimize engineering decisions at scale. This role sits at the core of the initiative, defining how autonomy can be deployed safely, transparently, and predictably in high-assurance engineering environments.