AI User Experience Reliability Lead
Job description
- Define and execute the strategy for measuring and improving the accuracy, stability, and task success of Qira's AI actions.
- Build and evolve evaluation frameworks, behavioral scorecards, and quality validation for models, prompts, retrievers, and task orchestration.
- Develop systems to detect hallucinations, regressions, safety deviations, and other behavioral anomalies in real time.
Safety & Guardrail Reliability
- Ensure the reliability of runtime safety systems, including content moderation, jailbreak/misuse detection, safety classifiers, and policy enforcement.
- Partner with Safety, Legal, Ethics, and Product teams to convert requirements into robust technical safety solutions.
- Validate safety updates through testing, evaluation, and monitored deployment.
AI Observability & Telemetry
- Define the telemetry, metrics, traces, and data needed to understand AI behavior end-to-end across device, edge, and cloud.
- Collaborate with observability and platform teams to integrate AI-specific signals (quality, drift, safety events) into a unified reliability platform.
- Lead the creation of dashboards and analytics that provide deep insight into AI behavior and experience reliability.
Reliability Engineering & Architecture Influence
- Partner with engineering, AI/ML, and product teams to embed reliability into the design of prompts, models, policies, and workflow orchestration.
- Influence architecture to ensure AI behavior is predictable, testable, explainable, and resilient.
- Establish standards for AI evaluation, rollout safety, service readiness, and runtime validation.
Cross-Functional Leadership
- Represent AI experience reliability in architectural reviews, product decisions, roadmap development, and launch readiness.
- Drive cross-team alignment on reliability metrics, evaluation methods, and monitoring strategies.
- Collaborate with ML researchers, applied AI teams, data scientists, and UX to ensure user-centric reliability goals.
Execution & Delivery
- Lead major engineering initiatives across AI quality, evaluation, safety assurance, and behavioral monitoring.
- Set priorities, ensure accountability, and drive timely delivery of reliability systems and tooling.
- Foster a culture of engineering excellence, learning, and continuous improvement.
Requirements
- 8+ years in AI/ML engineering, evaluation engineering, applied ML, reliability engineering, or large-scale distributed systems, with depth in AI behavior, evaluation, or safety.
- Bachelor's degree in Computer Science, Engineering, Machine Learning, or a related field.
- Strong programming skills in Python (Go, Java, or C++ a plus).
- Experience instrumenting, evaluating, or operating AI systems in production.
- Deep understanding of LLMs, model behavior, evaluation methods, retrieval-augmented systems, or content moderation logic.
- Strong ability to lead technical initiatives and influence cross-functional engineering teams.
Preferred Qualifications
- Experience with OpenTelemetry, Grafana, Prometheus, Loki, Tempo, or similar observability systems.
- Hands-on experience with hallucination detection, behavioral anomaly detection, or evaluation frameworks at scale.
- Experience in AI safety engineering, runtime validation, or policy enforcement systems.
- Understanding of hybrid architectures (device + edge + cloud).
- Background guiding teams or owning cross-functional architectural decisions.
- A passion for building AI systems that are correct, safe, reliable, and deeply aligned with user expectations.