Staff Software Engineer
Job description
About the Role
We are looking for a Staff Software Engineer to join our DX and Performance team at Factorial, with a strong focus on performance, reliability, observability, load testing, and AI-assisted engineering workflows.
Team & Mission
The DX and Performance team's primary goal is to increase Factorial's quality, performance, and scalability by continuously improving the way we build our product. We strengthen our tools, maintain foundational elements, and promote best practices in close collaboration with the rest of the engineering organization. Our mission is to equip product builders with robust, AI-enabled tools and practices to deliver with quality, confidence, and efficiency.
Role
As a Staff Engineer, you'll work with a team of 200+ engineers. We look for people who are curious, proactive, technically strong, and effective communicators. You will shape how Factorial defines, measures, and improves performance and service health across the engineering organization, working closely with product and infrastructure teams to enhance observability, load validation, performance visibility, and associated engineering practices.
What You Will Work On
- Defining and evolving SLIs and SLOs for critical product journeys
- Improving and standardizing observability, dashboards, and service health visibility across teams
- Investigating bottlenecks and regressions across application, database, asynchronous, and system layers
- Driving improvements in latency, throughput, scalability, and reliability
- Building structured load testing workflows for critical paths
- Helping teams validate system behavior under realistic traffic, concurrency, and tenant-scale conditions
- Analyzing capacity, saturation, and behavior under peak load and growth scenarios
- Defining practices and tooling to prevent performance regressions before production
- Aligning on performance priorities and system behavior under load with product and infrastructure teams
- Designing AI-assisted workflows to support metric and alert interpretation, anomaly analysis, incident investigation, and performance insights generation
Requirements
- Strong hands-on experience improving performance, scalability, and reliability in complex software systems
- Experience defining or operating SLIs, SLOs, and service health frameworks
- Strong knowledge of observability practices and tools such as Datadog
- Experience investigating production bottlenecks across application, database, and distributed system layers
- Experience building or improving load testing, benchmarking, or performance validation workflows
- Experience diagnosing tail latency, throughput issues, and performance variability in production
- Broad experience working with cloud-based production systems
- Strong communication skills, including technical writing and cross-team alignment
- A proactive mindset and strong ownership mentality
Ideally
- Significant experience building and operating production systems at scale
- Experience in large-scale environments with meaningful