Staff Software Engineer
Overview
Your role as a Staff Software Engineer on Factorial's DX and Performance team focuses on performance, reliability, observability, load testing, and AI-assisted engineering workflows across our product and infrastructure.

Responsibilities
Define and evolve SLIs and SLOs for critical product journeys.
Improve and standardize observability, dashboards, and service-health visibility across teams.
Investigate bottlenecks and regressions across the application, database, asynchronous, and system layers.
Drive improvements in latency, throughput, scalability, and reliability.
Build structured load-testing workflows for critical paths.
Help teams validate system behavior under realistic traffic, concurrency, and tenant-scale conditions.
Analyze capacity, saturation, and behavior under peak load and growth scenarios.
Define practices and tooling that catch performance regressions before they reach production.
Work closely with product and infrastructure teams to align on performance priorities and system behavior under load.
Design AI-assisted workflows to support metric and alert interpretation, anomaly analysis, incident investigation, performance-insight generation, and more.

Qualifications
Strong hands-on experience improving performance, scalability, and reliability in complex software systems.
Experience defining or operating SLIs, SLOs, and service-health frameworks.
Strong knowledge of observability practices and tools such as Datadog.
Experience investigating production bottlenecks across the application, database, and distributed-system layers.
Experience building or improving load-testing, benchmarking, or performance-validation workflows.
Experience diagnosing tail latency, throughput issues, and performance variability in production.
Broad experience working with cloud-based production systems.
Strong communication skills, including technical writing and cross-team alignment.
A proactive mindset and a strong sense of ownership.

Preferred Experience
Significant experience building and operating production systems at scale.
Experience working in large-scale environments with meaningful traffic and operational complexity.
Experience with Ruby on Rails, MySQL, Kafka, GraphQL, ClickHouse, or equivalent technologies.
Previous experience in Performance Engineering or Reliability Engineering.
Interest in modern AI tools and practical use of agentic workflows in engineering.

Benefits
High-growth, multicultural, and friendly environment.
Private health insurance (Alan).
Wellness program with gym, pool, and outdoor classes (Wellhub).
Performance-based bonuses and equity.
Paid parental leave and flexible working arrangements.
Commitment to equal opportunities and workplace inclusion of people with disabilities.