Site Reliability Engineer
Job description
You will be a senior technical contributor on the SRE team, responsible for the reliability, scalability, and security of Prosper's Cloud Platform portfolio. This is as much a platform engineering role as it is an SRE role - you will maintain the applications that run on our platform, drive alignment to platform standards, and ensure services stay current on frameworks and dependencies.
We are building an agentic, AI-first operations model where AI agents handle investigations, deployments, audits, and optimizations - and you will be at the center of designing and governing that system. You will share ownership of application-layer reliability, CI/CD pipelines, and observability while building the skills, rules, and guardrails that allow AI agents to operate safely alongside human engineers.
- Design and author AI agent skills - structured playbooks that encode investigation, deployment, and optimization workflows
- Own application-layer reliability within Kubernetes-based compute (managed by the Infrastructure Engineering team) across all environments
- Maintain and upgrade platform applications - drive framework upgrades, dependency updates, and alignment to platform standards
- Drive infrastructure-as-code with modular, multi-environment patterns
- Participate in on-call rotation and lead incident response
- Build and maintain observability across cloud monitoring and APM platforms
- Own the Internal Developer Platform - CI/CD pipelines, deployment tooling, and developer self-service
- Mentor junior SRE engineers and shape team standards
We are growing our Technology team to support our various financial products. The ideal candidate is passionate about learning the Fintech domain and delivering cutting-edge, high-quality solutions to solve business problems. We utilize a progressive, test-driven, Agile development methodology that places a high premium on communication, teamwork, sound design and clean implementation.
Requirements
- 7+ years in SRE, DevOps, or Platform Engineering
- Bachelor's degree in a technical field, or equivalent work experience
- Deep expertise with a major cloud provider (GCP preferred) and Kubernetes
- Strong infrastructure-as-code experience with multi-environment patterns
- Production CI/CD pipeline design
- Observability and APM platform experience
- Strong written communication - your documentation will be consumed by humans and AI agents alike
Extra credit
- Experience building or integrating AI agents into operational workflows
- Hands-on with LLM-powered development tooling
- Background in designing guardrails or policy engines for automated systems
- Track record of building internal developer platforms or self-service infrastructure
Benefits & conditions
- A hybrid connection: We believe in the "best of both worlds" - a hybrid environment (2 days/week in our SF office) that balances high-touch collaboration with the flexibility of remote work
- Invested in your future: A competitive salary and a 401(k) with a 5% company match to help you build long-term financial security
- Holistic well-being: We provide the resources you need to thrive, from flexible time off and paid parental leave to an annual wellness allowance and comprehensive health coverage
- Professional & personal growth: Take advantage of a suite of premium perks, including Udemy access, childcare assistance, pet insurance, and a bevy of additional savings through Beneplace
The salary for this position is $163,000 - $203,000 annually, plus bonus and generous benefits. In determining your salary, we will consider your location, experience, and other job-related factors.