Senior Reliability Engineer
Job description
We are seeking an experienced engineer with broad, end-to-end software development experience, including operating microservices-based applications in production at scale. This role goes beyond feature implementation: it requires someone who can design, build, and support resilient systems from the ground up.
As a Senior Reliability Engineer at Vanguard, you will play a critical role in solving impactful operational problems. You are curious and take a proactive approach to identifying problems and making improvements. You balance innovative thinking with pragmatism and understand the long-term impacts of technical decisions. You communicate complex ideas clearly and collaborate effectively to deliver scalable solutions.
Core Responsibilities
- Improve resiliency engineering practices across platforms and applications, including resilient application design patterns, system observability, and deployment strategies.
- Detect, troubleshoot, and resolve incidents.
- Develop automation for incident response and infrastructure management.
- Develop and support OpenTelemetry integrations for multiple application platforms (browser, ECS, Lambda, etc.) and languages (JavaScript, Java).
- Contribute to architectural decisions and support implementation of solutions.
Requirements
- Expertise in JavaScript (server-side and client-side execution environments) or Java.
- Working knowledge of Python (or a similar scripting language).
- Strong knowledge of resiliency engineering techniques for both platforms and applications.
- Experience troubleshooting complex production issues and implementing effective mitigations.
- Hands-on experience with AWS services and cloud infrastructure.
- Familiarity with the OpenTelemetry specification and core APIs.
- Practical experience developing and operating software in distributed systems environments.