Site Reliability / GitOps Engineer
Job description
Working on the Site Reliability team, you'll help build and maintain world-class infrastructure to meet the needs of millions of users protecting their privacy online. You'll utilize high-level languages like Perl, Go, or Python and work on related projects. Recent projects include:
- Preparing Duck.ai image uploads for production
- Reducing the user impact of instances serving errors
As a Site Reliability Engineer, you'll dive deep into complex operational challenges, including software, systems, automation, and process analysis. We are looking for candidates who can read, write, troubleshoot, and deploy all types of software to help us tackle the reliability challenges of large-scale deployments.
- You'll be required to attend meetings on camera via video conferencing.
- Expect to travel at least twice a year: once for our all-hands meetup and again for a team retreat (each around 4-5 days). While extenuating circumstances may impact attendance, everyone is strongly encouraged to attend.
- While we offer a flexible work arrangement with no core hours, expect an average full-time commitment of 40 hours per week.
- A successful candidate must pass a background check as a condition of joining the team.
- By applying for this role, you confirm that all information submitted is accurate and complete. Providing false or fraudulent information during the application process may result in denial of an offer, revocation of any existing offer, or other adverse actions, including termination after starting work.
Requirements
- 7+ years relevant professional experience in reliability, platform, infrastructure, or software engineering.
- Experience participating in a 24x7 on-call rotation for a large-scale deployment.
- Ability to lead and collaborate on high-impact and complex projects from proposal through post-mortem.
- Skills to wrangle vague problems, propose innovative solutions, and execute them with a strong focus on metrics.
- Experience developing effective tools, services, alerts and responses to identify and address reliability risks.
- Investigative ability to root-cause sources of instability in high-traffic, distributed systems.
- Deep experience administering and troubleshooting Linux and web technologies.
- Ability to implement automation around infrastructure provisioning and configuration management to prioritize efficiency, scalability and reliability.
- Foresight to help identify the future technical direction of our deployment with an effort to improve reliability and performance.
- Advanced programming skills enabling close partnership with software engineers to triage production issues and identify appropriate remediation, including code changes and performance considerations.
- Ability to leverage cloud-native services and architectures to enhance reliability and scalability, with hands-on experience packaging and deploying applications using Docker and Docker Compose.
Benefits & conditions
$178,500 USD annually and stock options. Compensation is identical within professional levels, regardless of geographic location or team. Compensation for each professional level is transparent across the organization. Our Team Member Support Guide explains how we prioritize your wellbeing, including paid parental leave, office setup, and co-working allowances.
Hiring process
Hiring works best when it's a two-way street. Learn how we help you get to know DuckDuckGo, envision your future role here, and find out more about how we hire.
Diversity, equity, and inclusion
DuckDuckGo provides equal work opportunities to all team members and applicants, prohibiting discrimination and harassment of any type based on race, color, ethnicity, caste, religion, age, sex (including pregnancy), national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by our policies or laws.