Site Reliability Engineer (DevOps) - Netherlands
Job description
Mist AI is the AI-native networking solution from HPE Juniper Networking. Our Software Engineering team is seeking a Site Reliability Engineer to join us and build high-quality technology solutions that revolutionize networking, powered by artificial intelligence in the cloud. Mist AI provides services through SaaS applications to many Fortune 100 and Fortune 500 customers. You will take ops projects from concept through to launch. You will be responsible for maintaining and improving the company's production environment for rapid scaling and outstanding performance, and you will help us maintain stellar uptime and reliability. The improvements you implement will be felt by the entire organization. To succeed, you need a hunger to learn and the ability to adapt to new technology quickly. We are looking for people who are naturally curious, self-starting, and able to share learnings and outcomes effectively with a distributed team. You need to be a builder at heart.
Responsibilities
- Apply your passion for infrastructure as code and continuous deployment to build scalable, highly reliable systems.
- Define and own KPIs around system availability, quality and scale.
- Partner with our developers and quality engineering teams to automate the monitoring, alerting, availability and scalability of our applications and systems.
- Ensure system availability and business continuity by implementing redundant servers/services.
- Manage after-hours infrastructure updates and maintenance.
- Proactively research and propose the use of new concepts, processes, technologies, and tools.
- Partner with software developers to create Mist standards for microservices (APIs, schemas, serialization, data stores, and best practices).
- Run secure, scalable applications in highly available, multi-region AWS and GCP deployments.
- Ship code several times per week.
- Be part of our on-call rotation.
- Own disaster recovery and business continuity plans.
Requirements
- An extensive background in developing and operating large-scale, cloud-based distributed applications.
- Direct experience developing/running applications on AWS or Google Cloud.
- Laser focus and the ability to design infrastructure solutions for scalability, reliability, high availability, performance, security, software maintainability, and operational excellence.
- The ability to "fix the plane while in flight" (not just support greenfield solutions).
- The ability to prioritize existing technical and infrastructure debt, and the experience to build and execute a plan to pay it down.
- Experience delivering web-scale infrastructure for a global market at high release velocity.
- A deep understanding of distributed system design and dependency management.
- Solid experience with at least two of the following languages: Go, Java, Python.
- 10+ years of industry experience managing infrastructure.
- 5 years of Kubernetes administration in a large-scale SaaS environment.
- 5 years of experience maintaining production systems on AWS or GCP.
- 3 years of experience implementing, managing, and monitoring metrics specific to SaaS applications.
- 3 years of experience using infrastructure-as-code tools (e.g., Terraform, AWS CloudFormation, Google Cloud Deployment Manager).
- 5 years of experience with continuous integration practices and tools (e.g., Jenkins, Travis CI, CircleCI).
Desired skills
- Experience with Kafka, Spark, Storm, Cassandra, Elasticsearch, PostgreSQL, Redis, ZooKeeper, Nginx, and Airflow.
- Experience working with or contributing directly to open-source projects.
- Understanding of, and experience in, leading or managing technology products.
- Understanding of machine learning techniques and tools, and the ability to translate business requirements into data models and implement them in scalable, production-ready systems.
- Experience with failure-based testing.
- Experience working in a test-driven development environment.
Personal skills
- Previous experience contributing to war rooms and blameless postmortems.
- Superb communication skills, written and verbal.
- Experience working in a true DevOps environment with daily collaboration.
- Thrives in a fast-paced startup environment where there may be multiple competing priorities.
- Customer-service mindset.
- Passion for improvement.
Additional Skills: Cloud Architectures, Cross Domain Knowledge, Design Thinking, Development Fundamentals, DevOps, Distributed Computing, Microservices Fluency, Full Stack Development, Security-First Mindset, Solutions Design, Testing & Automation, User Experience (UX)