Senior Cloud Infrastructure Engineer
Job description
Join the forefront of innovation as a Senior Cloud Infrastructure Engineer at Learning Pool. You won't just be keeping pace with technology; you'll be setting the pace.
Imagine being at the epicentre of innovation, deploying state-of-the-art cloud-based solutions on AWS that power our online learning platforms.
A core part of this role is working on our Kubernetes and Terraform estate. You will provision, deploy, and operate infrastructure using Infrastructure as Code and container orchestration. You'll also play a key role in designing and implementing cloud architectures that support cutting-edge Generative AI applications. Your expertise will help shape the future of AI-driven learning solutions, ensuring they run efficiently and securely in a scalable cloud environment.
Your technical acumen, problem-solving abilities, and communication skills will be key to the continuous growth of our world-class platforms, supporting millions of users across the globe. As a member of our agile and dynamic team, you'll provide guidance to product owners, engineers and key stakeholders. Your technical mastery will inform the design, automation, and continuous improvement of our platforms, processes, and developer tooling.
For more information about our benefits and why you should join Learning Pool, read more here: https://learningpool.com/why-work-for-learning-pool/
What you will be doing
Roles and responsibilities include:
- Innovate and Build: Spearhead the development and implementation of next-generation cloud solutions that will revolutionize online learning.
- Strategic Planning: Contribute to the planning and execution of cloud infrastructure projects, ensuring they align with the company's long-term objectives. Collaborate with the team to balance business requirements with technological innovation, cost, and scalability.
- Knowledge Sharing: Guide junior engineers by sharing your expertise and best practices and by taking part in knowledge-sharing sessions. Foster a culture of continuous learning within the team.
- Technical Oversight: Work as part of a team to ensure that our cloud infrastructure is secure, scalable, and efficient. Serve as a reliable resource for best practices in Infrastructure as Code (IaC), automation, and orchestration, collaborating closely with other engineers.
- Proactive Monitoring: Use monitoring tools such as Datadog, New Relic, and CloudWatch to proactively identify infrastructure needs, orchestrate responses, and provide data-driven insights into our applications.
- Enhance and Deploy: Oversee upgrades and deployments, making strategic decisions to ensure a flawless user experience while minimizing risk.
- Collaborate and Troubleshoot: Work in cross-functional teams to troubleshoot complex cloud infrastructure and application issues.
- Automate and Document: Author comprehensive design, deployment, and troubleshooting documentation, focusing on automation and efficiency.
- Stay Engaged: Play a key role in technological decisions and stay ahead of industry trends by attending vendor-led workshops, webinars, and training.
- On-Call Duties: Participate in an on-call rota to ensure seamless 24/7 operational support.
Requirements
- Strong AWS experience in production environments
- Kubernetes experience:
  - Practical experience running workloads on Kubernetes in production
  - Strong understanding of EKS, cluster operations, networking, and upgrades
- Terraform expertise:
  - Proven, hands-on experience writing, reviewing, and maintaining Terraform
  - Comfortable with modules, state management, environments, and change control
- Solid experience with containerisation (Docker) and container platforms (Kubernetes, ECS, EKS)
- Experience supporting Linux-based systems in cloud environments
- Experience with Argo CD or similar GitOps tooling for Kubernetes deployments
- Strong understanding of cloud security concepts, including IAM and least-privilege access
- Experience with monitoring and alerting using tools such as Datadog, CloudWatch, Prometheus and Grafana
- Ability to troubleshoot across infrastructure and application boundaries
- Clear written and verbal communication skills
While not required, it would be advantageous to have experience with the following:
- Experience working with and supporting Amazon OpenSearch clusters
- Experience with CI/CD pipelines for infrastructure and platform workloads
- Ansible or similar configuration management tooling
- Serverless technologies such as Lambda and API Gateway
- Experience managing cloud costs and improving cost efficiency
- Working in Agile delivery environments
Working at Learning Pool
The Learning Pool team is filled with people who have a real passion for what they do and a fresh approach to partnering with customers.
Benefits & conditions
- Contract
- Published: 18 hours ago
- Competitive