Senior Site Reliability Engineer - Azure

Delta Capita
30 days ago

Role details

Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Compensation: £67K

Job location

Tech stack

Java
Agile Methodologies
Amazon Web Services (AWS)
Application Performance Management
Azure
Cloud Computing
Cloud Engineering
Databases
Continuous Delivery
Continuous Integration
Data Security
DevOps
Amazon DynamoDB
Identity and Access Management
Python
MongoDB
OAuth
Open Source Technology
OpenID
Performance Tuning
Reliability Engineering
OpenID Connect
Prometheus
Security Assertion Markup Language (SAML)
Single Sign-On
Software Engineering
SQL Databases
TypeScript
Management of Software Versions
Data Logging
Scripting (Bash/Python/Go/Ruby)
Google Cloud Platform
Load Balancing
System Availability
Grafana
Reliability of Systems
Infrastructure as Code (IaC)
Git
CloudFormation
Containerization
Kubernetes
Information Technology
REST
Terraform
Software Version Control
ELK

Job description

We are seeking a highly skilled and motivated Senior Site Reliability Engineer (SRE) to join our engineering team and support critical application deployments in a "follow-the-sun" environment. In this role, you will apply your expertise in cloud provisioning, infrastructure as code, and container orchestration to ensure the reliability, scalability, and performance of our services. We are looking for a self-starter with an open-minded attitude: someone who approaches challenges thoughtfully and strategically. You will collaborate closely with development teams to design and implement robust infrastructure solutions using Azure, GCP, and AWS, along with containerized technologies.

The Role and Responsibilities:

  • Cloud Infrastructure Management: Design, implement, and manage cloud infrastructure in Azure and AWS, ensuring alignment with best practices and organizational standards.

  • Infrastructure as Code (IaC): Utilize Terraform (HCL), AWS CDK, and AWS CloudFormation for scalable and maintainable IaC, enabling safe and efficient infrastructure builds, changes, and versioning.

  • Containerization and Orchestration: Deploy, manage, and provide ongoing support for containerized applications using Kubernetes, including Amazon EKS (Elastic Kubernetes Service) and Azure Kubernetes Service (AKS), ensuring their reliability, availability, and performance.

  • Monitoring and Alerting: Monitor application performance and system health through observability tools (e.g., Prometheus, Grafana, ELK stack), proactively identifying and resolving issues to ensure high availability and rapid incident response.

  • Security and IAM: Implement security best practices, managing Identity and Access Management (IAM) policies across cloud environments. Utilize technologies such as OpenID Connect (OIDC), OAuth2, and SAML Single Sign-On (SSO) to ensure secure authentication and authorization across services.

  • Database Technologies: Manage and optimize database systems, including SQL databases and MongoDB, ensuring high availability, performance tuning, and data security.

  • CI/CD Practices: Automate manual processes to enhance operational efficiency, employing Continuous Integration/Continuous Deployment (CI/CD) best practices for efficient code deployment.

  • Scripting Languages: Demonstrate proficient scripting skills in languages such as Java, TypeScript, and Python to automate tasks and manage configurations.

  • Load Balancing: Implement and maintain load balancing solutions to ensure optimal distribution of application traffic and high availability.

  • Collaboration with Development Teams: Collaborate with software engineering teams to design, develop, and maintain robust systems and solutions, including RESTful APIs, ensuring seamless integration across platforms.

  • Post-Mortem Analysis: Conduct comprehensive post-mortem analyses following incidents, identifying root causes and recommending improvements to enhance system reliability and performance.

  • Mentorship: Mentor and guide junior engineers, fostering a culture of knowledge sharing and continuous improvement within the engineering team.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.

  • Proven work experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role within a high-availability environment.

  • Strong experience with Azure, GCP, and AWS cloud services, including a deep understanding of cloud architecture and services.

  • Expertise in Infrastructure as Code (IaC) using Terraform (HCL) and AWS CloudFormation.

  • Experience with AWS CDK for programmatic management of cloud resources, primarily using TypeScript.

  • Hands-on experience with container orchestration technologies, particularly Kubernetes.

  • Familiarity with version control systems (e.g., Git) and CI/CD pipelines for efficient code deployment.

  • Knowledge of monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack) to ensure system observability.

  • Strong experience with SQL databases and AWS DynamoDB, focusing on performance tuning and optimization.

  • Proven ability to design and manage RESTful APIs, ensuring their reliability and scalability.

  • Excellent troubleshooting skills, with a proactive approach to resolving complex technical issues.

  • Strong communication and teamwork skills, enabling effective collaboration across cross-functional teams.

  • A curious and open-minded attitude, committed to challenging the status quo and exploring innovative solutions.

  • Experience with networking concepts and troubleshooting in cloud environments.

  • Knowledge of security best practices in cloud computing.

  • Contributions to open-source projects or the creation of technical articles/blog posts to share knowledge with the community.

  • Familiarity with service mesh technologies.

  • Exposure to Agile methodologies and project management tools.

  • Financial services domain knowledge.

About the company

Delta Capita Group (a member of the Prytek Group) is a global managed services, consulting, and solutions provider with a unique combination of Financial Services experience and technology innovation capability. Our mission is to reinvent the financial services value chain by providing technology-based mutualized services to financial institutions for non-differentiating services.
