Senior Platform Engineer
Role details
Job description
- Design, build, and manage scalable, secure, and resilient infrastructure on AWS using Terraform (modularized, reusable components).
- Implement configuration management solutions using Ansible, including playbook development, inventory structuring, and role-based automation.
- Manage secrets securely using services such as AWS Secrets Manager or HashiCorp Vault.
- Implement robust monitoring, alerting, and observability tooling (e.g., CloudWatch, Prometheus, Grafana, Datadog).
- Participate in incident response, root cause analysis, and resilience improvements.
- Maintain and evolve CI/CD pipelines using tools such as GitHub Actions, Bitbucket Pipelines, or Jenkins.
- Automate deployments for container-based workloads on ECS (Fargate) or Lambda, and manage the supporting infrastructure.
- Collaborate with development teams to optimize build/deploy cycles and reduce lead time for changes.
- Ensure security best practices are embedded into infrastructure provisioning and pipeline execution.
- Support compliance and auditing by implementing guardrails and controls as code (e.g., AWS Config, SCPs, IAM policy management).
- Some out-of-hours (OOH) work is required to maintain production systems, and you will be part of the OOH critical-ticket rota.
Requirements
We are seeking a Senior Platform Engineer with a deep understanding of AWS cloud infrastructure, Infrastructure-as-Code (IaC) tooling such as Terraform, and configuration management using Ansible. The ideal candidate is a self-starter who is passionate about Site Reliability Engineering (SRE) principles and thrives in collaborative environments.
You will play a pivotal role in automating infrastructure, improving reliability and scalability, and ensuring smooth CI/CD pipelines across multiple environments. You'll work closely with software engineering and security teams to drive platform excellence.
- 5+ years in DevOps, SRE, or Cloud Engineering roles.
- Expertise in AWS core services: EC2, IAM, VPC, ECS/Fargate, CloudFormation, CloudWatch, RDS, DynamoDB, S3, Lambda.
- Strong proficiency in Terraform (HCL), including workspaces, modules, and Terraform Cloud or a similar platform.
- Ansible experience, including developing roles, building dynamic inventories, and managing remote configurations.
- Strong scripting knowledge (Bash, Python, or Go).
- Experience with container orchestration and deployment (Docker, ECS, or Kubernetes).
- Proficient with GitOps or IaC-based workflows.
- Familiarity with Google SRE practices, particularly around reliability, observability, and operational excellence.
- Understanding of systems reliability metrics and associated tooling.
Soft Skills & Behaviours
- Self-driven with a bias toward action and ownership.
- Excellent communicator, able to collaborate across disciplines and levels of technical understanding.
- Experience working as part of a cross-functional team.
- Comfortable working in agile environments (Scrum/Kanban).