Senior Infrastructure Engineer
Job description
The Senior Infrastructure Engineer designs and operates the systems that power Graswald's platform. This role focuses on reliability, scalability, security, and cost efficiency, while enabling product teams to move quickly and safely. As a senior member of the team, you'll own core infrastructure architecture, make long-term technical trade-offs, and mentor others through example.
- Infrastructure Design & Development: Design, build, and maintain scalable, resilient, and secure infrastructure systems. Implement automation and Infrastructure-as-Code (IaC) practices to ensure consistency, reliability, and maintainability of environments.
- Technical Contribution & Operational Excellence: Actively contribute to the architecture, deployment, and ongoing improvement of cloud infrastructure and platform services. Perform rigorous peer reviews of infrastructure code, CI/CD pipelines, and system configurations to uphold quality, efficiency, and adherence to best practices.
- Reliability & Operations: Own the stability, performance, and observability of production systems. Lead incident response, root cause analysis, and long-term improvements to prevent recurrence. Help define a sustainable on-call culture.
- Performance & Cost Optimization: Regularly review resource usage and optimize infrastructure for performance and cost efficiency. Propose architectural improvements where needed.
- Collaboration & Enablement: Partner closely with product and engineering teams to design reliable infrastructure solutions, participate in architectural discussions and postmortems, and provide guidance on best practices for scalability, cost optimization, and security.
- Continuous Learning & AI-Driven Operations: Stay current with evolving cloud, DevOps, and infrastructure technologies. Explore and apply AI-driven capabilities in areas like monitoring, incident detection, and automated remediation to enhance operational excellence and productivity. Experiment with and champion modern practices to drive innovation within the infrastructure team.
- Documentation & Knowledge Sharing: Create and maintain clear, comprehensive documentation for infrastructure designs, operational runbooks, and processes. Ensure that knowledge is easily accessible for current and future team members, reducing operational risk and onboarding time.
- Security & Compliance: Implement and enforce security controls, access management policies, and compliance requirements across infrastructure environments.
Requirements
- Several years of professional experience in infrastructure engineering, DevOps, or site reliability engineering (SRE) roles.
- Experience working in agile software development teams and applying modern DevOps practices.
- Bachelor's degree in Computer Science, Engineering, or equivalent professional experience.
- Technical Expertise:
  - Extensive hands-on experience with at least one of the cloud providers AWS or GCP.
  - Proven ability to design and implement Infrastructure-as-Code (IaC) using tools such as Terraform.
  - Proficiency in scripting and automation (e.g., Python, Bash, Go) to streamline operations and reduce manual tasks.
  - Solid understanding of Linux systems, containerization (Docker), and orchestration platforms (Kubernetes, ECS, or similar).
- Nice to Have:
  - Experience operating ML inference or training infrastructure at scale.
  - Familiarity with MLOps tooling (SageMaker, Vertex AI, Kubeflow, MLflow, Argo Workflows).
- Operational Excellence:
  - Experience building and operating highly available, reliable, and scalable systems in production environments.
  - Strong background in monitoring, observability, and incident response, with tools such as Prometheus, Grafana, Datadog, ELK, or similar.
  - Knowledge of security best practices, including identity and access management, secrets management, compliance, and secure system design.
- Collaboration & Leadership:
  - Demonstrated ability to work effectively in cross-functional teams, partnering with product engineers, security, and data teams.
  - Strong communication skills with the ability to explain complex technical concepts clearly to both technical and non-technical audiences.
- Problem-Solving & Adaptability:
  - Track record of diagnosing and resolving complex infrastructure issues under pressure.
  - Ability to balance short-term fixes and long-term architectural improvements.
  - Proactive and curious mindset, with a drive for continuous improvement and innovation.