Infrastructure Engineer

NovusMinds AI
Brisbane, United States of America
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate

Job location

Brisbane, United States of America

Tech stack

Artificial Intelligence
Amazon Web Services (AWS)
Audit Trail
Automation of Tests
Azure
Backup Devices
Bash
Batch Processing
Cloud Computing
Cloud Engineering
Continuous Integration
Data Infrastructure
Disaster Recovery
DNS
Failover
GitHub
Identity and Access Management
Subnetting
Virtual Private Networks (VPN)
Python
Key Management
PostgreSQL
Machine Learning
SQL Azure
Networking Basics
Network Segmentation
PCI Data Security Standards
Peering
PowerShell
Role-Based Access Control
Runbook
AI Infrastructure
Datadog
Pulumi
Computer Networking Systems
Software Modules
Load Balancing
Cloud Monitoring
Istio
Grafana
Kubernetes Helm Charts
Multi-Cloud
Firewalls (Computer Science)
CloudFormation
Containerization
Kubernetes
Infrastructure Automation Frameworks
Deployment Automation
Bicep
HashiCorp
Cosmos DB
Route 53
Hardware Infrastructure
Functional Programming
CloudWatch
Terraform
Serverless Computing
Docker
PagerDuty
Key Vault
Go

Job description

We are looking for an Infrastructure Engineer with deep Infrastructure as Code expertise and hands-on experience across both AWS and Azure to build and own the cloud foundation that powers our WealthOS platform. You will design, provision, and manage the multi-cloud infrastructure that our engineering team builds on every day.

This is a foundational role. You will establish our infrastructure practices, tooling, and architecture from the ground up, making decisions that will define how NovusMinds operates for years to come. You will work closely with the CTO and engineering team, with high autonomy and direct impact on every part of the platform.

What You Will Do

Infrastructure as Code & Cloud Architecture

  • Design and implement the entire cloud infrastructure using Infrastructure as Code (Terraform preferred), ensuring every resource is version-controlled, reproducible, and auditable.
  • Architect and manage multi-cloud environments across AWS and Azure, making strategic decisions about which services and regions to use for different workloads.
  • Build reusable Terraform modules, enforce coding standards, and implement automated plan/apply workflows with proper state management (remote backends, state locking, workspaces).
  • Implement landing zone architectures on both AWS (Control Tower, Organizations, SCPs) and Azure (Management Groups, Azure Policy, Blueprints) for secure, scalable account/subscription management.
  • Manage networking infrastructure: VPCs, VNets, subnets, peering, VPNs, private endpoints, DNS, load balancers, and CDN configurations across both clouds.

Compute, Containers & AI Infrastructure

  • Design and manage containerized workloads using Docker and Kubernetes (EKS on AWS, AKS on Azure), including cluster provisioning, autoscaling, networking policies, and resource quotas.
  • Provision and optimize GPU/compute infrastructure for AI/ML workloads: model training, fine-tuning, inference endpoints, and batch processing pipelines.
  • Implement serverless architectures where appropriate (Lambda, Azure Functions, Step Functions, Durable Functions) to reduce operational overhead and cost.
  • Manage database infrastructure (RDS, Aurora, Azure SQL, CosmosDB, PostgreSQL) with proper replication, failover, backup, and encryption configurations.

CI/CD, Automation & Reliability

  • Build and maintain CI/CD pipelines (GitHub Actions, Azure DevOps, or similar) with infrastructure validation, automated testing, and progressive deployment strategies (blue-green, canary).
  • Implement comprehensive monitoring, alerting, and observability using CloudWatch, Azure Monitor, Datadog, Grafana, or similar tools across all environments.
  • Design disaster recovery and business continuity strategies: multi-region failover, automated backups, RTO/RPO targets, and regular DR testing.
  • Automate operational tasks including patching, certificate rotation, secret management (AWS Secrets Manager, Azure Key Vault, HashiCorp Vault), and compliance scanning.
  • Implement cost management practices: tagging strategies, budget alerts, reserved instance planning, and regular optimization reviews across both cloud providers.

Security & Compliance

  • Apply security best practices to all infrastructure: least-privilege IAM policies, network segmentation, encryption at rest and in transit, and audit logging.
  • Implement compliance controls aligned with SOC 2, with infrastructure configurations that support audit requirements for financial services clients.
  • Manage identity and access across both clouds using AWS IAM, Azure AD (Entra ID), SSO, and role-based access control (RBAC).

Requirements

  • 5+ years of infrastructure or cloud engineering experience, with significant hands-on work in both AWS and Azure environments.
  • Strong Infrastructure as Code expertise: 3+ years of production Terraform experience, including module development, state management, and CI/CD integration for infrastructure.
  • Deep knowledge of AWS services: EC2, EKS, RDS, S3, Lambda, VPC, IAM, CloudFormation, Route 53, and security services (GuardDuty, Security Hub, Config).
  • Deep knowledge of Azure services: AKS, Azure SQL, Azure Functions, VNets, Azure AD/Entra ID, Azure Policy, Key Vault, and Azure Monitor.
  • Strong experience with container orchestration (Kubernetes) including cluster management, Helm charts, service mesh, and observability.
  • Proficiency in scripting and automation with Python, Bash, Go, or PowerShell.
  • Solid understanding of networking fundamentals: DNS, load balancing, firewalls, VPN, peering, and hybrid connectivity patterns.
  • Experience implementing monitoring, alerting, and incident response processes for production systems.
  • Strong communication skills and ability to document infrastructure decisions, runbooks, and architectural diagrams clearly.

Nice to Have

  • Cloud certifications: AWS Solutions Architect Professional, Azure Solutions Architect Expert, HashiCorp Terraform Associate/Engineer, or CKA/CKAD.
  • Experience provisioning and managing GPU infrastructure for AI/ML training and inference workloads (SageMaker, Azure ML, NVIDIA GPU instances).
  • Familiarity with additional IaC tools: Pulumi, CloudFormation, Bicep, or Crossplane.
  • Experience with GitOps workflows using ArgoCD or Flux for Kubernetes deployments.
  • Background in financial services infrastructure, including compliance frameworks (SOC 2, PCI-DSS, GLBA) and data residency requirements.
  • Experience with platform engineering: building internal developer platforms, self-service infrastructure portals, or golden path templates.
  • Prior experience at an early-stage startup where you built cloud infrastructure from scratch.

Tools & Technologies

IaC

Terraform (primary), CloudFormation, Bicep, Pulumi

AWS

EKS, EC2, RDS, S3, Lambda, VPC, IAM, Route 53, SageMaker

Azure

AKS, Azure SQL, Functions, VNets, Entra ID, Key Vault, Azure ML

Containers

Docker, Kubernetes, Helm, ArgoCD

CI/CD

GitHub Actions, Azure DevOps, Atlantis (Terraform automation)

Monitoring

Datadog, Grafana, CloudWatch, Azure Monitor, PagerDuty

Benefits & conditions

Competitive Salary

Competitive base salary reflecting Bay Area market rates and your multi-cloud expertise.

Benefits & Wellness

Full health, dental, and vision insurance. Flexible PTO and a team that respects work-life balance.

Certifications & Growth

Budget for cloud certifications, conferences, and training. Build the infrastructure foundation of an AI company from scratch.

About the company

NovusMinds is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all team members.
