Azure Cloud Engineer (AKS Specialist)

Aptlogix LLC
Dallas, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Dallas, United States of America

Tech stack

Artificial Intelligence
Azure
Bash
Cloud Computing
Continuous Integration
DNS
Github
Python
Key Management
Log Analysis
Nginx
Role-Based Access Control
Prometheus
Azure
Scripting (Bash/Python/Go/Ruby)
Load Balancing
Autoscaling
Istio
Grafana
Infrastructure as Code (IaC)
AI Platforms
Kubernetes
Information Technology
Bicep
Patch Management
Linkerd (Service Mesh)
Azure
Machine Learning Operations
Terraform

Job description

  • AKS Lifecycle Management: Lead the end-to-end deployment, configuration, and management of production-grade AKS clusters.
  • Networking & Ingress: Design and maintain robust Azure networking architectures, including VNet integration, Azure CNI, and sophisticated Ingress controllers (Nginx, AGIC).
  • Scaling & Optimization: Implement and manage auto-scaling strategies (HPA, VPA, and Cluster Autoscaler) to handle fluctuating workloads efficiently.
  • AI/ML Support: Provide platform-level support for AI/ML workloads, ensuring GPU node pools and specialized compute resources are optimized for model training and inference.
  • Operational Excellence: Drive "Day 2" operations, including patch management, cluster upgrades, observability (Azure Monitor/Log Analytics), and proactive troubleshooting.
  • Infrastructure as Code (IaC): Automate all infrastructure provisioning using Terraform or Bicep to ensure repeatable and consistent environments.
  • Security & Governance: Enforce Azure Policy, RBAC, and Secret Management (Azure Key Vault) to maintain a secure and compliant container platform.

Requirements

We are seeking a highly skilled Senior Azure Cloud Engineer with deep, hands-on expertise in Azure Kubernetes Service (AKS). This role is designed for a platform-focused engineer who takes true ownership of the container ecosystem from initial cluster deployment and complex networking to long-term operational excellence.

You will be a key player in managing our cloud infrastructure, with a specific focus on supporting high-demand AI/ML workloads running within Kubernetes. The ideal candidate is a Dallas-based professional who thrives on optimizing performance, ensuring seamless scaling, and maintaining the "day-to-day" health of a mission-critical Azure environment., * Azure Mastery: Extensive experience with the Azure ecosystem (Compute, Storage, Networking, Identity).

  • AKS Specialist: Deep technical knowledge of Kubernetes internals, including scheduling, storage classes, and service mesh (Istio/Linkerd).
  • Advanced Networking: Hands-on experience with Azure Load Balancers, Application Gateways, Private Links, and DNS resolution within AKS.
  • Automation: Strong proficiency in Terraform and CI/CD workflows (GitHub Actions or Azure DevOps).
  • Scripting: Fluent in Python or Bash for operational automation and custom tooling.
  • Education: Bachelor s Degree in Computer Science, Engineering, or a related technology field., * AI/ML Exposure: Familiarity with running MLOps frameworks or AI platforms (like Azure Machine Learning) on Kubernetes.
  • Certifications: Microsoft Certified: Azure Solutions Architect Expert or CKA (Certified Kubernetes Administrator).
  • Monitoring: Experience with Prometheus, Grafana, or Azure Managed Grafana.

Apply for this position