Infrastructure Admin - Remote / Telecommute

CYNET SYSTEMS INC.
Los Angeles, United States of America

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Remote
Los Angeles, United States of America

Tech stack

Artificial Intelligence
Amazon Web Services (AWS)
Azure
Bash
Cloud Computing
Cloud Computing Security
Cloud Engineering
Information Systems
System Configuration
Continuous Integration
GitHub
Identity and Access Management
Python
Key Management
Network Security
Linux System Administration
Performance Tuning
PowerShell
Role-Based Access Control
Prometheus
Virtual Machines
AI Infrastructure
Datadog
Scripting (Bash/Python/Go/Ruby)
Cloud Monitoring
System Availability
Large Language Models
Grafana
CloudFormation
Containerization
AI Platforms
Kubernetes
Infrastructure Automation Frameworks
Information Technology
Deployment Automation
Bicep
Data Management
Machine Learning Operations
CloudWatch
Terraform
Docker

Job description

  • Design, deploy, and manage cloud infrastructure supporting AI/ML workloads on AWS and Azure.
  • Manage compute resources including EC2, Azure Virtual Machines, GPU instances, EKS, ECS, and Kubernetes clusters.
  • Provision and configure storage, networking, and security services for AI platforms.
  • Ensure high availability, scalability, and reliability of AI infrastructure environments.
  • Deploy and maintain AI/ML platforms such as Amazon SageMaker, Azure Machine Learning, and related AI services.
  • Support data scientists and ML engineers with optimized infrastructure for model training and deployment.
  • Implement Infrastructure as Code (IaC) using Terraform, CloudFormation, ARM templates, or Bicep.
  • Automate provisioning, patching, scaling, and environment setup processes.
  • Deploy and manage containerized workloads using Docker, Kubernetes, EKS, AKS, and ECS.
  • Monitor system health, performance, and resource utilization using tools like CloudWatch, Azure Monitor, Datadog, or Prometheus.
  • Optimize infrastructure for cost efficiency, performance, and GPU utilization.
  • Implement cloud security best practices including IAM/RBAC, network security, encryption, and secrets management.
  • Ensure compliance with organizational and regulatory standards.
  • Integrate infrastructure with CI/CD pipelines and support automated deployment of AI services.

Requirements

  • Bachelor's degree in Computer Science, Information Systems, or a related field.
  • 5+ years of experience in infrastructure administration or cloud engineering.
  • Strong hands-on experience with AWS and Microsoft Azure cloud platforms.
  • Experience supporting AI/ML infrastructure or data platforms.
  • Proficiency in Linux administration and scripting (Python, Bash, PowerShell).
  • Experience with Terraform or similar Infrastructure as Code tools.
  • Hands-on experience with Docker and Kubernetes.
  • Experience with CI/CD tools such as GitHub Actions.
  • Knowledge of LLM infrastructure setup and support.
  • Experience working in centralized support environments, including triaging incoming issues.

Preferred Qualifications

  • Experience with GPU-based workloads and performance optimization.
  • Familiarity with advanced monitoring and observability tools.
  • Exposure to enterprise-level cloud security and compliance frameworks.

Soft Skills

  • Strong analytical and problem-solving skills.
  • Excellent communication and collaboration abilities.
  • Ability to work independently and manage multiple priorities.
  • Detail-oriented with a focus on reliability and performance.
  • Proactive mindset with a focus on continuous improvement.
