AI Infrastructure Admin
Job description
- Design, deploy, and manage cloud infrastructure supporting AI/ML workloads on AWS and Azure.
- Manage compute resources such as EC2, Azure Virtual Machines, GPU instances, and Kubernetes clusters.
- Provision and configure storage, networking, and security services for AI platforms.
- Ensure high availability, scalability, and reliability of AI environments.
- Deploy and maintain AI/ML services including Amazon SageMaker and Azure Machine Learning.
- Support data scientists and ML engineers with optimized infrastructure for model training and deployment.
- Implement Infrastructure as Code (IaC) using Terraform, CloudFormation, and ARM/Bicep templates.
- Automate provisioning, patching, and scaling of environments.
- Deploy and manage containerized workloads using Docker, Kubernetes, Amazon EKS, and Azure Kubernetes Service (AKS).
- Monitor system performance using CloudWatch, Azure Monitor, Datadog, and Prometheus.
- Optimize infrastructure for cost, performance, and GPU utilization.
- Implement security best practices including IAM/RBAC, encryption, and network security.
- Ensure compliance with organizational and regulatory standards.
- Integrate AI infrastructure with CI/CD pipelines and support automated deployments.
Requirements
- Bachelor's degree in Computer Science, Information Systems, or related field.
- 5+ years of experience in cloud engineering or infrastructure administration.
- Strong hands-on experience with AWS and Microsoft Azure.
- Experience supporting AI/ML infrastructure or data platforms.
- Proficiency in Linux administration and scripting (Python, Bash, PowerShell).
- Hands-on experience with Docker and Kubernetes.
Preferred Qualifications
- Experience working with GPU-based infrastructure for AI workloads.
- Knowledge of ML pipelines and MLOps practices.
- Experience with data platforms such as Snowflake, Databricks, or Spark.
- Familiarity with AI frameworks like TensorFlow or PyTorch.
- Cloud certifications such as AWS Certified Solutions Architect or Microsoft Certified: Azure AI Engineer Associate.
Key Skills
- Cloud Infrastructure (AWS, Azure)
- AI/ML Platform Support
- Kubernetes & Containers
- Infrastructure as Code & Automation
- Monitoring & Performance Optimization
- Security & Compliance
Benefits & conditions
Pay Range*: $75/hr to $80/hr
*Pay range offered to a successful candidate will be based on several factors, including the candidate's education, work experience, work location, specific job duties, certifications, etc.
Benefits: Innova Solutions offers benefits (based on eligibility) that include the following: medical and pharmacy coverage, dental/vision insurance, 401(k), Health Savings Account (HSA) and Flexible Spending Account (FSA), life insurance, pet insurance, short-term and long-term disability, accident and critical illness coverage, pre-paid legal and ID theft protection, sick time and other types of paid leave (as required by law), and an Employee Assistance Program (EAP).