DevOps Engineer - Senior Vice President

ICAPITAL LLC
New York, United States of America
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Shift work
Languages
English
Experience level
Senior
Compensation
$180K - $230K

Job location

Remote
New York, United States of America

Tech stack

Artificial Intelligence
Amazon Web Services (AWS)
Cloud Computing
Continuous Integration
Data Stores
Linux
DevOps
Amazon DynamoDB
Python
PostgreSQL
Machine Learning
Cloud Services
Azure
AI Infrastructure
Scripting (Bash/Python/Go/Ruby)
Cloud Platform System
Large Language Models
Generative AI
Containerization
AI Platforms
GitLab CI
Kubernetes
Infrastructure Automation Frameworks
Machine Learning Operations
Terraform

Job description

The Platform Infrastructure team at iCapital plays a critical role in ensuring that both production and development environments operate smoothly, securely, and reliably. This role leverages advanced cloud capabilities to support the Platform Infrastructure strategy of market agility and lean operating principles, with a strong emphasis on quality to meet the ever-growing demands of our clients.

We are seeking highly collaborative, creative, and intellectually curious MLOps/DevOps Engineers with deep expertise in machine learning operations, cloud infrastructure, CI/CD automation, Kubernetes, and security. This role requires hands-on experience designing, building, and operating scalable DevOps and enterprise-grade MLOps platforms, including model lifecycle automation, observability, and governance.

As a Platform Engineer, you will wear multiple hats in a highly visible role, partnering closely with engineering, security, data, and business teams to deliver secure, reliable, and highly automated platforms that support both application and machine-learning workloads.

Responsibilities

  • Design, build, and operate MLOps pipelines supporting the full ML lifecycle (training, validation, deployment, monitoring).
  • Enable production workloads for AI/ML and Generative AI systems, including LLM-based services.
  • Develop and maintain CI/CD pipelines for AI/ML services and supporting infrastructure.
  • Build and manage cloud-native infrastructure on AWS, with heavy use of Kubernetes and containerized workloads.
  • Automate infrastructure provisioning and configuration using Infrastructure as Code (Terraform).
  • Implement model versioning, experiment tracking, and artifact management across environments.
  • Ensure reliability, scalability, observability, and cost efficiency of AI platforms.
  • Partner with AI/ML engineers to operationalize models and standardize deployment patterns.
  • Implement monitoring and alerting for system health, model performance, and drift.
  • Enforce security, compliance, and governance requirements for AI workloads.
  • Participate in incident response, root cause analysis, and continuous improvement initiatives.
  • Document standards, best practices, and reference architectures for MLOps and AI infrastructure.

Requirements

  • 15+ years of experience in DevOps, SRE, or Platform Engineering, with AWS as a primary cloud.
  • Experience supporting machine learning systems in production, including deployment and monitoring concerns.
  • Hands-on experience with machine learning platforms, particularly AWS SageMaker (required).
  • Strong hands-on experience with Kubernetes, containerized workloads, and cloud networking.
  • Proven experience building and operating CI/CD pipelines (e.g., GitLab CI, ArgoCD).
  • Strong proficiency with Terraform and scripting/programming in Python or similar languages.
  • Solid Linux, systems, and troubleshooting fundamentals.
  • Excellent communication skills and ability to work across teams.
  • Direct experience with MLOps platforms and tooling (model registries, experiment tracking, feature stores).
  • Exposure to Generative AI / LLM workloads in production environments.
  • Familiarity with data stores commonly used in ML systems (e.g., Postgres, DynamoDB, object storage).
  • Experience operating in regulated or fintech environments.
  • Background in cost optimization for compute-intensive workloads.
  • Strong written and verbal communication skills.
  • AWS certifications are a plus.

Benefits & conditions

60 E 42nd St, 26th Floor, New York, NY 10165. $180,000 - $230,000 a year, full-time.

  • Parental leave
  • Retirement plan
  • Paid time off
  • Vision insurance
  • Dental insurance
  • Unlimited paid time off

The base salary range for this role is $180,000 to $230,000, depending on experience. iCapital offers a compensation package that includes salary, equity for all full-time employees, and an annual performance bonus. Employees also receive a comprehensive benefits package that includes an employer-matched retirement plan, generously subsidized healthcare with 100% employer-paid dental, vision, telemedicine, and virtual mental health counseling, parental leave, and unlimited paid time off (PTO).

We believe the best ideas and innovation happen when we are together. Employees in this role will work in the office Monday-Thursday, with the flexibility to work remotely on Friday.
