AI Platform Engineer (Hybrid in NYC or CT)

Insight Global
Stamford, United States of America
19 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate

Job location

Stamford, United States of America

Tech stack

Artificial Intelligence
Amazon Web Services (AWS)
Bash
Cloud Computing
Computer Programming
Continuous Integration
DevOps
Distributed Systems
Identity and Access Management
Python
Reliability Engineering
Prometheus
Software Engineering
Datadog
Data Logging
Scripting (Bash/Python/Go/Ruby)
Grafana
AWS Lambda
CloudFormation
AI Platforms
Kubernetes
Infrastructure Automation Frameworks
Deployment Automation
CloudWatch
Terraform
Docker

Job description

We are seeking a Platform Engineer to help design, build, and operate the foundational cloud and application platforms that power our AI digital products and services. In this role, you will focus on creating reliable, secure, scalable, and quality-assured platforms that enable application teams to deliver software quickly and safely.

You will work closely with infrastructure, security, and application teams to provide self-service capabilities and standardized tooling, and to ensure quality and strong operational practices across environments.

Responsibilities

Platform & Cloud Infrastructure

  • Build and operate cloud-based platform services that support application development and runtime workloads.

  • Design and maintain infrastructure using AWS services such as EC2, EKS, ECS, S3, RDS, IAM, VPC, Lambda, Bedrock AI services, and CloudWatch.

  • Implement and manage Infrastructure as Code (IaC) using Terraform, CDK, CloudFormation, or similar tools.

  • Support containerized and non-containerized workloads across development, staging, and production environments.

Reliability, Operations & Observability

  • Ensure platform reliability, availability, and performance using DevOps and SRE best practices.

  • Implement and maintain monitoring, logging, and alerting for platform services.

  • Participate in on-call rotations and incident response, contributing to root cause analysis and continuous improvement.

  • Develop operational runbooks and automation to reduce manual workload.

Quality Assurance, Security & Governance

  • Build platforms that are secure by default, following least-privilege access and defense-in-depth principles.

  • Partner with security and compliance teams to implement required controls, policies, and auditability.

Requirements

  • 3-6+ years of experience in platform engineering, DevOps, SRE, or infrastructure engineering roles.

  • Hands-on experience with AWS in production environments.

  • Experience with Infrastructure as Code tools (Terraform preferred).

  • Familiarity with containers and orchestration (Docker, Kubernetes, or ECS).

  • Understanding of monitoring, logging, and alerting concepts.

  • Experience with scripting or programming (Python, Bash, or similar).

  • Solid understanding of networking, security, and distributed systems fundamentals.

Preferred Qualifications

  • Experience operating Kubernetes platforms (EKS).

  • Familiarity with CI/CD systems and deployment automation.

  • Exposure to observability tools such as CloudWatch, Prometheus, Grafana, Datadog, or similar.

  • Experience working in large-scale or enterprise environments.

  • Interest in improving developer productivity and platform usability.
