AI Platform Engineer (Hybrid in NYC or CT)

Insight Global

Stamford, United States of America

2 months ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Intermediate

Job location

Stamford, United States of America

Tech stack

Artificial Intelligence

Amazon Web Services (AWS)

Bash

Cloud Computing

Computer Programming

Continuous Integration

DevOps

Distributed Systems

Identity and Access Management

Python

Reliability Engineering

Prometheus

Software Engineering

Datadog

Data Logging

Scripting (Bash/Python/Go/Ruby)

Grafana

AWS Lambda

CloudFormation

AI Platforms

Kubernetes

Infrastructure Automation Frameworks

Deployment Automation

CloudWatch

Terraform

Docker

Job description

We are seeking a Platform Engineer to help design, build, and operate the foundational cloud and application platforms that power our AI digital products and services. In this role, you will focus on creating reliable, secure, scalable, and quality-assured platforms that enable application teams to deliver software quickly and safely.

You will work closely with infrastructure, security, and application teams to provide self-service capabilities and standardized tooling, and to ensure quality and strong operational practices across environments.

Responsibilities

Platform & Cloud Infrastructure

Build and operate cloud-based platform services that support application development and runtime workloads.
Design and maintain infrastructure using AWS services such as EC2, EKS, ECS, S3, RDS, IAM, VPC, Lambda, Bedrock AI services, and CloudWatch.
Implement and manage Infrastructure as Code (IaC) using Terraform, CDK, CloudFormation, or similar tools.
Support containerized and non-containerized workloads across development, staging, and production environments.

Reliability, Operations & Observability

Ensure platform reliability, availability, and performance using DevOps and SRE best practices.
Implement and maintain monitoring, logging, and alerting for platform services.
Participate in on-call rotations and incident response, contributing to root cause analysis and continuous improvement.
Develop operational runbooks and automation to reduce manual workload.

Quality Assurance, Security & Governance

Build platforms that are secure by default, following least-privilege access and defense-in-depth principles.
Partner with security and compliance teams to implement required controls, policies, and auditability.

Requirements

3-6+ years of experience in platform engineering, DevOps, SRE, or infrastructure engineering roles.
Hands-on experience with AWS in production environments.
Experience with Infrastructure as Code tools (Terraform preferred).
Familiarity with containers and orchestration (Docker, Kubernetes, or ECS).
Understanding of monitoring, logging, and alerting concepts.
Experience with scripting or programming (Python, Bash, or similar).
Solid understanding of networking, security, and distributed systems fundamentals.

Preferred Qualifications

Experience operating Kubernetes platforms (EKS).
Familiarity with CI/CD systems and deployment automation.
Exposure to observability tools such as CloudWatch, Prometheus, Grafana, Datadog, or similar.
Experience working in large-scale or enterprise environments.
Interest in improving developer productivity and platform usability.
