MLOps Engineer - Machine Learning Platform

The Goldman Sachs Group Inc

Jersey City, United States of America

27 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Junior

Job location

Jersey City, United States of America

Tech stack

Clean Code Principles

API

Amazon Web Services (AWS)

Unix

Databases

Software Debugging

Distributed Systems

Identity and Access Management

Python

Machine Learning

NoSQL

TensorFlow

Software Engineering

SQL Databases

Data Logging

PyTorch

Large Language Models

Backend

CloudFormation

Kubernetes

Infrastructure Automation Frameworks

Machine Learning Operations

TensorRT

Terraform

Docker

Requirements

2 years of experience in software engineering (backend, platform, or infrastructure).

2 years of experience in Python or a similar backend programming language.

1 year of experience supporting production ML systems (MLOps, platform, or inference-related work).

Basic understanding of APIs (REST or similar) and service-to-service communication.

Experience working with containers (e.g., Docker).

Familiarity with Unix-based systems.

Exposure to public cloud environments (e.g., AWS or GCP), including core concepts such as compute, storage, and basic IAM.

Experience working with databases (SQL or NoSQL).

Solid grasp of software engineering fundamentals, including debugging, testing, and maintainable code design.

Strong problem-solving skills and the ability to work effectively in a fast-paced, collaborative environment.

Curiosity and a strong desire to keep learning, especially in the model inference and LLM platform space.

Preferred Qualifications

4 years of experience in software engineering (backend, platform, or infrastructure).

4 years of experience supporting production ML systems (MLOps, platform, or inference-related work).

4 years of experience in Python or a similar backend programming language.

Strong understanding of the end-to-end Model Development Lifecycle (MDLC).

Basic understanding of distributed systems concepts and exposure to observability concepts (logging, metrics, tracing).

Experience building containerized runtime environments for model serving (e.g., vLLM, SGLang, TensorRT, Triton, AWS Multi Model Server).

Experience with infrastructure-as-code tools, such as Terraform or CloudFormation.

Experience with Kubernetes and other container orchestration platforms in the public cloud (e.g., AWS, GCP).

Experience building Machine Learning models with frameworks such as PyTorch and TensorFlow.

Excellent communication skills and the ability to articulate complex technical concepts to both technical and non-technical stakeholders.

About the company

At Goldman Sachs, our Engineers don't just make things - we make things possible. Change the world by connecting people and capital with ideas. Solve the most challenging and pressing engineering problems for our clients. Join our engineering teams that build massively scalable software and systems, architect low latency infrastructure solutions, proactively guard against cyber threats, and leverage machine learning alongside financial engineering to continuously turn data into action. Create new businesses, transform finance, and explore a world of opportunity at the speed of markets.

Engineering, which is comprised of our Technology Division and global strategists' groups, is at the critical center of our business, and our dynamic environment requires innovative strategic thinking and immediate, real solutions. Want to push the limit of digital possibilities? Start here.

Who We Look For

We are seeking a skilled and motivated engineer to join our Artificial Intelligence Platforms organization as an MLOps Engineer on our Machine Learning Services team. You will be part of an expert team building and operating production-grade platform and backend systems leveraged by ML engineers and application teams across the entire firm. A key focus of this role is enabling reliable, scalable, and observable deployment of Machine Learning and Large Language Models (LLMs). This role is best suited for engineers who enjoy working on infrastructure, backend services, and distributed systems, rather than primarily on model experimentation and development.
