Software Engineer

Randstad

Jersey City, United States of America

6 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

$123K

Job location

Jersey City, United States of America

Tech stack

Java

Artificial Intelligence

Amazon Web Services (AWS)

Azure

Cloud Computing

Continuous Integration

Linux

DevOps

Distributed Systems

Fault Tolerance

Groovy

Python

Machine Learning

Software Engineering

Scripting (Bash/Python/Go/Ruby)

Google Cloud Platform

Delivery Pipeline

Reliability of Systems

CloudFormation

Containerization

Git Flow

Information Technology

Machine Learning Operations

Terraform

Data Pipelines

Docker

Jenkins

Job description

Evaluate and adopt emerging technologies to simplify the ML ecosystem and improve developer experience and stability
Drive pragmatic innovation by balancing future capabilities with production readiness and maintainability
Increase team agility through reusable frameworks, clear abstractions, and automation that reduce downstream friction
Resolve technical roadblocks and mitigate platform risks related to scalability, reliability, and integration
Accelerate delivery and improve system reliability through robust CI/CD pipelines and automated operational workflows

Requirements

Bachelor's or Master's degree in Computer Science, Software Engineering, or a related technical field
Strong Python software engineering experience building and maintaining production-grade libraries, services, and platforms; Linux, scripting, and automation required (Java/Groovy a plus)
Experience building and operating cloud-native systems on AWS, including core services and managed ML platforms (e.g., SageMaker); Azure or Google Cloud Platform exposure beneficial
Solid DevOps and CI/CD expertise with Jenkins, Git-based workflows, Docker, and scalable containerized deployments
Infrastructure-as-Code experience using CloudFormation and/or Terraform/OpenTofu
Hands-on experience operating ML systems in production, including deployment, inference, monitoring, and reliability-focused operations
Working knowledge of applied ML concepts, data pipelines, and diverse data types to support scalable ML platforms
Strong background in distributed systems, high-throughput workloads, asynchronous processing, and fault-tolerant architectures
Proven ability to support business-critical systems, troubleshoot production issues, and drive stability and performance improvements
5+ years building Python-based cloud applications or platforms with ownership of complex production systems
Excellent communication and collaboration skills, including technical documentation and cross-functional teamwork
Ability to thrive in fast-paced, ambiguous environments within a broader AI and data ecosystem

Qualifications

Partner with Data Scientists to package, scale, and operationalize models for secure, reliable production use
Collaborate with application and platform engineers to integrate ML capabilities with enterprise gateways and services
Design and operate enterprise-scale ML systems serving tens of millions of users with high reliability and performance
Build platform tooling for model and data observability, including drift detection, quality monitoring, and automated diagnostics