Software Engineer

Randstad
Jersey City, United States of America
6 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$123K

Job location

Jersey City, United States of America

Tech stack

Java
Artificial Intelligence
Amazon Web Services (AWS)
Azure
Cloud Computing
Continuous Integration
Linux
DevOps
Distributed Systems
Fault Tolerance
Groovy
Python
Machine Learning
Software Engineering
Scripting (Bash/Python/Go/Ruby)
Google Cloud Platform
Delivery Pipeline
Reliability of Systems
CloudFormation
Containerization
Git Flow
Information Technology
Machine Learning Operations
Terraform
Data Pipelines
Docker
Jenkins

Job description

  • Evaluate and adopt emerging technologies to simplify the ML ecosystem and improve developer experience and stability

  • Drive pragmatic innovation by balancing future capabilities with production readiness and maintainability

  • Increase team agility through reusable frameworks, clear abstractions, and automation that reduce downstream friction

  • Resolve technical roadblocks and mitigate platform risks related to scalability, reliability, and integration

  • Accelerate delivery and improve system reliability through robust CI/CD pipelines and automated operational workflows

Requirements

  • Bachelor's or Master's degree in Computer Science, Software Engineering, or a related technical field
  • Strong Python software engineering experience building and maintaining production-grade libraries, services, and platforms; Linux, scripting, and automation required (Java/Groovy a plus)
  • Experience building and operating cloud-native systems on AWS, including core services and managed ML platforms (e.g., SageMaker); Azure or Google Cloud Platform exposure beneficial
  • Solid DevOps and CI/CD expertise with Jenkins, Git-based workflows, Docker, and scalable containerized deployments
  • Infrastructure-as-Code experience using CloudFormation and/or Terraform/OpenTofu
  • Hands-on experience operating ML systems in production, including deployment, inference, monitoring, and reliability-focused operations
  • Working knowledge of applied ML concepts, data pipelines, and diverse data types to support scalable ML platforms
  • Strong background in distributed systems, high-throughput workloads, asynchronous processing, and fault-tolerant architectures
  • Proven ability to support business-critical systems, troubleshoot production issues, and drive stability and performance improvements
  • 5+ years building Python-based cloud applications or platforms with ownership of complex production systems
  • Excellent communication and collaboration skills, including technical documentation and cross-functional teamwork
  • Ability to thrive in fast-paced, ambiguous environments within a broader AI and data ecosystem

Qualifications

  • Partner with Data Scientists to package, scale, and operationalize models for secure, reliable production use

  • Collaborate with application and platform engineers to integrate ML capabilities with enterprise gateways and services

  • Design and operate enterprise-scale ML systems serving tens of millions of users with high reliability and performance

  • Build platform tooling for model and data observability, including drift detection, quality monitoring, and automated diagnostics
