Senior AI Platform Engineer (Domino)

EPAM Systems, Inc.
Barcelona, Spain
3 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Remote
Barcelona, Spain

Tech stack

API
Artificial Intelligence
Amazon Web Services (AWS)
Computer Programming
Continuous Integration
DevOps
Distributed Computing Environment
Identity and Access Management
Performance Tuning
Cloud Services
Azure
Software Safety
Session Management
Software Deployment
Systems Integration
Graphics Processing Unit (GPU)
Istio
Large Language Models
Multi-Agent Systems
Spark
CloudFormation
AI Platforms
Information Technology
Data Management
Machine Learning Operations
Terraform
GxP

Job description

We're looking for a Senior AI Platform Engineer (Domino) to join our team, working remotely with occasional onsite visits to Barcelona, Spain. In this role, you will design, build and optimize next-generation AI/ML platforms that enable enterprise-scale experimentation, model lifecycle management and production deployment in a secure, high-availability environment. You will work within the AWS cloud ecosystem, using Domino Data Lab as a core platform component while integrating with enterprise data solutions and MLOps best practices.

This role combines technical expertise and architectural insight, giving you the opportunity to influence platform strategy while delivering automation, scalability and compliance to accelerate data science and AI initiatives across R&D, commercial functions and operations.

Responsibilities

  • Define and implement enterprise AI platform architecture, including experimentation, training, model registry, CI/CD and observability components
  • Build and maintain reusable services, APIs and automation for scalable platform adoption
  • Administer and optimize Domino Data Lab for multi-tenant and multi-region usage
  • Lead integration of the AI platform with enterprise data pipelines, orchestrators and security frameworks
  • Drive cost optimization, performance tuning and GPU/CPU resource planning for distributed training and inference
  • Support the development of model pipelines and tooling that streamline experimentation-to-production workflows
  • Apply DevOps/MLOps practices using Infrastructure as Code for automation and compliance
  • Ensure robust security, identity management, encryption and regulatory compliance in collaboration with cybersecurity and data privacy teams
  • Research and drive new capabilities in LLM operations, including RAG patterns, vector databases and safety mechanisms
  • Foster technical best practices and mentor engineering teams to improve platform maturity

Requirements

  • Proven hands-on experience with Domino Data Lab administration and customization
  • Strong background in AWS or equivalent cloud ecosystem (compute, storage, networking, IAM, governance)
  • Experience deploying and managing EKS clusters, including networking, storage classes, operators, GPU workloads and service mesh
  • Advanced Python programming skills, including automation and platform tooling development
  • Proficiency with Infrastructure as Code (e.g., Terraform, CloudFormation)
  • Experience implementing MLOps frameworks for model lifecycle management and reproducibility
  • Familiarity with distributed processing and big data tools (e.g., Apache Spark)
  • Understanding of security best practices and compliance standards in regulated environments
  • Background in LLM operations and multi-agent orchestration preferred
  • Excellent communication skills and ability to translate technical concepts for diverse audiences
  • Degree in Computer Science, Engineering, or a related field

Nice to have

  • Exposure to GxP life sciences environments and governance processes
  • Knowledge of AI safety, token-aware scaling and session management
  • Familiarity with cost/performance optimization strategies for AI workloads
  • Contributions to internal platform strategy and improvement roadmaps

Benefits & conditions

  • Referral program
  • Paid time off
  • Private health insurance
  • EPAM Employees Stock Purchase Plan
  • 100% paid sick leave
  • Professional certification
  • Language courses
  • WORK AND LIFE BALANCE. Enjoy more of your personal time with flexible work options, 24 working days of annual leave and paid time off for numerous public holidays.
  • CONTINUOUS LEARNING CULTURE. Craft your personal Career Development Plan to align with your learning objectives. Take advantage of internal training, mentorship, sponsored certifications and LinkedIn courses.
  • CLEAR AND DIFFERENT CAREER PATHS. Grow in an engineering or managerial direction to become a People Manager, in-depth technical specialist, Solution Architect, or Project/Delivery Manager.
  • STRONG PROFESSIONAL COMMUNITY. Join a global EPAM community of highly skilled experts and connect with them to solve challenges, exchange ideas, share expertise and make friends.

About the company

EPAM is a leading digital transformation services and product engineering company with 61,700+ EPAMers in 55+ countries and regions. Since 1993, our multidisciplinary teams have been helping make the future real for our clients and communities around the world. In 2018, we opened an office in Spain that quickly grew to over 1,450 EPAMers distributed between the offices in Málaga, Madrid and Cáceres as well as remotely across the country. Here you will collaborate with multinational teams, contribute to numerous innovative projects, and have an opportunity to learn and grow continuously.
