Principal AI Platform Engineer

Robson Bale Ltd
Municipality of Valencia, Spain
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Municipality of Valencia, Spain

Tech stack

API
Artificial Intelligence
Amazon Web Services (AWS)
Big Data
Collaborative Software
Computer Security
Computer Programming
Continuous Integration
DevOps
Distributed Computing Environment
Identity and Access Management
Python
Key Management
Machine Learning
System Software
Software Vulnerability Management
Enterprise Data Management
Istio
Large Language Models
Multi-Agent Systems
Spark
CloudFormation
Containerization
AI Platforms
Information Technology
Machine Learning Operations
Terraform
GxP
Docker
Programming Languages

Job description

AI Platform Engineer - Spain - Remote with occasional visits to site - €530pd

Contract until the end of the year
€530pd
Remote with occasional site visits; travel expenses will be paid

We seek a Principal AI Platform Engineer to join our Enterprise AI Platforms and Technologies Team. The ideal candidate will have industry-relevant experience delivering at-scale Machine Learning/Data Science in the AWS cloud ecosystem or its competitors.

You will be part of a collaborative team of multidisciplinary engineers and have the chance to create tools that will advance the standard of healthcare, improving the lives of millions of patients across the globe.

As a Principal AI Platform Engineer interested in building complex systems, you will be responsible for inventing how we use technology, machine learning, and data to enable productivity. You will help design, build, and deploy our next-generation platforms and tools at scale.

Key Accountabilities

Work closely with Enterprise architects to define the target architecture and roadmap for the enterprise Data/AI platform covering experimentation, training, feature management, model registry, CI/CD, serving, and observability. Ensure multi-tenant, multi-region, and high-availability designs with clear guardrails.
Partner with product management to shape platform vision, backlogs, and OKRs. Establish golden paths, templates, and self-service experiences that reduce friction from ideation to industrialization.
Own capacity planning and cost optimization for GPU/CPU workloads. Drive performance engineering for distributed training and inference and set standards for scalability and efficiency.
Integrate with enterprise data platforms and orchestrators to support scalable pipelines, reproducible experiments, and governed access to datasets.
Identity and secrets management, encryption, and vulnerability management.

Requirements

Partner with Cyber Security and Data Privacy to meet GxP and internal standards without hindering productivity.
Drive reusable platform components, common services, and APIs that support multiple business units.
Translate complex platform concepts for senior stakeholders; align solutions to business outcomes in R&D, Commercial, and Operations.

Technical Leadership and Expertise

Strong analytical and problem-solving skills to address challenges.
Proven and creative technical leadership skills to drive detailed design and fact-based decision-making.
Strong ability to design scalable and efficient AI platforms and to communicate those designs to engineers; implement and maintain the infrastructure and platforms that support the development and deployment of AI solutions.
Experience in DevOps/MLOps/AIOps practices to streamline the development and deployment processes.
Strong programming skills in Infrastructure as Code (e.g., Terraform, CloudFormation), AWS services, collaborative software development, and programming languages used in AI such as Python; proficiency in containerization technologies like Docker; and the ability to write clean, efficient, and maintainable code.
Familiarity with big data technologies, including Apache Spark, for processing and analyzing large datasets.
Understanding of security best practices in AI systems and adherence to compliance standards.
Willingness to stay updated with the latest advancements in AI technologies through continuous learning and professional development.
Actively contributes to the continuous improvement and roadmaps of existing AI platforms.

Candidate Knowledge, Skills, and Experience

BE/MS/PhD in Computer Science, Engineering, or a related quantitative field.
Demonstrable experience with AWS (or equivalent) across compute, storage, networking, IAM, and cost controls.
Experience administering production EKS clusters; strong understanding of operators, storage classes, service mesh, and GPU workloads.
Proven track record delivering platform software and automation in Python.
Hands-on experience deploying and operating ML/DS infrastructure using Infrastructure as Code.
Experience building model pipelines and lifecycle tooling to accelerate experimentation-to-production.
Experience with LLM serving, RAG, vector databases, prompt safety, and token-aware scaling.
Experience designing and operating agentic systems, including multi-agent orchestration, tool/action frameworks (e.g., function/tool calling), safety guardrails for autonomous actions, session/state management, and evaluation of agent reliability and cost/performance.
Experience with internal security standards; GxP life sciences experience preferred.
Soft Skills: Creative, collaborative, resilient, with excellent communication and the ability to influence technical and business stakeholders.
