Principal AI Platform Engineer

Robson Bale Ltd
A Coruña, Spain
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

A Coruña, Spain

Tech stack

Clean Code Principles
API
Artificial Intelligence
Amazon Web Services (AWS)
Data analysis
Big Data
Collaborative Software
Computer Security
Computer Programming
Continuous Integration
DevOps
Distributed Computing Environment
Identity and Access Management
Python
Key Management
Machine Learning
System Software
Software Vulnerability Management
Enterprise Data Management
Istio
Large Language Models
Multi-Agent Systems
Spark
CloudFormation
Containerization
AI Platforms
Information Technology
Machine Learning Operations
Terraform
GxP
Docker
Programming Languages

Job description

AI Platform Engineer - Spain - Remote with occasional visits to site - €530pd

Contract until the end of the year. Remote with occasional site visits; travel expenses will be paid.

We seek a Principal AI Platform Engineer to join our Enterprise AI Platforms and Technologies Team. The ideal candidate will have industry-relevant experience delivering at-scale Machine Learning/Data Science in the AWS cloud ecosystem or its competitors.

You will be part of a collaborative team of multidisciplinary engineers and have the chance to create tools that will advance the standard of healthcare, improving the lives of millions of patients across the globe.

As a Principal AI Platform Engineer interested in building complex systems, you will be responsible for inventing how we use technology, machine learning, and data to enable productivity. You will help design, build, and deploy our next-generation platforms and tools at scale.

Key Accountabilities

Work closely with Enterprise architects to define the target architecture and roadmap for the enterprise Data/AI platform covering experimentation, training, feature management, model registry, CI/CD, serving, and observability. Ensure multi-tenant, multi-region, and high-availability designs with clear guardrails.
Partner with product management to shape platform vision, backlogs, and OKRs. Establish golden paths, templates, and self-service experiences that reduce friction from ideation to industrialization.
Own capacity planning and cost optimization for GPU/CPU workloads. Drive performance engineering for distributed training and inference and set standards for scalability and efficiency.
Integrate with enterprise data platforms and orchestrators to support scalable pipelines, reproducible experiments, and governed access to datasets.
Identity and secrets management, encryption, and vulnerability management: partner with Cyber Security and Data Privacy to meet GxP and internal standards without hindering productivity.
Drive reusable platform components, common services, and APIs that support multiple business units.
Translate complex platform concepts for senior stakeholders; align solutions to business outcomes in R&D, Commercial, and Operations.

Technical Leadership and Expertise

Strong analytical and problem-solving skills to address challenges.
Proven and creative technical leadership skills to drive detailed design and fact-based decision-making.
Strong ability to design scalable and efficient AI platforms and communicate those designs to engineers; implement and maintain the infrastructure and platforms that support the development and deployment of AI solutions.
Experience in DevOps/MLOps/AIOps practices to streamline the development and deployment processes.
Strong programming skills in Infrastructure as Code (e.g., Terraform, CloudFormation), AWS services, collaborative software development, and programming languages used in AI such as Python; proficiency in containerization technologies like Docker; and the ability to write clean, efficient, and maintainable code.
Familiarity with big data technologies, including Apache Spark, for processing and analyzing large datasets.
Understanding of security best practices in AI systems and adherence to compliance standards.
Willingness to stay updated with the latest advancements in AI technologies through continuous learning and professional development.
Active contribution to the continuous improvement and roadmaps of existing AI platforms.

Requirements

Candidate Knowledge, Skills, and Experience

BE/MS/PhD in Computer Science, Engineering, or a related quantitative field.
Demonstrable experience with AWS (or equivalent) across compute, storage, networking, IAM, and cost controls.
Experience administering production EKS clusters; strong understanding of operators, storage classes, service mesh, and GPU workloads.
Proven track record delivering platform software and automation in Python.
Hands-on experience deploying and operating ML/DS infrastructure using Infrastructure as Code.
Experience building model pipelines and lifecycle tooling to accelerate experimentation-to-production.
Experience with LLM serving, RAG, vector databases, prompt safety, and token-aware scaling.
Experience designing and operating agentic systems, including multi-agent orchestration, tool/action frameworks (e.g., function/tool calling), safety guardrails for autonomous actions, session/state management, and evaluation of agent reliability and cost/performance.
Experience with internal security standards; GxP life sciences experience preferred.
Soft Skills: Creative, collaborative, resilient, with excellent communication and the ability to influence technical and business stakeholders.
