Sr. AI Infrastructure Engineer
Job description
Lenovo is seeking a senior technical leader to guide the strategy, architecture, and delivery of our next-generation Hybrid AI Platform. In this role, you will provide leadership across AI infrastructure, MLOps, cloud-native platform engineering, and operational excellence, setting direction for teams that build and run production-grade AI/ML platforms on Kubernetes. You will drive the vision for scalable, secure, and reliable AI systems while partnering closely with engineering, product, and executive stakeholders. If you are passionate about leading high-impact AI platform initiatives, mentoring engineering talent, and shaping enterprise-wide Hybrid AI capabilities, we invite you to join us.
AI Platform Engineering & Operations
- Provide technical leadership and architectural direction for Kubernetes/OpenShift-based AI/ML platform design, scalability strategy, security posture, and operational standards.
- Oversee platform roadmap, ensuring alignment with Lenovo's broader Hybrid AI strategy and enterprise architecture principles.
- Lead engineering teams in implementing GitOps-driven, cloud-native platform automation using ArgoCD and Helm.
- Set standards for Linux systems management, platform hardening, and operational reliability across all AI infrastructure.
MLOps & Model Lifecycle Management
- Define and evolve the enterprise MLOps architecture, enabling reproducible, automated, and governed AI model workflows.
- Lead teams in building and optimizing ML pipelines using Kubeflow Pipelines, Tekton, and Python SDKs.
- Architect scalable, production-ready model serving solutions using KServe, Knative, and Triton (where applicable).
- Champion consistency in model registry usage, metadata management, workflow orchestration, and ML lifecycle governance.
Automation, Observability & Reliability
- Develop the long-term automation and platform SRE strategy, including Python/Ansible-based automation and Terraform-driven IaC patterns.
- Establish observability standards for AI/ML systems using Prometheus, Grafana, AlertManager, and related tooling.
- Oversee capacity planning, performance engineering, incident response processes, and continuous reliability improvements.
- Drive adoption of automation-first principles to reduce operational overhead and improve engineering velocity.
Cloud & Infrastructure Integration
- Own the multi-cloud and hybrid-cloud integration strategy across AWS, GCP, Azure, and on-premises environments.
- Direct the design of enterprise-grade identity and security integrations (Azure AD, LDAP, RBAC, secrets management).
- Partner with cloud, security, and networking leadership to ensure the AI platform meets enterprise compliance and governance requirements.
Collaboration & Customer Success
- Act as a senior point of technical escalation for internal teams and critical customer deployments.
- Influence cross-functional strategy across AI engineering, DevOps, data science, and product teams.
- Mentor staff engineers and up-level the team's capabilities through architectural reviews, technical coaching, and leadership by example.
- Represent the platform's strategy and progress to leadership stakeholders, ensuring alignment with business goals and customer needs.
Requirements
- Bachelor's degree in Computer Science, Engineering, or related field (Master's preferred).
- 10+ years of experience in DevOps, cloud-native platform engineering, or AI/ML platform operations, including leadership or architectural responsibility.
- Proven expertise in Kubernetes/OpenShift platform leadership, including cluster lifecycle management, operator design, advanced networking, and platform-level security.
- In-depth experience with GitOps at scale using ArgoCD, Helm, and automated cluster configuration patterns.
- Advanced knowledge of MLOps tooling (e.g., KServe, Kubeflow, Tekton, Knative) and ML workflow automation.
- Strong proficiency in Python, Bash, and automation frameworks like Ansible and Terraform.
- Deep experience with AWS, GCP, Azure, and hybrid cloud architectural patterns.
- Strong observability leadership experience with Prometheus, Grafana, and distributed system monitoring.
- Exceptional communication, stakeholder management, and cross-functional leadership skills.
- Proven track record shaping technical strategy, influencing engineering culture, and delivering complex, large-scale platforms.
Bonus Points
- Experience leading initiatives within the Red Hat OpenShift AI ecosystem.
- Knowledge of enterprise-scale LLM and model serving architectures (e.g., Triton ensemble models, OCI artifact-based LLM deployments).
- Advanced industry certifications such as CKA, CKS, GCP ACE, AWS SAA/SA Pro, or Red Hat OpenShift specializations.
- Experience guiding data engineering or AI/ML workflow orchestration teams.
- Demonstrated leadership in monorepo-based CI/CD modernization initiatives.
- Experience implementing and governing Internal Developer Portals (e.g., Backstage) across large engineering organizations.
Benefits & conditions
What we offer:
- Opportunities for career advancement and personal development
- Access to a diverse range of training programs
- Performance-based rewards that celebrate your achievements
- Flexibility with a hybrid work model (3:2) that blends home and office life
- Electric car salary sacrifice scheme
- Life insurance