Sr. AI Infrastructure Engineer

Lenovo
Renfrew, United Kingdom
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Renfrew, United Kingdom

Tech stack

Artificial Intelligence
Amazon Web Services (AWS)
Architectural Patterns
Azure
Bash
Cloud Computing
Cloud Foundry
Continuous Integration
Information Engineering
Linux
DevOps
Distributed Systems
Python
Key Management
Lightweight Directory Access Protocol (LDAP)
Metadata Management
OpenShift
Role-Based Access Control
Ansible
Prometheus
Workflow Management Systems
AI Infrastructure
Google Cloud Platform
Large Language Models
Grafana
Multi-Cloud
Hybrid Cloud
AI Platforms
Kubernetes
Information Technology
Machine Learning Operations
Terraform
Oracle Cloud Infrastructure

Job description

Lenovo is seeking a senior technical leader to guide the strategy, architecture, and delivery of our next-generation Hybrid AI Platform. In this role, you will provide leadership across AI infrastructure, MLOps, cloud-native platform engineering, and operational excellence, setting direction for teams that build and run production-grade AI/ML platforms on Kubernetes. You will drive the vision for scalable, secure, and reliable AI systems while partnering closely with engineering, product, and executive stakeholders. If you are passionate about leading high-impact AI platform initiatives, mentoring engineering talent, and shaping enterprise-wide Hybrid AI capabilities, we invite you to join us.

AI Platform Engineering & Operations

  • Provide technical leadership and architectural direction for Kubernetes/OpenShift-based AI/ML platform design, scalability strategy, security posture, and operational standards.
  • Oversee platform roadmap, ensuring alignment with Lenovo's broader Hybrid AI strategy and enterprise architecture principles.
  • Lead engineering teams in implementing GitOps-driven, cloud-native platform automation using ArgoCD and Helm.
  • Set standards for Linux systems management, platform hardening, and operational reliability across all AI infrastructure.

MLOps & Model Lifecycle Management

  • Define and evolve the enterprise MLOps architecture, enabling reproducible, automated, and governed AI model workflows.
  • Lead teams in building and optimizing ML pipelines using Kubeflow Pipelines, Tekton, and Python SDKs.
  • Architect scalable, production-ready model serving solutions using KServe, Knative, and Triton (where applicable).
  • Champion consistency in model registry usage, metadata management, workflow orchestration, and ML lifecycle governance.

Automation, Observability & Reliability

  • Develop the long-term automation and platform SRE strategy, including Python/Ansible-based automation and Terraform-driven IaC patterns.
  • Establish observability standards for AI/ML systems using Prometheus, Grafana, AlertManager, and related tooling.
  • Oversee capacity planning, performance engineering, incident response processes, and continuous reliability improvements.
  • Drive adoption of automation-first principles to reduce operational overhead and improve engineering velocity.

Cloud & Infrastructure Integration

  • Own the multi-cloud and hybrid-cloud integration strategy across AWS, GCP, Azure, and on-premises environments.
  • Direct the design of enterprise-grade identity and security integrations (Azure AD, LDAP, RBAC, secrets management).
  • Partner with cloud, security, and networking leadership to ensure the AI platform meets enterprise compliance and governance requirements.

Collaboration & Customer Success

  • Act as a senior point of technical escalation for internal teams and critical customer deployments.
  • Influence cross-functional strategy across AI engineering, DevOps, data science, and product teams.
  • Mentor staff engineers and up-level the team's capabilities through architectural reviews, technical coaching, and leadership by example.
  • Represent the platform's strategy and progress to leadership stakeholders, ensuring alignment with business goals and customer needs.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or related field (Master's preferred).

  • 10+ years of experience in DevOps, cloud-native platform engineering, or AI/ML platform operations, including leadership or architectural responsibility.
  • Proven expertise in Kubernetes/OpenShift platform leadership, including cluster lifecycle management, operator design, advanced networking, and platform-level security.
  • In-depth experience with GitOps at scale using ArgoCD, Helm, and automated cluster configuration patterns.
  • Advanced knowledge of MLOps tooling (e.g., KServe, Kubeflow, Tekton, Knative) and ML workflow automation.
  • Strong proficiency in Python, Bash, and automation frameworks like Ansible and Terraform.
  • Deep experience with AWS, GCP, Azure, and hybrid-cloud architectural patterns.
  • Strong observability leadership experience with Prometheus, Grafana, and distributed system monitoring.
  • Exceptional communication, stakeholder management, and cross-functional leadership skills.
  • Proven track record of shaping technical strategy, influencing engineering culture, and delivering complex, large-scale platforms.

Bonus Points

  • Experience leading initiatives within the Red Hat OpenShift AI ecosystem.
  • Knowledge of enterprise-scale LLM and model serving architectures (e.g., Triton ensemble models, OCI artifact-based LLM deployments).
  • Advanced industry certifications such as CKA, CKS, GCP ACE, AWS SAA/SA Pro, or Red Hat OpenShift specializations.
  • Experience guiding data engineering or AI/ML workflow orchestration teams.
  • Demonstrated leadership in monorepo-based CI/CD modernization initiatives.
  • Experience implementing and governing Internal Developer Portals (e.g., Backstage) across large engineering organizations.

Benefits & conditions

What we offer:

  • Opportunities for career advancement and personal development
  • Access to a diverse range of training programs
  • Performance-based rewards that celebrate your achievements
  • Flexibility with a hybrid work model (3:2) that blends home and office life
  • Electric car salary sacrifice scheme
  • Life insurance

About the company

Why Work at Lenovo

We are Lenovo. We do what we say. We own what we do. We WOW our customers. Lenovo is a US$69 billion revenue global technology powerhouse, ranked #196 in the Fortune Global 500, and serving millions of customers every day in 180 markets. Focused on a bold vision to deliver Smarter Technology for All, Lenovo has built on its success as the world's largest PC company with a full-stack portfolio of AI-enabled, AI-ready, and AI-optimized devices (PCs, workstations, smartphones, tablets), infrastructure (server, storage, edge, high-performance computing, and software-defined infrastructure), software, solutions, and services. Lenovo's continued investment in world-changing innovation is building a more equitable, trustworthy, and smarter future for everyone, everywhere. Lenovo is listed on the Hong Kong stock exchange under Lenovo Group Limited (HKSE: 992) (ADR: LNVGY). To find out more, visit www.lenovo.com and read about the latest news via our StoryHub.

The Lenovo AI Technology Center (LATC), Lenovo's global AI Center of Excellence, is driving our transformation into an AI-first organization. We are assembling a world-class team of researchers, engineers, and innovators to position Lenovo and its customers at the forefront of the generational shift toward AI. Lenovo is one of the world's leading computing companies, delivering products across the entire technology spectrum, spanning wearables, smartphones (Motorola), laptops (ThinkPad, Yoga), PCs, workstations, servers, and services/solutions. This unmatched breadth gives us a unique canvas for AI innovation, including the ability to rapidly deploy cutting-edge foundation models and to enable flexible, hybrid-cloud, and agentic computing across our full product portfolio.
To this end, we are building the next wave of AI core technologies and platforms that leverage and evolve with the fast-moving AI ecosystem, including novel model and agentic orchestration and collaboration across mobile, edge, and cloud resources. This space is evolving fast, and so are we. If you're ready to shape AI at a truly global scale, with products that touch every corner of life and work, there's no better time to join us.

Apply for this position