UKIMEA Advisory and Professional Services Sovereign AI Enterprise Architect
Job description
We are seeking a Sovereign AI Enterprise Architect to support strategic UKIMEA customers in designing, deploying, and scaling secure, high-performance AI platforms. This role sits at the intersection of AI infrastructure, HPC, Kubernetes, and enterprise architecture, helping sovereign and regulated organisations build resilient, compliant AI environments.
You will act as a trusted technical advisor, working closely with customers, partners, and internal teams to architect end-to-end AI solutions, from bare metal and GPUs through to orchestration, operations, and governance.
Key Responsibilities
- Design and architect sovereign AI platforms for enterprise and public sector customers
- Lead end-to-end AI infrastructure deployments, from design through implementation
- Advise on Kubernetes-based AI platforms, GPU clusters, and HPC integrations
- Partner with stakeholders to translate business requirements into scalable architectures
- Support solution validation, performance tuning, and operational readiness
- Provide technical leadership across customer engagements and advisory projects
Required Technical Experience
Container Platforms & Automation
- Strong, demonstrated experience deploying and configuring enterprise Kubernetes platforms, including:
- Rancher RKE2
- Red Hat OpenShift
- CNCF-compliant Kubernetes
- High-performance storage (NVMe, parallel file systems, object storage)
- Data centre infrastructure (power, cooling, racks, redundancy)
- Advanced networking (InfiniBand, RoCE, RDMA, 100-800GbE fabrics)
- Virtualisation and containerisation (Docker, Kubernetes, OpenShift)
- Infrastructure as Code: Terraform, Ansible, Pulumi
AI & MLOps
- AI training and inference pipelines
- Model lifecycle management and MLOps platforms
- Data pipeline orchestration (Airflow, Kubeflow)
- Performance benchmarking and workload profiling
- Large-scale model deployment (on-prem, edge, hybrid cloud)
Cloud, Security & Compliance
Cloud & Hybrid
- AI services across AWS, Azure, and GCP
- Hybrid cloud design and migration strategies
- Secure connectivity between on-prem AI systems and cloud environments
- Cost optimisation for large-scale compute workloads
Security & Governance
- Zero-trust architecture principles
- Identity and Access Management (IAM)
- Data governance and privacy controls
- Secure multi-tenant AI platforms
- Regulatory compliance (e.g. ISO, SOC 2, GDPR)
Operations & Reliability
- Observability and monitoring (Prometheus, Grafana, ELK)
- SLA/SLO design for AI workloads
- Capacity planning for GPU environments
- Incident management and root cause analysis
- High availability, fault tolerance, and disaster recovery planning
Soft Skills & Ways of Working
- Strong stakeholder engagement and communication skills
- Ability to collaborate across data science, infrastructure, and vendor teams
- Creation of high-quality technical documentation and architecture diagrams
- Experience working in Agile / DevOps environments
- Strategic mindset for scaling AI platforms and "AI factories"
Why Join Us
- Work on cutting-edge sovereign AI initiatives across UKIMEA
- Influence enterprise-scale AI architectures with real-world impact
- Collaborate with leading technology partners and customers
- Be part of a high-performing Advisory & Professional Services organisation
Additional Skills: Accountability, Active Learning, Active Listening, Assertiveness, Bias, Building Rapport, Buyer Personas, Coaching, Complex Sales, Creativity, Critical Thinking, Cross-Functional Teamwork, Customer Experience Strategy, Customer Interactions, Design Thinking, Empathy, Financial Acumen, Follow-Through, Growth Mindset, Identifying Sales Opportunities, Industry Knowledge, Intellectual Curiosity, Long Term Planning, Managing Ambiguity
What We Can Offer You:
Health & Wellbeing
We strive to provide our team members and their loved ones with a comprehensive suite of benefits that supports their physical, financial and emotional wellbeing.
Requirements
- 5+ years hands-on Linux experience (RHEL, Ubuntu)
- Strong background in Ansible automation frameworks
- Experience integrating platforms using REST APIs
AI, HPC & Accelerated Computing
- Familiarity with Slurm-on-Kubernetes frameworks (e.g. Slinky, SUNK)
- Strong understanding of distributed systems architecture (GPU clusters, multi-node training)
- Knowledge of HPC architectures, network topologies, and high-performance storage
- Experience with GPU and accelerator platforms (NVIDIA, AMD, or custom ASICs)
- Familiarity with CUDA, NCCL, and distributed training optimisation
- Knowledge of NVIDIA AI Enterprise tooling, including Base Command Manager (BCM) and Data Center GPU Manager (DCGM)