Senior Principal HPC/AI Architect

NTT Ltd.
Municipality of Madrid, Spain
4 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Municipality of Madrid, Spain

Tech stack

Artificial Intelligence
Amazon Web Services (AWS)
Computer Vision
Ethernet
General Parallel File Systems
InfiniBand
OpenShift
Red Hat Enterprise Linux - RHEL
Network Switches
Graphics Processing Unit (GPU)
Large Language Models
Deep Learning
Kubernetes
Docker
NVMe

Job description

We are seeking a highly skilled and visionary Senior Principal HPC/AI Infrastructure Architect to lead the technical design and architecture of large-scale AI infrastructure solutions. This role is pivotal in shaping next-generation AI Factories, supporting customer engagements, and driving technical excellence across compute, interconnect, and software stack domains.

  1. AI Factory Architecture & Design (35%)

  • Design GPU cluster architectures tailored for AI and HPC workloads.
  • Define node configurations for diverse workload types including dense GPU nodes, cost-optimized nodes, and high-memory CPU nodes.
  • Specify and validate performance metrics including compute throughput, memory bandwidth, and power consumption.
  • Architect multi-tier interconnect networks using NVLink, InfiniBand, and high-speed Ethernet.
  • Develop topology designs and calculate bandwidth and latency targets.
  • Model performance for customer workloads and validate against industry benchmarks.
  2. Pre-Sales Technical Leadership (30%)
  • Lead technical discussions with customer architects and stakeholders.
  • Conduct workload sizing and architectural presentations.
  • Develop technical content for proposals including BoMs, compliance matrices, and scoring alignment.
  • Analyze competitor solutions and articulate technical differentiators.
  3. Demonstrator Lab Development (20%)
  • Design and expand lab infrastructure for AI workload testing and validation.
  • Build reference architectures across industries such as finance, manufacturing, healthcare, and research.
  • Support lab operations including cluster configuration, workload orchestration, and software stack maintenance.
  4. Customer Demonstrations & PoCs (10%)
  • Deploy and showcase customer-specific AI workloads including LLM training, computer vision, and scientific simulations.
  • Manage proof-of-concept projects, define success criteria, and present outcomes to stakeholders.
  5. Technical Expertise & Innovation (5%)
  • Maintain relationships with key technology vendors and participate in early access programs.
  • Evaluate emerging technologies and contribute to innovation roadmaps and adoption strategies.

Requirements

Technical Competencies

  • GPU Architectures: NVIDIA (H100, H200, B100, B200), AMD (MI300X), Intel (Gaudi2/3)

  • Interconnects: InfiniBand (HDR/NDR/XDR), NVLink, RoCE, Infinity Fabric

  • Storage Systems: Lustre, GPFS, BeeGFS, NVMe-oF, S3-compatible object storage

  • Container Platforms: Kubernetes, Docker, Singularity/Apptainer

  • Performance Tools: NVIDIA Nsight, ROCm, Intel VTune

Certifications (Preferred)

  • NVIDIA Deep Learning Institute (DLI)

  • Red Hat Certified Specialist in OpenShift

  • InfiniBand Certified Professional

Experience

  • 8+ years in HPC/AI infrastructure design

  • 5+ years working with GPU-accelerated systems

  • Proven experience with large-scale GPU deployments (1000+ GPUs)

  • Successful track record in technical bid support and customer engagement

About the company

NTT DATA is a $30+ billion business and technology services leader, serving 75% of the Fortune Global 100. We are committed to accelerating client success and positively impacting society through responsible innovation. We are one of the world's leading AI and digital infrastructure providers, with unmatched capabilities in enterprise-scale AI, cloud, security, connectivity, data centers and application services. Our consulting and industry solutions help organizations and society move confidently and sustainably into the digital future. As a Global Top Employer, we have experts in more than 50 countries. We also offer clients access to a robust ecosystem of innovation centers as well as established and start-up partners. NTT DATA is part of NTT Group, which invests over $3 billion each year in R&D.
