LLM Inference & GPU Systems Consultant

NTT DATA, Inc.
Charlotte, United States of America
14 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Charlotte, United States of America

Tech stack

Artificial Intelligence
Systems Engineering
OpenShift
Software Deployment
AI Infrastructure
Large Language Models
Kubernetes
Optimization Algorithms
Hugging Face
Machine Learning Operations
TensorRT
Decoding

Job description

  • NVIDIA GPU Runtime Optimization: Drive runtime efficiency across the token generation pipeline, with a focus on prefill/decode optimization and KV cache management.
  • Inference Serving: Deploy and manage inference engines including vLLM and TensorRT-LLM.
  • Hardware Utilization: Tune GPU throughput, batching strategies, and latency. Orchestrate workloads using RunAI and Kubernetes GPU scheduling.
  • Model Lifecycle Management: Oversee the complete Hugging Face model lifecycle, including model onboarding, deployment, and retirement.
  • Platform Operations: Operate and maintain the OpenShift AI ecosystem as the primary container platform for GenAI workloads.

Requirements

  • Experience in software deployment.
  • 8 years of experience working as an LLM Systems Engineer or AI Infrastructure Runtime Engineer.
  • 8 years of hands-on experience with NVIDIA H200 clusters and runtime optimization techniques (KV cache, prefill/decode).
  • Proficiency in OpenShift AI and GPU orchestration tools like RunAI.
  • Strong experience with modern inference frameworks, specifically vLLM and TensorRT-LLM.
  • Proven track record managing the Hugging Face deployment lifecycle.
  • Must be onsite at the client in Charlotte, NC at least 3 days per week.

About the company

NTT DATA is a $30 billion trusted global innovator of business and technology services. We serve 75% of the Fortune Global 100 and are committed to helping clients innovate, optimize and transform for long-term success. As a Global Top Employer, we have diverse experts in more than 50 countries and a robust partner ecosystem of established and start-up companies. Our services include business and technology consulting, data and artificial intelligence, industry solutions, as well as the development, implementation and management of applications, infrastructure and connectivity. We are one of the leading providers of digital and AI infrastructure in the world. NTT DATA is a part of NTT Group, which invests over $3.6 billion each year in R&D to help organizations and society move confidently and sustainably into the digital future. Visit us at us.nttdata.com.
