GenAI Platform / LLM Inference Optimization Engineer (Cloud)

Infosys
Charlotte, United States of America
yesterday

Role details

Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Intermediate

Job location

Charlotte, United States of America

Tech stack

Artificial Intelligence
Business Analytics Applications
Data analysis
Azure
Big Data
Google BigQuery
Cloud Computing
Continuous Integration
Information Engineering
DevOps
Hadoop
Python
Key Management
Knowledge Management
Machine Learning
Openshift
Reliability Engineering
Prometheus
SAS (Software)
Software Engineering
Systems Integration
Data Logging
Scripting (Bash/Python/Go/Ruby)
Large Language Models
Snowflake
Grafana
Deep Learning
Kubernetes
Data Analytics
Hashicorp
Machine Learning Operations
TensorRT
Decoding
Terraform

Job description

In the role of Data Science Consultant 2, your areas of responsibility will be as follows:

  • Develop data preparation tasks, while identifying patterns or anomalies.
  • Ensure data readiness for advanced modeling.
  • Develop models for complex use cases (e.g., forecasting models, LLM-based solutions), while refining algorithms to meet business needs, and ensure smooth deployment into scalable, production-ready solutions.
  • Conduct testing and optimize algorithms for performance, reliability, and scalability, while providing guidance to team members in best practices.
  • Design and develop predictive models and data-driven analyses to address business challenges.
  • Build, evaluate, and deploy models, standardize code, and contribute to knowledge management.
  • Leverage tools like SAS and R/Python to create reusable customizations for non-ML, ML, and deep learning algorithms, while enhancing analytics including LLMs, and create innovative, cost-effective solutions.
  • Define analytics problems for projects; execute visualization, analysis, and predictive modeling under guidance.
  • Proactively maintain models and implement improvements for accuracy and reliability.
  • Apply governance controls to mitigate risks and ensure compliance.
  • Analyze performance trends, recommend improvements, and document discrepancies for escalation.
  • Maintain comprehensive documentation standards, while participating in knowledge transfer sessions.
  • Participate in discussions with stakeholders to refine requirements, provide insights, and guide implementation of models.
  • Apply the predefined quality measurement framework at an individual task level in the project.
  • Deploy complex analytics tools or multi-system integration, while validating deployment success.
  • Participate in developing scripts or templates for repeated deployment tasks.
  • Contribute to analytic solutions, IP asset creation, and training initiatives.
  • Contribute to thought leadership such as papers, innovative non-ML, ML, deep learning or LLM models, and proofs of concepts.
  • Participate in and deliver analytics training, while contributing to content creation.
  • Provide input for segment and unit-level business plans.

Your contribution to the team:

  • Deliver scalable, high-quality analytics solutions aligned to business needs.

  • Bring a knack for optimization, deployment and performance improvement of models.

  • Drive innovation through advanced analytics, automation and thought leadership.

  • Enable team growth through knowledge sharing, training and standardization.

  • Support business planning with data-driven insights.

  • vLLM, TensorRT-LLM, Triton, SGLang

  • Quantization (FP8/AWQ/GPTQ), tensor parallelism

  • Performance benchmarking & tuning

  • Kubernetes, GKE, KServe / ML serving patterns

  • Helm, Operators

  • GPU orchestration concepts and scheduling patterns

  • GCP and/or Azure (strong hands-on)

  • Terraform

  • Cloud networking, landing zones, governance/org policies

  • HashiCorp Vault (secrets management)

Observability & SRE

  • Prometheus/Grafana, logging, tracing

  • SRE/SLO mindset, reliability engineering

Preferred Skill and Experience

Experience in Big Data technologies (e.g., BigQuery, Hadoop). Expertise in ML model development, data engineering, and software engineering principles. Knowledge of MLOps and AI/ML deployment (e.g., SageMaker, Snowflake). Familiarity with CI/CD, DevOps, and automation tools in AI/ML contexts.

  • Design and implement LLM inference serving stacks using:
      o vLLM, TensorRT-LLM, Triton Inference Server, SGLang
      o Inference optimization techniques: continuous batching, speculative decoding, KV/prefix caching
      o Quantization: FP8 / AWQ / GPTQ and tuning for GPU utilization

  • Build Kubernetes-based serving platforms:
      o KServe, Kubernetes ML Serving, GKE, OpenShift (OCP) (where applicable)

  • Enable GenAI platforms and RAG use cases:
      o Integrate LLM services with RAG pipelines
      o Provide reusable internal libraries, templates, and developer enablement assets

  • Collaborate with cross-functional teams and client stakeholders to productionize LLM workloads at scale

Requirements

  • Bachelor's degree or foreign equivalent required from an accredited institution. Three years of progressive experience in the specialty will also be considered in lieu of each year of education.
  • This position may require relocation and/or travel to work/project location.
  • Candidates authorized to work for any employer in the United States without employer-based visa sponsorship are welcome to apply. Infosys is unable to provide immigration sponsorship for this role now or in the future.

Benefits & conditions

Along with competitive pay, as a full-time Infosys employee you are also eligible for the following benefits:

  • Medical/Dental/Vision/Life Insurance
  • Long-term/Short-term Disability
  • Health and Dependent Care Reimbursement Accounts
  • Insurance (Accident, Critical Illness, Hospital Indemnity, Legal)
  • 401(k) plan and contributions dependent on salary level
  • Paid holidays plus Paid Time Off

About the company

Infosys is a global leader in next-generation digital services and consulting. We enable clients in more than 50 countries to navigate their digital transformation. With over four decades of experience in managing the systems and workings of global enterprises, we expertly steer our clients through their digital journey. We do it by enabling the enterprise with an AI-powered core that helps prioritize the execution of change. We also empower the business with agile digital at scale to deliver unprecedented levels of performance and customer delight. Our always-on learning agenda drives their continuous improvement through building and transferring digital skills, expertise, and ideas from our innovation ecosystem.

The Infosys Data and Analytics (DNA) unit is at the forefront of transforming data into actionable insights, driving business growth and operational efficiency. We specialize in leveraging advanced AI and analytics to create innovative solutions that address complex business challenges. Our team is dedicated to pioneering the future of data-driven decision-making, enabling organizations to unlock new opportunities and achieve sustainable success. Join us to be part of a dynamic team that is revolutionizing the way businesses harness the power of data and AI. At Infosys DNA, you'll have the opportunity to work with cutting-edge technologies, collaborate with industry experts, and contribute to transformative projects that shape the future of business. We are committed to fostering a culture of continuous learning and growth, ensuring that our team members thrive in a dynamic and supportive environment. If you're passionate about AI and eager to make a significant impact, the Infosys DNA unit is the perfect place for you to grow and excel.
