AI Application Engineer

Technogen, Inc.
San Jose, United States of America
3 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate

Job location

San Jose, United States of America

Tech stack

Artificial Intelligence
Retrieval-Augmented Generation (RAG)
Profiling
Computer Programming
Python
Performance Tuning
Recommender Systems
Regression Testing
Software Safety
Software Engineering
Data Streaming
Chatbots
Large Language Models
Multi-Agent Systems
Prompt Engineering
Kubernetes
Information Technology
Machine Learning Operations
NVIDIA NIM
Service Stack
ServiceNow

Job description

  • Develop and maintain multi-step LLM orchestration pipelines using LangChain, LlamaIndex, or custom frameworks
  • Implement and optimize RAG pipelines including chunking strategies, embedding selection, reranking, and hybrid search
  • Design multi-turn conversational AI experiences with context management and session memory
  • Integrate NVIDIA technologies including NIM, NeMo, NeMo Guardrails, and Riva into enterprise AI applications
  • Build automated evaluation pipelines for model quality, hallucination detection, regression testing, and release gating
  • Perform latency profiling and optimization across multi-step LLM call chains
  • Implement AI safety guardrails including prompt injection prevention, jailbreak mitigation, and topical control
  • Collaborate with globally distributed engineering and product teams to deliver scalable AI solutions
  • Support deployment, monitoring, and continuous improvement of AI applications in production environments

Requirements

  • 4-7 years of software engineering experience with at least 2 years focused on production LLM application development
  • Expert-level experience with Python for AI/ML application development and async programming
  • Strong expertise in prompt engineering including system prompts, few-shot prompting, and instruction tuning
  • 3+ years of hands-on experience with multi-step LLM orchestration frameworks such as LangChain or LlamaIndex
  • 3+ years of experience designing and optimizing RAG pipelines and retrieval systems
  • 3+ years of experience with vector databases, similarity search tuning, and reranking techniques
  • 3+ years of hands-on experience with NVIDIA NIM, NeMo, NeMo Guardrails, and Riva
  • 3+ years of experience implementing AI safety and guardrails for customer-facing applications
  • Strong knowledge of automated AI evaluation frameworks such as RAGAS or TruLens
  • 3+ years of experience profiling and optimizing latency in multi-step AI pipelines
  • Ability to work onsite in Santa Clara, CA

Preferred qualifications

  • Experience with adaptive learning systems or recommendation engines
  • Knowledge graph integration experience with RAG architectures
  • Experience with multi-agent orchestration patterns
  • ServiceNow API integration experience
  • Prior experience building AI products on NVIDIA infrastructure
  • Experience with streaming LLM response handling and real-time AI applications

Technology Stack

Python
LangChain

Education

Bachelor's degree in Computer Science, Engineering, Artificial Intelligence, or equivalent work experience.
