AI Engineer: Agentic Systems & On-Premise Infrastructure - MADISON WISCONSIN, NO REMOTE

EVO TECHNOLOGY INC

Madison, United States of America

14 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

$84K

Job location

Madison, United States of America

Tech stack

Artificial Intelligence

Systems Engineering

Computer Clusters

Python

Performance Tuning

Retrieval-Augmented Generation

Large Language Models

Multi-Agent Systems

State Machines

Backend

FastAPI

Kubernetes

Bare Metal

Build Tools

TensorRT

Hardware Infrastructure

Job description

As a Senior AI Engineer, you will own the end-to-end lifecycle of our internal AI ecosystem, from optimizing local inference engines to orchestrating multi-agent swarms. You will build systems that don't just "chat," but reason, execute tools, and maintain state over long-running business processes. This is a role for a "full-stack" AI engineer: someone who understands both the mathematical foundations of LLMs and the systems engineering required to run them at scale on-premise.

Agentic Orchestration: Architect multi-agent systems using LangGraph or PydanticAI, implementing advanced patterns like Plan-and-Execute, Self-Reflection, and Multi-Agent Handoffs.
Production RAG: Build high-performance Retrieval-Augmented Generation pipelines utilizing hybrid search, cross-encoders, and re-ranking within our private vector stores.
On-Premise Optimization: Manage the full inference stack. You will be responsible for optimizing model performance (latency vs. throughput) on local GPU clusters using vLLM, TGI, or TensorRT-LLM.
Stateful Architecture: Design resilient workflows that handle long-running tasks, error recovery, and complex "human-in-the-loop" interactions.
Evaluation & Benchmarking: Develop rigorous, automated "LLM-as-a-judge" frameworks to evaluate agentic performance, ensuring reliability and grounding.

Requirements

Agentic Frameworks: Expert-level mastery of LangGraph (state machines/persistence) or PydanticAI (type-safe logic/structured outputs).
Python Mastery: Deep experience in asynchronous Python and building scalable, production-grade backend services (FastAPI, Pydantic).
Infrastructure & Deployment: Proven experience deploying open-weights models (Llama, Mistral, DeepSeek) on bare-metal or private Kubernetes environments.
The "Inner Workings": A strong grasp of LLM fundamentals, including attention mechanisms, KV caching, and the impact of tokenization on performance/cost.
Pretraining: Knowledge of data curation, pruning, and continued pretraining on domain-specific datasets.
Post-training: Hands-on experience with Supervised Fine-Tuning (SFT) and alignment techniques like DPO, ORPO, or RLHF.
Model Compression & Efficiency: Deep knowledge of quantization (GGUF, AWQ, EXL2), model distillation, and graph optimization to squeeze maximum performance out of local hardware.
VRAM Management: Ability to perform "VRAM math": calculating memory requirements for various model sizes (e.g., 70B vs 8B) relative to context window length, batch sizes, and KV cache pressure.
Data Sovereignty: A passion for building private systems where data privacy and infrastructure control are paramount.
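
As a rough illustration of the "VRAM math" named in the requirements, the sketch below estimates memory as model weights plus KV cache. It is a back-of-the-envelope approximation, not a description of any particular inference engine: it ignores activations, framework overhead, and fragmentation. The names (ModelShape, weight_bytes, kv_cache_bytes, estimate_vram_gib) are hypothetical, and the two model shapes are assumed values chosen to resemble typical 8B and 70B decoder-only models with grouped-query attention.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical description of a decoder-only transformer."""
    params_billion: float  # total parameter count, in billions
    n_layers: int          # number of transformer layers
    n_kv_heads: int        # key/value heads (fewer than query heads under GQA)
    head_dim: int          # dimension of each attention head


def weight_bytes(shape: ModelShape, bytes_per_param: float) -> float:
    """Memory for the weights: 2.0 bytes/param for fp16, about 0.5 for 4-bit."""
    return shape.params_billion * 1e9 * bytes_per_param


def kv_cache_bytes(shape: ModelShape, context_len: int, batch_size: int,
                   bytes_per_elem: float = 2.0) -> float:
    """Memory for the KV cache when every sequence fills the context window."""
    # Leading 2: one K tensor and one V tensor are cached per layer.
    per_token = 2 * shape.n_layers * shape.n_kv_heads * shape.head_dim * bytes_per_elem
    return per_token * context_len * batch_size


def estimate_vram_gib(shape: ModelShape, bytes_per_param: float,
                      context_len: int, batch_size: int) -> float:
    """Weights plus KV cache, in GiB. Overheads are deliberately left out."""
    total = weight_bytes(shape, bytes_per_param) + kv_cache_bytes(
        shape, context_len, batch_size)
    return total / GIB


# Assumed shapes for illustration only.
MODEL_8B_LIKE = ModelShape(params_billion=8, n_layers=32, n_kv_heads=8, head_dim=128)
MODEL_70B_LIKE = ModelShape(params_billion=70, n_layers=80, n_kv_heads=8, head_dim=128)

if __name__ == "__main__":
    for name, shape in [("8B-like", MODEL_8B_LIKE), ("70B-like", MODEL_70B_LIKE)]:
        for batch in (1, 16):
            gib = estimate_vram_gib(shape, bytes_per_param=2.0,
                                    context_len=8192, batch_size=batch)
            print(f"{name} fp16, ctx=8192, batch={batch}: ~{gib:.1f} GiB")
```

Under these assumptions, the KV cache for the 8B-like shape at an 8,192-token context and batch size 1 comes to 1 GiB in fp16, and it grows linearly with both context length and batch size. That linear growth is why KV cache pressure, rather than weight size alone, often decides how many concurrent requests fit on a given GPU.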
