AI Infrastructure & Experience Engineer

DGN Technologies

yesterday

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Intermediate

Tech stack

Artificial Intelligence

Systems Engineering

C++

Program Optimization

Communications Protocols

Nvidia CUDA

Software Debugging

Design of User Interfaces

Python

Local Area Networks

Linux kernel

Machine Learning

Rapid Prototyping Process

TensorFlow

Next.js

Software Engineering

System Programming

WebSocket

Rust

React

Large Language Models

Caching

FastAPI

Containerization

Kubernetes

Information Technology

Low Latency

Search Engines

Front End Software Development

TensorRT

Asynchronous Programming

API Design

REST

gRPC

Docker

Requirements

Inference Optimization: Deploy and tune multiple LLMs and generative multimodal models on local inference hardware. Optimize performance metrics (TTFT, tokens/sec) via model quantization, caching strategies, and architecture-specific adjustments.

Systems Engineering & CUDA: Leverage deep knowledge of the CUDA environment to build custom kernels that maximize utilization of low-cost GPU compute.

Orchestration & Integration: Seamlessly bridge inference backends with orchestration layers (LiteLLM, Ollama, etc.) and frontends like OpenWebUI.

Rapid Prototyping: Build functional, high-fidelity demos showcasing model memory capabilities, agentic workflows, and context-aware web search.

Peripheral Connectivity: Implement communication protocols to bridge local AI compute with peripheral devices, including smart TVs, household appliances, and XR hardware.

Skills: Technical Qualifications

Recent experience in model optimization required.

Hardware & Compute: Proven experience with NVIDIA ecosystems and ARM64 architecture.

Systems Programming: Advanced proficiency in C++, Python, and Rust. Deep familiarity with CUDA and the ability to author/debug custom CUDA kernels for compute-intensive tasks.

AI/ML Frameworks: Extensive experience with modern inference engines (llama.cpp, TensorRT-LLM, Ollama) and orchestration frameworks (LiteLLM).

Software Engineering: Robust understanding of asynchronous programming (FastAPI), containerization (Docker/Kubernetes), sandbox environments, and API design for low-latency communication.

Full-Stack Prototyping: Ability to quickly spin up modern frontend UIs (React, Next.js, or similar) to present AI-driven intelligence to end users.

Communication Protocols: Familiarity with WebSockets, gRPC, and REST for device-to-device communication in a local network environment.

Education: Ideal Candidate Profile

The "Builder" Mindset: You are energized by the prospect of building proofs-of-concept in days rather than months. You thrive in environments where speed and creativity are paramount.

Problem Solver: You approach unsolved, messy engineering challenges with enthusiasm rather than trepidation.

Architectural Vision: You see the "big picture" of how AI becomes part of the consumer's daily life, not just how the model generates text.

Agile & Adaptable: You are comfortable working in a fast-paced environment where priorities shift based on the results of rapid experimentation.

Degree in Computer Science, Machine Learning, or an Artificial Intelligence specialization preferred, but not required.

3 years of relevant industry experience required

Skills and Experience:

Required Skills:

INFERENCE OPTIMIZATION

NVIDIA ECOSYSTEMS

CUSTOM CUDA KERNEL DEVELOPMENT

ARM64 ARCHITECTURE

API DESIGN

LOW-LATENCY COMMUNICATION

FRONTEND UI DEVELOPMENT

REACT

NEXT.JS

WEBSOCKETS

GRPC

REST

DEVICE-TO-DEVICE COMMUNICATION

PROBLEM SOLVING

ARCHITECTURAL VISION

AGILITY

ADAPTABILITY
