ML Engineer (LLM Systems)

CYNNOVATIVE, LLC

8 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Tech stack

API

Amazon Web Services (AWS)

Azure

Continuous Integration

Software Debugging

Distributed Systems

Memory Management

Fault Tolerance

Python

Machine Learning

TensorFlow

Software Engineering

Strategies of Testing

Data Logging

PyTorch

Large Language Models

Caching

Git

Build Management

Containerization

Kubernetes

Information Technology

Low Latency

HuggingFace

Machine Learning Operations

Software Version Control

Docker

Job description

As a Senior ML Engineer (LLM Systems) at Cynnovative, you will be responsible for developing and managing tools that facilitate LLM experimentation and deployment. This role is crucial in ensuring seamless integration and operation of machine learning models in various environments, supporting U.S. national security efforts.

NOTE: This role requires an active TS/SCI security clearance and is located on-site in Northern Virginia.

Responsibilities May Include

Design and build scalable LLM systems for high-throughput experimentation and inference

Optimize inference performance (latency, throughput)

Batching, caching, and request scheduling
Efficient GPU/CPU utilization and memory management

Design and deploy containerized ML services (e.g., Docker, Kubernetes)

Lead development of experimentation infrastructure

Build frameworks for large-scale experiment sweeps and parallel execution
Support distributed execution across compute environments
Ensure fault tolerance, retry logic, and reproducibility

Ensure production readiness and operational reliability of LLM systems

Implement testing strategies and validation pipelines
Design APIs and model serving systems
Support deployment across dev/staging/prod environments
Maintain observability (logging, monitoring, tracing)
Debug and resolve issues in production systems
Deploy systems in secure or constrained environments

Collaborate cross-functionally

Work closely with applied mathematicians and research engineers
Provide technical leadership and mentorship
Establish engineering best practices

Requirements

B.S. in Computer Science, Software Engineering, or related field (M.S. or Ph.D. preferred)
Strong communication skills and cross-functional collaboration
Deep understanding of transformer architectures and LLM inference workflows
Hands-on experience building scalable ML systems
Proficiency in Python and ML frameworks (e.g., PyTorch, Hugging Face)
Experience with distributed systems or large-scale compute environments
Experience with containerization and cloud platforms (Docker, Kubernetes, AWS/GCP/Azure)
Familiarity with CI/CD workflows for ML systems
Experience with version control systems (Git)
U.S. Citizenship and active TS/SCI security clearance
Experience with high-performance LLM inference frameworks (vLLM, SGLang, etc.)
Experience deploying real-time inference APIs
Understanding of ML system tradeoffs (latency vs throughput vs cost)
Experience bridging research and production systems
Familiarity with cyber-related data, tools, and techniques

About the company

At Cynnovative, we leverage machine learning, computer science, and software engineering to address high-impact problems in the cyber domain, specifically those that are critical to U.S. national security. We primarily extend fundamental research to invent, design, develop, and deploy prototype solutions to persistent problems in this domain.
