Senior AI Engineer (US)

Assail, Inc.

Boston, United States of America

5 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Remote

Boston, United States of America

Tech stack

API

Artificial Intelligence

Software System Penetration Testing

Code Review

Mobile Application Software

Python

Open Source Technology

Open Web Application Security

Red Team (Cyber Security)

Reverse Engineering

Web Applications

PyTorch

Large Language Models

Multi-Agent Systems

Software Security

Kubernetes

HuggingFace

TensorRT

Data Generation

Job description

The Senior AI Engineer is a core builder on the team responsible for the agents and models that power Ares - Assail's autonomous offensive security platform for APIs, web applications, and mobile applications. This role works directly on Ares' named-agent architecture (Polemos, Hermes, Enyo, Momos, Dolos, Themis, Aletheia, Argus, Kratos), the model powering Ares, and the Javelin co-evolutionary self-training loop. The engineer will ship capabilities that move the platform forward across exploit chaining, multimodal vision, mobile coverage, self-improvement, and customer-facing accuracy.

Agent development. Design, implement, and continuously improve the behavior and prompting of Ares' named agents, including orchestration patterns, hand-offs, planning loops, tool use, and shared memory.

Model training and fine-tuning. Contribute to the model powering Ares across data curation, SFT, preference optimization (DPO/GRPO-style), and evaluation. Own pieces of the training pipeline from dataset construction through eval.
Javelin loop. Extend the co-evolutionary self-training system that lets Ares learn from its own engagements and improve over time.
Self-improvement systems (ARES-420 and successors). Build false-positive detection, tiered skill learning (suppression rules, agent directives, code-patch proposals), and the infrastructure that routes proposed changes through human approval and back into the platform.
Evals. Design rigorous, security-specific evaluations covering OWASP Top 10 coverage, exploit chaining, finding accuracy, and agent reliability. Track performance over every model and agent change.
Multimodal and platform expansion. Contribute to vision capabilities, mobile (iOS/Android) coverage, and BYOK support shipping in Sidewinder and beyond.
Production reliability. Own latency, cost, observability, and failure-mode analysis for agents running in customer engagements. Partner with the platform team on Kubernetes-based deployment.
Customer-facing accuracy. Contribute to the live accuracy gauge and other surfaces where model and agent quality is exposed to customers.

Requirements

Do you have experience in Testing and evaluation?

5+ years building production ML/AI systems, with at least 2 years working directly on LLMs or LLM-powered agents.

Deep Python; strong, production-grade engineering practices (testing, code review, observability).
Hands-on fine-tuning experience: SFT, preference optimization (DPO, GRPO, RLHF/RLAIF), data curation, and synthetic data generation.
Strong grasp of transformer architectures and the modern training stack (PyTorch, Hugging Face, DeepSpeed or FSDP, accelerate).
Experience designing and shipping multi-agent or tool-using LLM systems in production - not just demos.
Rigorous eval design: building harnesses, tracking experiments, and making model/agent decisions based on data rather than vibes.
Inference optimization experience: vLLM or TensorRT-LLM, quantization, throughput/latency tradeoffs.
Comfort with retrieval pipelines, vector stores, and structured memory for agents.
Kubernetes and containerized deployment fluency.
Genuine interest in offensive security and the ability to ramp quickly on OWASP Top 10, API security, web app pentesting, and mobile pentesting concepts. Direct offensive security background is a strong plus but not required.

Offensive security background: OSCP/OSWE/OSWA, CTF, bug bounty, or prior red team work.
Research publications at NeurIPS, ICML, ICLR, USENIX Security, IEEE S&P, Black Hat, or DEF CON.
Open source contributions to agent frameworks or LLM tooling.
Experience with adversarial ML or red-teaming AI systems.
Familiarity with mobile app reverse engineering or binary analysis.
