Senior Software Engineer in Test, AI Engineering
Job description
TWG Global is seeking a Senior Software Engineer in Test to join our AI Engineering team building commercial-grade AI products. This is a software engineering role focused on test automation: you won't just write test cases, you'll design and build the frameworks, harnesses, evaluation infrastructure, and tooling that make testing AI agents and LLM-powered applications possible at scale.
Our agents are written in LangGraph and run on Azure on the TWG side, with a parallel Vercel-based stack on the Palantir side. You'll write eval sets against both, and you'll validate the surfaces our users actually touch: iOS apps, plugins, and Chrome extensions, not just the model layer.
Framework and harness engineering
- Design and build scalable, reusable test automation frameworks for AI agents, LLM-powered applications, and underlying APIs.
- Write clean, maintainable Python for test harnesses, eval pipelines, synthetic data generation utilities, and internal tooling (see the harness sketch after this list).
- Treat test code as production code: code review, type hints, documentation, library design.
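To give a flavor of the style we mean by "test code as production code", here is a minimal sketch of a reusable harness utility. The names (AgentClient, EvalCase, run_cases) are illustrative assumptions, not an existing internal API:

```python
# Hypothetical illustration of "test code as production code": typed,
# documented, reviewable. AgentClient and EvalCase are invented names.
from dataclasses import dataclass
from typing import Callable, Protocol


class AgentClient(Protocol):
    """Anything that can answer a prompt; real implementations would wrap
    the TWG (LangGraph/Azure) or Palantir (Vercel) stacks."""

    def run(self, prompt: str) -> str: ...


@dataclass(frozen=True)
class EvalCase:
    """One row of an eval set: an input and a predicate over the output."""

    case_id: str
    prompt: str
    check: Callable[[str], bool]


def run_cases(agent: AgentClient, cases: list[EvalCase]) -> dict[str, bool]:
    """Run every case against the agent and return pass/fail by case id."""
    return {c.case_id: c.check(agent.run(c.prompt)) for c in cases}
```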
Evaluation infrastructure
- Build evaluation infrastructure for benchmarking agent performance against SOTA LLMs, competitors, and internal baselines.
- Own regression suites, golden datasets, rubric-based evals, and metric dashboards (a rubric-scoring sketch follows this list).
- Build tooling for synthetic test data generation, edge-case discovery, and adversarial testing.
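As one way this can look in practice, here is a sketch of rubric-based scoring over a golden dataset. The JSONL schema (fields "input" and "expected_facts") and the single-dimension rubric are assumptions for illustration; a real rubric would score multiple dimensions such as faithfulness, tone, and formatting:

```python
# Sketch of rubric-based scoring over a golden dataset (JSONL schema assumed).
import json
from pathlib import Path
from typing import Callable


def rubric_score(output: str, expected_facts: list[str]) -> float:
    """Fraction of expected facts the output mentions; a stand-in for a
    richer, multi-dimension rubric."""
    if not expected_facts:
        return 1.0
    hits = sum(1 for fact in expected_facts if fact.lower() in output.lower())
    return hits / len(expected_facts)


def score_golden_set(path: Path, generate: Callable[[str], str]) -> float:
    """Average rubric score across a golden dataset; `generate` is any
    callable mapping an input string to a model/agent output string."""
    scores = []
    for line in path.read_text().splitlines():
        row = json.loads(line)
        scores.append(rubric_score(generate(row["input"]), row["expected_facts"]))
    return sum(scores) / len(scores)
```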
Resilience and load
- Design and run release, system, performance, and load tests against streaming, stateful, and async systems.
- Build chaos and fault injection tooling for token expiry, connection pool exhaustion, provider failover, and cache pressure scenarios.
- Drive contract testing across LLM providers (Bedrock, Anthropic, OpenAI) to catch parity drift (a parametrized contract-test sketch follows this list).
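As one illustration of parity-drift testing, here is a pytest sketch parametrized across providers. The client factory and the normalized response shape are assumptions, left as a stub to wire up to the real vendor SDKs:

```python
# Sketch of a cross-provider contract test with pytest. The point is
# asserting one contract across Bedrock, Anthropic, and OpenAI backends
# so parity drift shows up as a failing test, not a prod incident.
import pytest

PROVIDERS = ["bedrock", "anthropic", "openai"]


def get_client(name: str):
    """Hypothetical factory returning a thin wrapper with a .complete()
    method that normalizes each vendor SDK to one response dict."""
    raise NotImplementedError(f"wire up the real {name} client here")


@pytest.mark.parametrize("provider", PROVIDERS)
def test_completion_contract(provider: str) -> None:
    client = get_client(provider)
    resp = client.complete(prompt="ping", max_tokens=8)
    # Same keys, types, and finish semantics from every provider.
    assert isinstance(resp["text"], str)
    assert resp["finish_reason"] in {"stop", "length"}
```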
CI/CD and observability
- Integrate automated tests into CI/CD so every model, prompt, and code change is validated before it ships.
- Build trace-based assertions on LangGraph state, tool calls, and agent decisions; debugging an agent failure means replaying graph state, not re-running a prompt (see the sketch after this list).
- Make observability a first-class testing surface (LangSmith, audit logs).
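To make "replaying graph state, not re-running a prompt" concrete, here is a sketch of a trace-based assertion over a captured trace of graph steps. The Step record shape is an assumption; in practice the trace would come from LangSmith or checkpointed graph state:

```python
# Sketch of a trace-based assertion: replay a captured trace of graph
# steps and assert on the decisions the agent actually made, instead of
# re-prompting the model. The Step shape is an illustrative assumption.
from dataclasses import dataclass


@dataclass
class Step:
    node: str                 # LangGraph node that executed
    tool_called: str | None   # tool invoked at this step, if any


def assert_tool_sequence(trace: list[Step], expected: list[str]) -> None:
    """Fail if the tools the agent called differ from the expected order."""
    actual = [s.tool_called for s in trace if s.tool_called is not None]
    assert actual == expected, f"tool calls drifted: {actual} != {expected}"


# Example: replaying a recorded failure instead of re-running the prompt.
trace = [Step("plan", None), Step("act", "search_docs"), Step("act", "write_file")]
assert_tool_sequence(trace, ["search_docs", "write_file"])
```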
Human-in-the-loop and partnership
- Implement HIL review workflows where automation alone cannot validate quality, then push the automation boundary outward (a minimal triage sketch follows this list).
- Partner with AI engineers and data scientists on model evaluation, training and eval data prep, and root-cause debugging of complex end-to-end failures.
- Champion quality engineering practices across the team: code review, coverage standards, observability, reproducibility.
- Ensure user-centric validation so AI outputs are accurate, reliable, and meet real-world application needs.
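One minimal way to think about the HIL boundary: automation scores everything, and only low-confidence results go to a human. The queue, threshold, and names below are illustrative assumptions, not an existing system:

```python
# Sketch of a human-in-the-loop boundary: auto-accept confident passes,
# route everything else to human review. As automated evals improve,
# the threshold moves and the queue shrinks.
from dataclasses import dataclass, field

REVIEW_THRESHOLD = 0.8  # below this, a human reviews before we trust the score


@dataclass
class ReviewQueue:
    pending: list[tuple[str, float]] = field(default_factory=list)

    def submit(self, case_id: str, score: float) -> None:
        self.pending.append((case_id, score))


def triage(case_id: str, score: float, queue: ReviewQueue) -> bool:
    """Return True for auto-accepted results; queue the rest for review."""
    if score >= REVIEW_THRESHOLD:
        return True
    queue.submit(case_id, score)
    return False
```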
Requirements
You'll work shoulder-to-shoulder with AI engineers and data scientists, contributing production-quality code to shared repositories. The ideal candidate is a strong coder, fluent in Python and Java, who has shipped automated test infrastructure in a production environment and has hands-on experience evaluating LLM and agentic systems.
- 3-7 years of software engineering experience, with a meaningful portion focused on test automation, SDET, or software engineering in test roles.
- Expert-level Python. You write Python every day, design libraries other engineers use, and apply OOP and clean-code practices.
- Hands-on Java experience, enough to read, write, and test Java services, not just touch them.
- Working understanding of the LangGraph or Vercel frameworks: graph state, nodes, edges, tool calls, and how to write evals against agentic flows.
- Demonstrated experience building eval sets for LLMs (this is critical to the role).
- Experience testing across multiple client surfaces: iOS apps, plugins, and Chrome extensions.
- Hands-on experience building automated test suites with frameworks such as pytest, Selenium, Playwright, Cypress, or similar.
- Proven experience integrating test automation into CI/CD systems (GitHub Actions, Jenkins, CircleCI, GitLab CI, or similar).
- Strong skills in data manipulation, test data preparation, and SQL.
- Bachelor's degree or higher in Computer Science, Engineering, or a related field.
Strongly preferred:
- Experience with Azure (our primary cloud) and containerization (Docker).
- Experience testing RAG pipelines, agentic workflows, or multi-step tool-calling systems.
Benefits & conditions
The base pay range for this position is $160,000-$190,000. A bonus will be provided as part of the compensation package, in addition to a full range of medical, financial, and other benefits.