Senior Software Engineer II - Applied AI and Evaluations (Remote Eligible)

REMOTE HAND

Role details

Contract type

Permanent contract

Employment type

Part-time (≤ 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Remote

Tech stack

Artificial Intelligence

Python

Software Architecture

Software Engineering

Retrieval-Augmented Generation

Large Language Models

Multi-Agent Systems

Prompt Engineering

AI Platforms

Information Technology

Machine Learning Operations

Databricks

Job description

The Senior Software Engineer II - Applied AI and Evaluations role centers on owning and improving the quality of AI agents within the SmartAssist platform. This highly technical position involves diagnosing agent failures, designing evaluation systems, and driving measurable quality improvements for orchestrators and subagents. The role is key to ensuring these agents meet high standards across multiple quality dimensions, and it directly affects the platform's reliability and effectiveness. Close collaboration with engineering and AI platform teams is essential to establish scalable methodologies and to integrate quality assurance deeply into the development lifecycle.

Own end-to-end agent quality, including diagnosis, improvement, and validation
Identify and prioritize failure modes in factual accuracy, completeness, tone, actionability, and latency
Improve quality through prompt engineering, context engineering, and retrieval-augmented generation tuning
Expand and mature the evaluation framework with scorers, datasets, regression gates, and production traffic evaluation
Ensure every change has a measurable, attributable quality signal
Collaborate with architecture leads to differentiate between prompt/context and structural quality issues

Requirements

8+ years of software engineering experience including 2+ years with production LLMs
Hands-on expertise in prompt and context engineering affecting model behavior
Strong knowledge of retrieval-augmented generation architectures, embedding models, and failure diagnosis
Experience creating or extending LLM evaluation frameworks, including scorers and golden datasets
Familiarity with agent system design and ability to participate in architectural decisions affecting quality
Proficiency in Python and comfort with data-heavy environments such as Databricks or Delta tables
Effective communication skills for conveying complex quality issues to diverse stakeholders
Strong cross-functional judgment and ability to build credibility across teams
Ability to bring clarity and structure in ambiguous situations
Legal eligibility to work in the U.S. on an ongoing basis
BS or MS in Computer Science, related field, or equivalent experience

Preferred:

Experience with MLflow or similar experiment tracking tools
Knowledge of CI-integrated evaluation pipelines
Experience with multi-agent orchestration frameworks
Background in Applied AI or LLMOps within a product company

Benefits & conditions

Pay Range and Compensation Package:

The pay range and compensation package for this role will be determined based on the candidate's experience, skills, and other relevant factors.

Benefits & Perks:

Employer-subsidized medical, vision, and dental coverage for full-time employees
401(k) match of 50% on contributions up to 6% of eligible pay
Monthly stipend to support work and productivity
Flexible Time Away Program and Sick Time Off
Life insurance and short-term and long-term disability plans for U.S. employees
12 paid holidays annually for U.S. employees
Up to 24 weeks of Parental Leave
A paid personal Volunteer Day
Professional growth opportunities and access to Udemy online courses
Company-funded perks including counseling membership and local discounts

About the company

The organization operates in the AI-powered work management industry, focusing on advancing intelligent agent platforms that enhance team productivity. It addresses the challenge of scaling AI agents from early prototypes to production-ready systems, treating quality as a critical factor. Its solutions automate manual tasks, surface insights, and support scalable work management for teams seeking smarter workflows. The SmartAssist platform represents the next generation of AI-driven work management, built for real-world use and continuous improvement at scale.