Senior Software Engineer II - Applied AI and Evaluations (Remote Eligible)

REMOTE HAND
7 days ago

Role details

Contract type
Permanent contract
Employment type
Part-time (≤ 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Remote

Tech stack

Artificial Intelligence
Python
Software Architecture
Software Engineering
Retrieval-Augmented Generation
Large Language Models
Multi-Agent Systems
Prompt Engineering
AI Platforms
Information Technology
Machine Learning Operations
Databricks

Job description

The Senior Software Engineer II - Applied AI and Evaluations role is centered on owning and improving the quality of AI agents within the SmartAssist platform. This highly technical position involves diagnosing agent failures, designing evaluation systems, and driving measurable quality improvements for orchestrators and subagents. The role is key to ensuring these AI agents meet high standards across multiple quality dimensions and directly impacts the platform's reliability and effectiveness. Collaboration with engineering and AI platform teams is essential to establish scalable methodologies and integrate quality assurance deeply into the development lifecycle.

  • Own end-to-end agent quality, including diagnosis, improvement, and validation

  • Identify and prioritize failure modes in factual accuracy, completeness, tone, actionability, and latency

  • Improve quality through prompt engineering, context engineering, and retrieval-augmented generation tuning

  • Expand and mature the evaluation framework with scorers, datasets, regression gates, and production traffic evaluation

  • Ensure every change has a measurable, attributable quality signal

  • Collaborate with architecture leads to differentiate between prompt/context and structural quality issues

Requirements

  • 8+ years of software engineering experience including 2+ years with production LLMs

  • Hands-on expertise in prompt and context engineering affecting model behavior

  • Strong knowledge of retrieval-augmented generation architectures, embedding models, and failure diagnosis

  • Experience creating or extending LLM evaluation frameworks, including scorers and golden datasets

  • Familiarity with agent system design and ability to participate in architectural decisions affecting quality

  • Proficient in Python and comfortable with data-heavy environments such as Databricks or Delta tables

  • Effective communication skills for conveying complex quality issues to diverse stakeholders

  • Strong cross-functional judgment and ability to build credibility across teams

  • Ability to bring clarity and structure in ambiguous situations

  • Legal eligibility to work in the U.S. on an ongoing basis

  • BS or MS in Computer Science, related field, or equivalent experience

Preferred:

  • Experience with MLflow or similar experiment tracking tools

  • Knowledge of CI-integrated evaluation pipelines

  • Experience with multi-agent orchestration frameworks

  • Background in Applied AI or LLMOps within a product company

Benefits & conditions

  1. Pay Range and Compensation Package:
  • The pay range and compensation package for this role will be determined based on the candidate's experience, skills, and other relevant factors.
  2. Benefits & Perks:
  • Employer subsidized medical, vision, and dental coverage for full-time employees

  • 401k match of 50% on contributions up to 6% of eligible pay

  • Monthly stipend to support work and productivity

  • Flexible Time Away Program and Sick Time Off

  • Life insurance and short-term and long-term disability plans for U.S. employees

  • 12 paid holidays annually for U.S. employees

  • Up to 24 weeks of Parental Leave

  • Personal paid Volunteer Day

  • Professional growth opportunities and access to Udemy online courses

  • Company-funded perks including counseling membership and local discounts

About the company

The organization operates in the AI-powered work management industry, focusing on advancing intelligent agent platforms to enhance team productivity. It addresses the challenge of scaling AI agents from early prototypes to production-ready systems, emphasizing quality as a critical factor. It develops solutions that automate manual tasks, uncover insights, and support scalable work management for teams seeking smarter workflows. Its SmartAssist platform represents the next generation of AI-driven work management, targeting real-world application and continuous improvement at scale.
