Senior Software Engineer II - Applied AI and Evaluations (Remote Eligible)
Job description
The Senior Software Engineer II - Applied AI and Evaluations role centers on owning and improving the quality of AI agents within the SmartAssist platform. This highly technical position involves diagnosing agent failures, designing evaluation systems, and driving measurable quality improvements for orchestrators and subagents. The role is key to ensuring these agents meet high standards across multiple quality dimensions, and it directly affects the platform's reliability and effectiveness. Close collaboration with engineering and AI platform teams is essential to establish scalable methodologies and to integrate quality assurance deeply into the development lifecycle.
- Own end-to-end agent quality, including diagnosis, improvement, and validation
- Identify and prioritize failure modes in factual accuracy, completeness, tone, actionability, and latency
- Improve quality through prompt engineering, context engineering, and retrieval-augmented generation (RAG) tuning
- Expand and mature the evaluation framework with scorers, datasets, regression gates, and evaluation of production traffic
- Ensure every change has a measurable, attributable quality signal
- Collaborate with architecture leads to distinguish prompt/context issues from structural quality issues
Requirements
- 8+ years of software engineering experience, including 2+ years working with LLMs in production
- Hands-on expertise in prompt and context engineering to shape model behavior
- Strong knowledge of retrieval-augmented generation architectures, embedding models, and failure diagnosis
- Experience creating or extending LLM evaluation frameworks, including scorers and golden datasets
- Familiarity with agent system design and the ability to participate in architectural decisions that affect quality
- Proficiency in Python and comfort with data-heavy environments such as Databricks or Delta tables
- Effective communication skills for conveying complex quality issues to diverse stakeholders
- Strong cross-functional judgment and the ability to build credibility across teams
- Ability to bring clarity and structure to ambiguous situations
- Legal eligibility to work in the U.S. on an ongoing basis
- BS or MS in Computer Science or a related field, or equivalent experience
Preferred:
- Experience with MLflow or similar experiment tracking tools
- Knowledge of CI-integrated evaluation pipelines
- Experience with multi-agent orchestration frameworks
- Background in Applied AI or LLMOps within a product company
Benefits & conditions
Pay Range and Compensation Package:
The pay range and compensation package for this role will be determined based on the candidate's experience, skills, and other relevant factors.
Benefits & Perks:
- Employer-subsidized medical, vision, and dental coverage for full-time employees
- 401(k) match of 50% on contributions up to 6% of eligible pay
- Monthly stipend to support work and productivity
- Flexible Time Away Program and Sick Time Off
- Life insurance and short-term and long-term disability plans for U.S. employees
- 12 paid holidays annually for U.S. employees
- Up to 24 weeks of parental leave
- One paid personal Volunteer Day
- Professional growth opportunities and access to Udemy online courses
- Company-funded perks, including a counseling membership and local discounts