Solutions Engineer - Language Models
Job description
Artificial Analysis maintains one of the most comprehensive language model benchmarking suites in the industry, evaluating frontier models across quality, speed, and pricing for the AI labs and enterprises that rely on our data.
We're hiring a Solutions Engineer to own the day-to-day operation of our language model benchmarking stack. This is a hands-on, operational role: you'll add new models to our evaluation pipeline, run and debug benchmarks, and serve as the primary technical point of contact for AI lab customers - explaining results, fielding methodology questions, and resolving API endpoint issues over Slack and video calls.
This is not a software engineering role focused on building new systems. It's about running a sophisticated existing stack consistently and reliably, to a very high standard, while being the trusted technical face of Artificial Analysis to our customers.
What You'll Do
- Operate and maintain our Python-based language model benchmarking pipeline end-to-end: onboard new models, configure evaluations, execute benchmark runs, and validate results
- Debug issues across the stack - from API endpoint timeouts and errors to unexpected benchmark outputs - and resolve them quickly
- Serve as the primary technical contact for AI lab customers: communicate benchmarking results clearly, explain methodology, field technical questions, and troubleshoot integration issues via Slack and video conferencing
- Monitor benchmark runs for anomalies, investigate discrepancies, and ensure the accuracy and integrity of published results
- Maintain documentation of processes, known issues, and model-specific configurations
- Collaborate with the engineering team to flag pipeline improvements and contribute to process refinements
- Stay current with new model releases, API changes, and developments across the language model ecosystem
Requirements
Required:
- 5+ years of experience in a client-facing technical role such as solutions engineering, support engineering, or technical consulting (for example, at companies like Stripe, Vercel, Cloudflare, Datadog, Palantir, Accenture, or comparable)
- Strong Python proficiency and comfort working with complex codebases you didn't write
- Hands-on experience working with AI/ML model APIs (OpenAI, Anthropic, Google, Meta, etc.)
- Excellent debugging skills - you can trace issues across APIs, data pipelines, and code
- Strong written and verbal English communication skills, with the ability to explain technical concepts clearly to technical stakeholders
- Highly responsive and reliable - you take ownership of customer issues and follow through
- Comfortable with operational, repeatable work - you find satisfaction in running things well rather than building from scratch
- High attention to detail and calm under pressure
Nice to have (not required):
- Experience with AI evaluation, benchmarking, or testing methodologies
- Familiarity with LLM inference infrastructure (tokenization, latency measurement, throughput metrics)
- Experience working in or with AI labs or model providers
- Background in B2B SaaS or developer tools