Applied Scientist - LLM, Alexa Conversational Modelling Intelligence
Job description
As an Applied Scientist II in the Alexa Conversational Modelling Intelligence team within Alexa AI, you will drive model post-training for Large Language Models that power Alexa+. You'll adopt and adapt state-of-the-art techniques - including supervised fine-tuning, reinforcement learning, preference optimization, and knowledge distillation - running rigorous experiments and translating findings into production-ready solutions that directly improve the customer experience for millions of users worldwide.
You will own the full model development cycle from data curation through training, evaluation, and deployment. Your day-to-day will involve developing evaluation methods and metrics, diagnosing model defects, optimizing model training pipelines, and iterating on recipes to move concrete quality and efficiency benchmarks. You'll write clean, reproducible code, contribute to shared tooling, and collaborate closely with scientists and engineers to bring models from experimentation to scale.
You are technically curious, experiment-driven, and motivated by real customer impact. You are an expert in LLM post-training. You will also advance the state of the art by publishing at top-tier NLP/ML conferences (ACL, EMNLP, NeurIPS, ICML, ICLR) - contributing to the broader research community while grounding your work in measurable outcomes.
Key job responsibilities
- Own the full model development cycle - from data curation through training, evaluation, and deployment.
- Develop and apply post-training techniques: supervised fine-tuning, reinforcement learning, preference optimization, and knowledge distillation.
- Build evaluation methods and metrics, and diagnose model defects to target the highest-impact improvements.
- Optimize model training pipelines and iterate on recipes to move concrete quality and efficiency benchmarks.
- Write high-quality documentation on methods and experiment outcomes, and communicate findings clearly to stakeholders.
A day in the life
Post-training is one of the most active frontiers in LLMs right now. The field has moved from scaling pretraining to getting more out of models afterward through RL, reasoning recipes, and preference optimization. You'll work on these techniques directly, on a product used by millions of customers every day. A typical day: review overnight training runs and dashboards, dig into model defects to form hypotheses, then curate data and iterate on a recipe, improving shared tooling along the way. You'll sync with scientists and engineers to unblock the path to production, and write up your findings for stakeholders. It's fast-moving - a good idea can reach millions of customers within weeks.
About the team
The Alexa Conversational Modelling Intelligence team builds industry-leading LLM-based conversational technologies that customers love. Our mission is to push the envelope in LLMs for Alexa to deliver the best-possible customer experience. As an Applied Scientist, you'll contribute directly to that mission through model development and experimentation.
Requirements
- PhD in computer science, machine learning, engineering, or related fields
- Knowledge of at least one programming language such as Java, C#, JavaScript, Python, Ruby or Perl
- Experience in designing experiments and statistical analysis of results
- Hands-on experience building, training, and evaluating LLMs
Preferred Qualifications
- Publications at top-tier conferences, such as CVPR, ICCV, ECCV or NeurIPS
- Experience working with large, complex data sets
- Experience working effectively with science, data processing, and software engineering teams
- Strong written and verbal communication skills, with experience communicating with technical and non-technical audiences, including senior leadership
- Experience building and deploying LLM solutions in production or at scale
- Hands-on experience training and fine-tuning Large Language Models via pre-training, SFT, and/or RLHF/preference optimization
- Experience with LLM evaluation - building benchmarks, LLM-as-a-judge, or defect/quality analysis
- Familiarity with modern training/inference infrastructure (e.g., distributed training, RL frameworks, model serving)