Data Scientist, AI/ML Model Quality

Apple Inc.
San Diego, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate

Job location

San Diego, United States of America

Tech stack

Artificial Intelligence
Data analysis
Apache Superset
Big Data
Information Engineering
Data Visualization
Distributed Computing Environment
Statistical Hypothesis Testing
Python
Machine Learning
NumPy
Statistical Process Control (SPC)
SQL Databases
Tableau
Large Language Models
Spark
Model Validation
Generative AI
Pandas
PySpark
Scikit Learn
Information Technology
Machine Learning Operations
Databricks
Data Generation

Job description

Would you like to contribute to Machine Learning and Generative AI technologies? Are you passionate about the integrity of the data that powers AI systems at scale? Do you believe that trustworthy data is the foundation of every great model? We truly believe it is!

We are defining what exceptional data quality looks like for machine learning across Wallet, Payments, and Commerce. As a Data Scientist, AI/ML Model Quality, you will build and maintain intelligent systems, validation frameworks, and monitoring pipelines that keep our data ecosystem healthy - ensuring that every model we build is trained, evaluated, and deployed on data we can trust. Your work sits at the foundation of every ML feature that reaches hundreds of millions of users.

You'll work at the intersection of statistical rigor and production systems, collaborating closely with ML Engineering, Data Engineering, Privacy, and Legal teams. This unique opportunity puts you at the center of ML and AI quality - owning the health of training and validation datasets, defining and analyzing observability metrics to surface actionable product insights, and leading telemetry analysis across GenAI workflows - ensuring Apple's financial features are built on the highest-quality data, whether powering conventional ML models or the latest generative AI systems.

The ideal candidate is a detail-obsessed data scientist who understands that model quality starts long before training - it starts with the data. You have strong statistical instincts, know how silent degradation and data drift manifest in production systems, and can translate raw quality signals into insights that drive real decisions.

You will own the health of the data ecosystem that underpins ML and GenAI features across Wallet, Payments, and Commerce - building validation frameworks, defining observability metrics, and leading telemetry analysis that keeps every model trained, evaluated, and monitored on data teams can trust.

Requirements

  • A Bachelor's degree with exceptional hands-on experience in ML/AI model quality or applied research, or an M.S. or Ph.D. in Machine Learning, Computer Science, Data Science, Statistics, Mathematics, Engineering, or a related quantitative field, is strongly preferred.
  • 3+ years of experience in data science or a closely related analytical role, with a strong focus on data quality, model evaluation, or ML observability in production environments.
  • Proficiency in Python (Pandas, NumPy, Scikit-learn) and SQL for complex data analysis, metric creation, and validation.
  • Experience querying and analyzing large-scale datasets using distributed computing frameworks (e.g., PySpark, Spark, or distributed SQL).
  • Solid understanding of statistical methods - hypothesis testing, distribution analysis, data drift detection, and statistical process control.
  • Experience in defining and tracking ML model health metrics in production - model performance monitoring, feature drift detection, and observability instrumentation.
  • Familiarity with GenAI or LLM systems, including common quality failure modes, output evaluation approaches, and telemetry instrumentation.
  • Strong communication skills - ability to translate complex data quality findings and model health risks into clear, actionable insights for both engineering and non-technical stakeholders.
  • Experience with data visualization and dashboarding tools (e.g., Tableau, Apache Superset, Databricks) to present complex ML telemetry.
  • Familiarity with LLM evaluation frameworks (e.g., LangSmith) or techniques like LLM-as-a-judge.
  • Experience with Bayesian or causal graph-based approaches to synthetic data generation.
  • Familiarity with confidence calibration techniques and uncertainty quantification.
  • Experience with ML monitoring or observability platforms (e.g., MLflow, Weights & Biases, or equivalent).
  • Experience working with privacy-constrained data or under regulatory compliance frameworks (GDPR, DMA).
  • Background in financial services, fintech, or consumer payment products.
