Data Engineer - LLM Workflows

Apple
20 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Tech stack

JavaScript
API
Amazon Web Services (AWS)
Data analysis
Software Applications
Cloud Computing
Information Engineering
Data Integrity
Data Visualization
Web Development
Python
Machine Learning
TypeScript
Data Processing
PyTorch
Large Language Models
Build Management
Scikit Learn
Information Technology
Data Pipelines
Microservices

Job description

Imagine what you could do here. At Apple, new ideas have a way of becoming great products very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish.

The Apps Engineering team, empowering millions of writers, musicians, filmmakers, photographers, designers and creators worldwide, is seeking a data engineer to fuel our LLM development workflows by working on data curation and building tools and infrastructure to maintain and evaluate datasets. You'll work closely with machine learning engineers, developing deep understanding of data quality, characteristics, and curation needs. You'll also create web-based tools for data collection, evaluation, and visualization systems that enable our team to develop and deploy LLM workflows across diverse environments while maintaining Apple's high standards for quality, performance, and creative ethics.

Make a difference. As a Data Engineer specializing in LLM Workflows on the Apps Engineering team, you'll work directly with large-scale datasets by exploring their characteristics, evaluating their quality, and maintaining them throughout the ML development lifecycle. You'll dive deep into data for LLM workflows to understand what makes datasets effective for model training and fine-tuning adapters. Beyond dataset work, you'll design and build web-based tools that support data collection, model evaluation, and result visualization through intuitive interfaces. You'll develop APIs and data pipelines that connect data collection tools with ML training infrastructure, creating seamless workflows for model development and fine-tuning. This role requires strong Python skills for both data analysis and web development, combined with curiosity about machine learning fundamentals and LLM workflows.

  • Explore and analyze large-scale datasets to understand their characteristics, quality, and suitability for ML training workflows.
  • Evaluate dataset quality, identify biases or gaps, and develop strategies for dataset curation and improvement.
  • Maintain and manage datasets throughout their lifecycle, ensuring data integrity, accessibility, and proper documentation.
  • Design and build web-based tools for ML data collection, annotation, and curation supporting LLM development workflows.
  • Develop APIs and backend services using Python frameworks to support data collection and evaluation workflows.
  • Create data visualization dashboards and interfaces to analyze dataset characteristics, model performance, and training metrics.
  • Build and maintain data pipelines to process and prepare datasets for model training and fine-tuning.
  • Collaborate with ML researchers, app engineers, and framework teams across Apple to understand data requirements and evaluation needs.

Requirements

Do you have experience in web development?
Do you have a Master's degree?

  • Experience with data processing and evaluation of LLM workflows.
  • Experience building data collection and evaluation for LLM-supported systems.
  • Experience with developing web tools using JavaScript or TypeScript.
  • Experience deploying web tools or applications on cloud platforms (AWS or GCP).
  • Experience with machine learning fundamentals and frameworks (scikit-learn, PyTorch).
  • BS/MS in Computer Science, Data Engineering, or related technical field, or 3 years of equivalent work experience.
  • Experience exploring and analyzing large-scale datasets for ML applications.
  • Experience with dataset curation, quality assessment, and bias detection methodologies.
  • Experience building data collection tools or working with crowd annotation platforms.
  • Experience designing data augmentation pipelines.
  • Experience with data visualization libraries and creating dashboards for dataset analysis and ML metrics.
  • Experience with model development or LLM fine-tuning pipelines.
  • Experience working with creators, audio/video production, or creative software applications.

Apply for this position