AI Engineer/ML Engineer - Senior Developers - AI Training - San Francisco, US
Job description
We're looking for AI and Machine Learning Engineers to join our Expert Network to help train and evaluate the next generation of LLMs using deep technical expertise. If you have the necessary experience, we'll send you a quick 10- to 15-minute test to assess your skills and suitability for AI tasks. If successful, you'll be invited to join Prolific as a participant, where you'll get paid to train and evaluate powerful AI models.
- Evaluate LLM Architecture Logic: review AI-generated explanations of model architectures, loss functions, and backpropagation for technical accuracy.
- Audit Code & Notebooks: validate ML-specific code (e.g., training loops, data preprocessing scripts, or model evaluations) for efficiency and correctness.
- Refine RLHF Frameworks: provide the high-quality human feedback necessary to align models with human intent, safety, and helpfulness.
- Analyze Model Reasoning: critically assess how an AI model navigates complex chain-of-thought (CoT) prompts and identify where the reasoning breaks down.
- Benchmark Performance: conduct comparative testing between different model outputs based on specific technical taxonomies and performance metrics.
Requirements
- Education: a BS, MS, or PhD in Computer Science, Artificial Intelligence, Robotics, or a related quantitative field with a focus on Machine Learning.
- Professional Experience: experience building, deploying, or fine-tuning ML models in a production environment.
- Deep Learning Mastery: professional-level understanding of neural network architectures (Transformers, CNNs, RNNs) and optimization techniques.
- LLM Specialization: hands-on experience with Prompt Engineering, RLHF (Reinforcement Learning from Human Feedback), or RAG (Retrieval-Augmented Generation) workflows.
- Technical Rigor: the ability to audit complex model logic, identify training data contamination, and evaluate mathematical proofs behind ML algorithms.
- Analytical Critique: high attention to detail in spotting "hallucinations," biased outputs, or logical failures in AI-generated technical content.
- Frameworks: expert proficiency in PyTorch or TensorFlow/Keras.
- Language & Data: advanced Python (NumPy, Pandas, Scikit-learn) and experience with Hugging Face Transformers.
- Cloud & MLOps: experience with AWS (SageMaker), Google Cloud (Vertex AI), or specialized tools like Weights & Biases and LangChain.
- Vector Databases: familiarity with Pinecone, Milvus, or Weaviate for RAG evaluation.
Benefits & conditions
Researchers looking for your skills tend to pay up to $80 per hour. You must be prepared to complete paid tasks that require one hour of uninterrupted work, though many are shorter.