Data Scientist

Genentech

South San Francisco, United States of America

1 month ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

$236K

Job location

South San Francisco, United States of America

Tech stack

API

Artificial Intelligence

Amazon Web Services (AWS)

Azure

Encodings

Computer Programming

Data Cleansing

Data Infrastructure

Python

Linear Regression

Machine Learning

Natural Language Processing

Named Entity Recognition

Raw Data

TensorFlow

Sentiment Analysis

Software Engineering

SQL Databases

Statistics

Tokenization

Unstructured Data

Software Version Control

Reinforcement Learning

Data Processing

Feature Engineering

PyTorch

Large Language Models

Deep Learning

Topic Modeling

Convolutional Neural Networks

Pandas

Scikit-learn

Information Technology

Hugging Face

Machine Learning Operations

Categorical Data

GPT

Recurrent Neural Networks

Unsupervised Learning

Requirements

As a Data Scientist, you will have a strong foundation in machine learning (ML), data science, and software engineering. You will have practical experience in building and deploying ML models and developing AI agents, particularly for tasks involving structured and unstructured data and workflow automation.

Machine Learning and Deep Learning: The candidate must be proficient in a wide range of ML algorithms, from traditional models such as linear regression and decision trees to more advanced deep learning architectures such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). They should understand the principles behind model training, validation, and hyperparameter tuning.

Natural Language Processing (NLP): Strong NLP skills are essential for extracting information from unstructured text. This includes experience with techniques such as tokenization, sentiment analysis, named entity recognition, and topic modeling, and with pre-trained language models such as BERT, GPT, or others from the Hugging Face ecosystem.
Data Handling and Feature Engineering: They should be adept at working with various data formats and have experience in data cleaning, preprocessing, and transforming raw data into useful features for ML models. This includes handling missing values, encoding categorical data, and scaling numerical features.
Programming and MLOps: Proficiency in Python is a must, along with a solid understanding of key libraries like Scikit-learn, Pandas, TensorFlow, and PyTorch. Experience with MLOps (Machine Learning Operations) practices, including model versioning, monitoring, and deployment on cloud platforms (AWS, Azure, or GCP), is crucial for building and maintaining robust solutions.
AI Agent Architectures: The candidate should understand the components of an AI agent: a Large Language Model (LLM) as the brain, tools for specific tasks, and a logical structure for decision-making.
Workflow Automation: The candidate should have practical experience in designing and implementing automated workflows. This involves integrating AI agents and ML models into existing business processes. They should be able to identify bottlenecks, map out a solution, and build the necessary connectors or APIs to execute tasks automatically.
Unstructured Data: The candidate needs to demonstrate expertise in handling various forms of unstructured data, including text, images, and audio. This involves building pipelines to ingest, process, and analyze this data to extract meaningful insights or trigger actions.

Who you are

Problem-Solving: The ability to break down complex business problems into manageable, data-driven solutions is key. They should be able to think critically and creatively to solve real-world challenges.
Communication: A great candidate can clearly articulate technical concepts to non-technical stakeholders, explaining the "why" and "how" of their solutions. This is vital for collaborating with different teams and ensuring the project meets business goals.
Business Acumen: The best candidates understand the business context of their work. They should be able to connect their technical solutions directly to a positive impact on the company's bottom line or operational efficiency.

Minimum Requirement: A Bachelor's degree in a highly quantitative field (Computer Science, Data Science, or a related field).
Preferred: A Master's degree in a specialized domain such as Machine Learning, Computational Statistics, Operations Research, or a related quantitative discipline.
Proven Track Record: At least 7 years of professional experience in data science, with a clear history of taking AI applications from conceptualization to production environments.
Data Handling: Expertise in handling unstructured data.
Advanced ML Expertise: Experience with supervised and unsupervised learning, deep learning (CNNs, Transformers), and reinforcement learning; proficiency in building agentic workflows, including retrieval-augmented generation (RAG) integration and LLM orchestration.
Data Infrastructure: Expertise in SQL and experience working with cloud platforms (AWS, GCP, or Azure).
Large Language Models: LLM expertise is required.
Experience with diagnostics and/or pharmaceutical data is a plus.