Data Scientist II

Scribd Inc.

Dallas, United States of America

1 month ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Intermediate

Compensation

$123K

Job location

Remote

Dallas, United States of America

Tech stack

Artificial Intelligence

Artificial Neural Networks

Distributed Computing Environment

Information Retrieval

Interaction Design

Python

Machine Learning

Language Modeling

Natural Language Processing

Named Entity Recognition

NumPy

SQL Databases

PyTorch

Large Language Models

Spark

Deep Learning

Generative AI

PySpark

Scikit Learn

Information Technology

Machine Learning Operations

Databricks

Job description

We believe the best work happens when individual flexibility is balanced with meaningful community connection. Scribd Flex empowers employees to choose the workstyle and location that support their best performance, while committing to intentional in-person moments that strengthen collaboration and culture. Occasional in-person attendance is required for all Scribd, Inc. employees, regardless of location.

So what are we looking for in new team members? At Scribd, Inc., we hire for "GRIT." Traditionally defined as the intersection of passion and perseverance toward long-term goals, GRIT reflects the mindset we expect from every employee. For us, it also serves as a practical framework for how we work: setting and achieving Goals, delivering Results within your role, contributing Innovative ideas and solutions, and strengthening the broader Team through collaboration and attitude.

This posting reflects an approved, open position within the organization.

About the team

The Applied Research team is a group of data scientists and content specialists who are experts in leveraging machine learning, natural language processing and generative AI models to develop solutions which deliver value to our users and business.

We act as a key driver for innovation, whether it's in product surface experimentation, metadata generation or model development. Along with Product and Engineering partners, we design solutions and collaborate in cross-functional squads to maximize business impact.

Our areas of impact include content enrichment, representation learning, recommendations, search, translation and many others, applied to diverse media across text, image, and audio. We operate at a scale of hundreds of millions of documents, millions of users and billions of user interactions.

* Focus on a variety of content classification use cases, leveraging everything from traditional NLP to sophisticated LLMs and generative models
* Investigate methods of solving our most challenging problems at Scribd, at scale
* Collaborate with other Data Scientists, Machine Learning Engineers and ML Data Engineers on cross-functional projects
* Leverage any algorithm at your disposal: from classical Scikit-learn and NumPy models to custom neural networks in PyTorch to third-party LLM APIs
* Process massive amounts of data with Python, SQL and Spark
* Align with stakeholders on project approaches and results through written and verbal communication, while writing detailed, accurate and concise project documentation

Requirements

We are seeking a Data Scientist II with experience developing and deploying machine learning models. You will help design and implement high-impact AI and ML systems. We work in cross-functional teams, collaborating with Machine Learning Engineers, Data Engineers and Product. We are seeking a curious and collaborative individual with an eye for simplicity, end-to-end visibility and impact, who is excited about building models using massive amounts of data, using language models and deploying models.

* 3+ years of post-qualification experience developing machine learning models, working with systems at scale and deploying to production environments.
* Proficiency in Python.
* Hands-on experience building ML pipelines and working with distributed data processing frameworks like Apache Spark, Databricks, or similar.
* Intermediate level in at least three of these fields: classification algorithms, natural language processing, search, information retrieval, named entity recognition, deep learning, generative models.
* Intermediate level or greater experience with SQL or PySpark.
* Bachelor's or Master's degree in a relevant quantitative discipline, including but not limited to Statistics, Computer Science, Data Science, Artificial Intelligence or another field with a strong quantitative focus.

Benefits & conditions

Paid parental leave, Parental leave, Health insurance, Paid time off, Vision insurance, Dental insurance, Disability insurance

At Scribd, your base pay is one part of your total compensation package and is determined within a range. Our pay ranges are based on the local cost of labor benchmarks for each specific role, level, and geographic location. San Francisco is our highest geographic market in the United States. In the state of California, the reasonably expected salary range is between $118,000 [minimum salary in our lowest geographic market within California] and $184,000 [maximum salary in our highest geographic market within California].

In the United States, outside of California, the reasonably expected salary range is between $97,000 [minimum salary in our lowest US geographic market outside of California] and $175,000 [maximum salary in our highest US geographic market outside of California].

In Canada, the reasonably expected salary range is between $123,000 CAD [minimum salary in our lowest geographic market] and $164,000 CAD [maximum salary in our highest geographic market].

We carefully consider a wide range of factors when determining compensation, including but not limited to experience; job-related skill sets; relevant education or training; and other business and organizational needs. The salary range listed is for the level at which this job has been scoped. In the event that you are considered for a different level, a higher or lower pay range would apply. This position is also eligible for competitive equity ownership and a comprehensive and generous benefits package.

* Scribd Flex (flexible work model)
* Comprehensive health, dental, and vision coverage
* Mental health support and disability coverage
* Generous paid time off, including vacation, sick time, holidays, winter break, volunteer time, and sabbaticals
* Paid parental leave and family support benefits
* Retirement matching and employee equity
* Learning and development programs and professional growth opportunities
* Wellness and home office stipends
* Complimentary access to the Scribd, Inc. suite of products
* Enterprise access to leading AI tools

About the company

Scribd, Inc. is on a mission to advance human understanding. Our four products - Scribd, Slideshare, Everand, and Fable - help billions of people across the globe move beyond access and into insight, application, and expertise.
