Data Scientist I

Elsevier

Amsterdam, Netherlands

yesterday

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Intermediate

Compensation

€ 5.6K

Job location

Amsterdam, Netherlands

Tech stack

Java

Artificial Intelligence

Amazon Web Services (AWS)

Artificial Neural Networks

Unit Testing

Azure

Big Data

Cluster Analysis

Computer Programming

Continuous Integration

Relational Databases

DevOps

Hadoop

JSON

Python

Logistic Regression

Machine Learning

Natural Language Processing

NumPy

Object-Oriented Software Development

E2E Testing

TensorFlow

SciPy

Software Engineering

SQL Databases

Support Vector Machine

UML

XML

Data Processing

Multithreading

Cloud Platform System

PyTorch

Large Language Models

Spark

Deep Learning

Parallel Computation

Git

Pandas

Matplotlib

Scikit-learn

Kubernetes

Information Technology

Production Code

Data Analytics

Performance Monitor

Machine Learning Operations

K-Means

Databricks

Programming Languages

Microservices

Job description

As a Data Scientist II, you'll play a vital role in our organization, spearheading the development, testing, and maintenance of our NLP solutions. You'll be immersed in the entire lifecycle of data science projects, from inception to implementation, productionization, and ongoing refinement. Your primary focus will be delivering efficient and production-ready Python code while collaborating closely with the technology team to deploy and scale our data science pipelines.

Responsibilities

Data Insights and Model Development: Drive data collection, analysis, and model development, with a strong emphasis on classification and deep learning techniques. Define quality metrics and assess model performance, regularly presenting insights to stakeholders.
Production-Ready Solutions: Craft production-ready Python packages for all components of data science pipelines, including preprocessing and model inference. Collaborate with the technology team for seamless deployment.
End-to-End Integration and Quality Assurance: Integrate data science components and conduct thorough quality assessments, leveraging your knowledge of large language models. Ensure the resilience of our data science pipelines against model drift and develop maintenance tools and strategies, including automated model re-training.
Performance Reporting and Strategy Development: Establish a reporting process for pipeline performance and implement automatic re-training strategies for existing pipelines.

Requirements

Education and Experience: Minimum of 2 years of relevant applied experience and a Master's degree in computer science, data science, artificial intelligence, mathematics, statistics, or related quantitative fields. Alternatively, at least 3 years of relevant experience. International working or education experience is a valuable asset.
Programming Proficiency: Strong hands-on Python skills, with the ability to write unit tests and production-ready code following best practices and object-oriented principles.
Machine Learning Expertise: Hands-on experience in classification, regression, clustering, and deep learning techniques. Familiarity with neural networks, large language models, random forests, logistic regression, SVM, K-Means, etc. Proficiency in Scikit-learn, PyTorch, and/or TensorFlow.
Knowledge of Large Language Models: Proficiency in utilizing and integrating large language models for natural language processing tasks.
Data Manipulation: Proficiency in data processing, cleaning, and analysis, using tools like Pandas, NumPy, Matplotlib, and SciPy.
Communication Skills: Excellent communication and presentation skills, particularly in conveying data science concepts to non-technical stakeholders.
Analytical Thinking: Strong analytical thinking and problem-solving skills. Ability to translate complex requirements into practical solutions.
Technical Competence: Proficiency in Git, basic DevOps, and CI/CD skills. Familiarity with cloud computing platforms such as AWS and Azure.
Continuous Learning: Willingness to learn and an interest in gaining experience in MLOps and data science productionization.

Nice to Have

Experience in the later stages of the data science lifecycle, including optimizing production pipelines with techniques like parallelization and multithreading, as well as automated model re-training.
Familiarity with MLOps frameworks (e.g., SageMaker, Kubeflow, MLflow) and big data processing frameworks (e.g., Spark, Hadoop, Databricks).
Software engineering skills, including proficiency in additional programming languages like Java and SQL, as well as knowledge of relational databases, semi-structured and unstructured document formats (e.g., JSON and XML), REST interfaces, microservices, and UML.
