Data Scientist I

Elsevier
Amsterdam, Netherlands
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate
Compensation
€ 5.6K

Job location

Amsterdam, Netherlands

Tech stack

Java
Artificial Intelligence
Amazon Web Services (AWS)
Artificial Neural Networks
Unit Testing
Azure
Big Data
Cluster Analysis
Computer Programming
Continuous Integration
Relational Databases
DevOps
Hadoop
JSON
Python
Logistic Regression
Machine Learning
Natural Language Processing
NumPy
Object-Oriented Software Development
E2E Testing
TensorFlow
SciPy
Software Engineering
SQL Databases
Support Vector Machine
UML
XML
Data Processing
Multithreading
Cloud Platform System
PyTorch
Large Language Models
Spark
Deep Learning
Parallel Computation
Git
Pandas
Matplotlib
Scikit-learn
Kubernetes
Information Technology
Production Code
Data Analytics
Performance Monitor
Machine Learning Operations
K-Means
Databricks
Programming Languages
Microservices

Job description

As a Data Scientist, you'll play a vital role in our organization, spearheading the development, testing, and maintenance of our NLP solutions. You'll be immersed in the entire lifecycle of data science projects, from inception through implementation, productionization, and ongoing refinement. Your primary focus will be delivering efficient, production-ready Python code while collaborating closely with the technology team to deploy and scale our data science pipelines.

Responsibilities

  • Data Insights and Model Development: Drive data collection, analysis, and model development, with a strong emphasis on classification and deep learning techniques. Define quality metrics and assess model performance, regularly presenting insights to stakeholders.

  • Production-Ready Solutions: Craft production-ready Python packages for all components of data science pipelines, including preprocessing and model inference. Collaborate with the technology team to ensure seamless deployment.

  • End-to-End Integration and Quality Assurance: Integrate data science components and conduct thorough quality assessments, leveraging your knowledge of large language models. Ensure the resilience of our data science pipelines against model drift and develop maintenance tools and strategies, including automated model re-training.

  • Performance Reporting and Strategy Development: Establish a reporting process for pipeline performance and implement automatic re-training strategies for existing pipelines.

Requirements

  • Education and Experience: A Master's degree in computer science, data science, artificial intelligence, mathematics, statistics, or a related quantitative field, plus a minimum of 2 years of relevant applied experience. Alternatively, at least 3 years of relevant experience. International working or education experience is a valuable asset.

  • Programming Proficiency: Strong hands-on Python skills, with the ability to write unit tests and production-ready code following best practices and object-oriented principles.

  • Machine Learning Expertise: Hands-on experience with classification, regression, clustering, and deep learning techniques. Familiarity with neural networks, large language models, random forests, logistic regression, support vector machines (SVM), K-Means, etc. Proficiency in Scikit-learn, PyTorch, and/or TensorFlow.

  • Knowledge of Large Language Models: Proficiency in using and integrating large language models for natural language processing tasks.

  • Data Manipulation: Proficiency in data processing, cleaning, and analysis, using tools like Pandas, NumPy, Matplotlib, and SciPy.

  • Communication Skills: Excellent communication and presentation skills, particularly in conveying data science concepts to non-technical stakeholders.

  • Analytical Thinking: Strong analytical thinking and problem-solving skills. Ability to translate complex requirements into practical solutions.

  • Technical Competence: Proficiency in Git and in basic DevOps and CI/CD practices. Familiarity with cloud computing platforms such as AWS and Azure.

  • Continuous Learning: Willingness to learn and an interest in gaining experience in MLOps and data science productionization.

Nice to Have

  • Experience in the later stages of the data science lifecycle, including optimizing production pipelines with techniques such as parallelization and multithreading, as well as automated model re-training.

  • Familiarity with MLOps frameworks (e.g., SageMaker, Kubeflow, MLflow) and big data processing frameworks (e.g., Spark, Hadoop, Databricks).

  • Software engineering skills, including proficiency in additional languages such as Java and SQL, as well as knowledge of relational databases, semi-structured and unstructured document formats (e.g., JSON and XML), REST interfaces, microservices, and UML.
