Data Scientist I
Job description
As a Data Scientist, you'll play a key role in our organization, leading the development, testing, and maintenance of our NLP solutions. You'll be involved in the full lifecycle of data science projects, from inception through implementation and productionization to ongoing refinement. Your primary focus will be delivering efficient, production-ready Python code while working closely with the technology team to deploy and scale our data science pipelines.
Responsibilities
- Data Insights and Model Development: Drive data collection, analysis, and model development, with a strong emphasis on classification and deep learning techniques. Define quality metrics, assess model performance, and regularly present insights to stakeholders.
- Production-Ready Solutions: Build production-ready Python packages for every component of our data science pipelines, including preprocessing and model inference. Work with the technology team to ensure seamless deployment.
- End-to-End Integration and Quality Assurance: Integrate data science components and conduct thorough quality assessments, drawing on your knowledge of large language models. Make our data science pipelines resilient to model drift, and develop maintenance tools and strategies, including automated model re-training.
- Performance Reporting and Strategy Development: Establish a reporting process for pipeline performance and implement automatic re-training strategies for existing pipelines.
Requirements
- Education and Experience: A Master's degree in computer science, data science, artificial intelligence, mathematics, statistics, or a related quantitative field, plus at least 2 years of relevant applied experience. Alternatively, at least 3 years of relevant experience. International work or study experience is a valuable asset.
- Programming Proficiency: Strong hands-on Python skills, including the ability to write unit tests and production-ready code that follows best practices and object-oriented principles.
- Machine Learning Expertise: Hands-on experience with classification, regression, clustering, and deep learning techniques. Familiarity with neural networks, large language models, random forests, logistic regression, support vector machines (SVMs), k-means, and similar methods. Proficiency in scikit-learn, PyTorch, and/or TensorFlow.
- Knowledge of Large Language Models: Proficiency in using and integrating large language models for natural language processing tasks.
- Data Manipulation: Proficiency in data processing, cleaning, and analysis with tools such as pandas, NumPy, Matplotlib, and SciPy.
- Communication Skills: Excellent communication and presentation skills, particularly in conveying data science concepts to non-technical stakeholders.
- Analytical Thinking: Strong analytical thinking and problem-solving skills, and the ability to translate complex requirements into practical solutions.
- Technical Competence: Proficiency in Git and working knowledge of DevOps and CI/CD practices. Familiarity with cloud computing platforms such as AWS or Azure.
- Continuous Learning: Willingness to learn and an interest in gaining experience in MLOps and data science productionization.
Nice to Have
- Experience with the later stages of the data science lifecycle, including optimizing production workloads through techniques such as parallelization and multi-threading, as well as automated model re-training.
- Familiarity with MLOps frameworks (e.g., SageMaker, Kubeflow, MLflow) and big data processing frameworks (e.g., Spark, Hadoop, Databricks).
- Software engineering skills, including proficiency in additional languages such as Java and SQL, as well as knowledge of relational databases, semi-structured (e.g., JSON, XML) and unstructured document formats, REST APIs, microservices, and UML.