Data Scientist

W. H. GREEN & SONS, INC.
Portland, United States of America
3 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Portland, United States of America

Tech stack

Clean Code Principles
Amazon Web Services (AWS)
Data analysis
Big Data
Google BigQuery
Cloud Storage
Data Visualization
Relational Databases
DevOps
Programming Tools
Distributed Computing Environment
Distributed Data Store
Statistical Hypothesis Testing
Python
Machine Learning
MongoDB
Regression Analysis
NLTK
NumPy
OpenCV
TensorFlow
Standard SQL
Software Engineering
SQL Databases
Jupyter Notebook
Google Cloud Platform
Cloud Platform System
Feature Engineering
PyTorch
Git
pandas
Matplotlib
scikit-learn
Information Technology
Hugging Face
Plotly
Bitbucket
Gensim
spaCy
Software Version Control

Job description

  • Analyze large-scale structured and semi-structured datasets to uncover patterns, trends, and insights that support business and product decisions.
  • Develop, train, and evaluate machine learning models for use cases such as prediction, classification, anomaly detection, and forecasting.
  • Perform exploratory data analysis (EDA) to understand data distributions, detect anomalies, and guide feature engineering strategies.
  • Apply statistical techniques including hypothesis testing, regression analysis, and probability modeling to validate results and support decision-making.
  • Design and implement feature engineering pipelines to transform raw data into meaningful inputs for machine learning models.
  • Build and compare multiple models using appropriate evaluation metrics (accuracy, precision, recall, F1-score, ROC-AUC), and optimize performance through hyperparameter tuning.
  • Work with large datasets using distributed computing frameworks or cloud-based platforms to ensure scalability and efficiency.
  • Develop data visualizations, dashboards, and reports to effectively communicate analytical findings to technical and non-technical stakeholders.
  • Collaborate with cross-functional teams including product managers, engineers, and business teams to translate business problems into data-driven solutions.
  • Support the deployment of machine learning models into production by working with engineering teams and ensuring models meet performance and reliability standards.
  • Monitor model performance over time and assist in updating models based on new data and changing business requirements.
  • Write clean, modular, and maintainable code following best practices in software development and version control.
  • Document analytical workflows, model assumptions, and results to ensure reproducibility and knowledge sharing across teams.

Technologies / Environment involved:

  • Cloud storage and data warehousing: Amazon S3 (AWS); Google Cloud Platform (Cloud Storage, BigQuery)
  • Database management: MongoDB, SQL (relational databases)
  • Machine learning: TensorFlow, PyTorch, scikit-learn, NumPy, pandas; exposure to spaCy, NLTK, Hugging Face, Gensim, OpenCV
  • Programming languages: Python, SQL
  • Data visualization: Matplotlib, Seaborn, Plotly
  • Development tools: Jupyter Notebook / JupyterLab
  • DevOps and version control tools: Git, Bitbucket

Requirements

  • Bachelor's degree in Computer Science, Computer Information Systems, or Information Technology, or a combination of education and experience equivalent to a U.S. Bachelor's degree in one of these subjects.

Screening questions: Do you have experience with spaCy? Do you have a Bachelor's degree?
