Data Scientist

W. H. GREEN & SONS, INC.

Portland, United States of America

1 month ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Job location

Portland, United States of America

Tech stack

Clean Code Principles

Amazon Web Services (AWS)

Data analysis

Big Data

Google BigQuery

Cloud Storage

Data Visualization

Relational Databases

DevOps

Programming Tools

Distributed Computing Environment

Distributed Data Store

Statistical Hypothesis Testing

Python

Machine Learning

MongoDB

Regression Analysis

NLTK

NumPy

OpenCV

TensorFlow

Standard SQL

Software Engineering

SQL Databases

Jupyter Notebook

Google Cloud Platform

Cloud Platform System

Feature Engineering

PyTorch

Git

Pandas

Matplotlib

Scikit-learn

Information Technology

HuggingFace

Plotly

Bitbucket

Gensim

spaCy

Software Version Control

Job description

Analyze large-scale structured and semi-structured datasets to uncover patterns, trends, and insights that support business and product decisions.
Develop, train, and evaluate machine learning models for use cases such as prediction, classification, anomaly detection, and forecasting.
Perform exploratory data analysis (EDA) to understand data distributions, detect anomalies, and guide feature engineering strategies.
Apply statistical techniques including hypothesis testing, regression analysis, and probability modeling to validate results and support decision-making.
Design and implement feature engineering pipelines to transform raw data into meaningful inputs for machine learning models.
Build and compare multiple models using appropriate evaluation metrics (accuracy, precision, recall, F1-score, ROC-AUC) and optimize performance through tuning.
Work with large datasets using distributed computing frameworks or cloud-based platforms to ensure scalability and efficiency.
Develop data visualizations, dashboards, and reports to effectively communicate analytical findings to technical and non-technical stakeholders.
Collaborate with cross-functional teams including product managers, engineers, and business teams to translate business problems into data-driven solutions.
Support the deployment of machine learning models into production by working with engineering teams and ensuring models meet performance and reliability standards.
Monitor model performance over time and assist in updating models based on new data and changing business requirements.
Write clean, modular, and maintainable code following best practices in software development and version control.
Document analytical workflows, model assumptions, and results to ensure reproducibility and knowledge sharing across teams.

Technologies / Environment involved:

Distributed storage: AWS (Amazon S3), Google Cloud Platform (Cloud Storage, BigQuery)
Database management: MongoDB, SQL (Relational Databases)
Machine learning: TensorFlow, PyTorch, Scikit-learn, NumPy, Pandas; exposure to spaCy, NLTK, HuggingFace, Gensim, OpenCV
Programming languages: Python, SQL
Data visualization: Matplotlib, Seaborn, Plotly
Development tools: Jupyter Notebook / JupyterLab
DevOps tools: Git, Bitbucket

Requirements

Bachelor's degree in Computer Science, Computer Information Systems, or Information Technology, or a combination of education and experience equivalent to a U.S. Bachelor's degree in one of these subjects.
Experience with spaCy.
