Data Scientist/Engineer

Cayuse, LLC

1 month ago

Role details

Contract type

Temporary contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Compensation

$ 62K

Job location

Remote

Tech stack

Data analysis

Azure

Big Data

Data Cleansing

Information Engineering

Data Transformation

Python

Machine Learning

Unstructured Data

Data Processing

Data Storage Technologies

Data Ingestion

Jupyter

Software Version Control

Data Pipelines

Databricks

Programming Languages

Job description

We are seeking a highly skilled and motivated Data Scientist/Engineer to join our dynamic and innovative team. The ideal candidate will have hands-on experience designing, building, and maintaining scalable data processing pipelines, implementing machine learning solutions, and ensuring data quality across the organization. This role requires a strong technical foundation in Azure cloud platforms, data engineering, and applied data science to support critical business decisions and technological advancements.

Data Engineering

Build and Maintain Data Pipelines: Develop and manage scalable data pipelines using Azure Data Factory, Azure Synapse Analytics, or Azure Databricks to process large volumes of data.
Data Quality and Transformation: Ensure the transformation, cleansing, and ingestion of data from a wide range of structured and unstructured sources with appropriate error handling.
Optimize Data Storage: Utilize and optimize data storage solutions, such as Azure Data Lake and Blob Storage, to ensure cost-effective and efficient data storage practices.

Machine Learning Support

Collaboration with ML Engineers and Architects: Work with Machine Learning Engineers and Solution Architects to seamlessly deploy machine learning models into production environments.
Automated Retraining Pipelines: Build automated systems to monitor model performance, detect model drift, and trigger retraining processes as needed.
Experiment Reproducibility: Ensure reproducibility of ML experiments by maintaining proper version control for models, data, and code.

Data Analysis and Preprocessing

Data Ingestion and Exploration: Ingest, explore, and preprocess both structured and unstructured data with tools such as:

Azure Data Lake Storage
Azure Synapse Analytics
Azure Data Factory

Exploratory Data Analysis (EDA): Perform exploratory data analysis in notebook environments such as Azure Machine Learning Notebooks or Azure Databricks to derive actionable insights.
Data Quality Assessments: Identify data anomalies, evaluate data quality, and recommend appropriate data cleansing or remediation strategies.

General Responsibilities

Pipeline Monitoring and Optimization: Continuously monitor the performance of data pipelines and workloads, identifying opportunities for optimization and improvement.
Collaboration and Communication: Communicate findings and technical requirements effectively with cross-functional teams, including data scientists, software engineers, and business stakeholders.
Documentation: Document all data workflows, experiments, and model implementations to facilitate knowledge sharing and maintain continuity of operations.

Requirements

Proven experience in building and managing data pipelines using Azure Data Factory, Azure Synapse Analytics, or Databricks.
Strong knowledge of Azure storage solutions, including Azure Data Lake and Blob Storage.
Familiarity with data transformation, ingestion techniques, and data quality methodologies.
Proficiency in programming languages such as Python or Scala for data processing and ML integration.
Experience in exploratory data analysis and working in notebook environments such as Jupyter, Azure Machine Learning Notebooks, or Azure Databricks.
Solid understanding of machine learning lifecycle management and model deployment in production environments.
Strong problem-solving skills with experience detecting and addressing data anomalies.