Data Scientist/Engineer
Job description
We are seeking a highly skilled and motivated Data Scientist/Engineer to join our dynamic and innovative team. The ideal candidate will have hands-on experience designing, building, and maintaining scalable data processing pipelines, implementing machine learning solutions, and ensuring data quality across the organization. This role requires a strong technical foundation in Azure cloud platforms, data engineering, and applied data science to support critical business decisions and technological advancements.
Data Engineering
- Build and Maintain Data Pipelines: Develop and manage scalable data pipelines using Azure Data Factory, Azure Synapse Analytics, or Azure Databricks to process large volumes of data.
- Data Quality and Transformation: Transform, cleanse, and ingest data from a wide range of structured and unstructured sources, with appropriate error handling.
- Optimize Data Storage: Utilize and optimize data storage solutions, such as Azure Data Lake and Blob Storage, to ensure cost-effective and efficient data storage practices.
Machine Learning Support
- Collaboration with ML Engineers and Architects: Work with Machine Learning Engineers and Solution Architects to seamlessly deploy machine learning models into production environments.
- Automated Retraining Pipelines: Build automated systems to monitor model performance, detect model drift, and trigger retraining processes as needed.
- Experiment Reproducibility: Ensure reproducibility of ML experiments by maintaining proper version control for models, data, and code.
Data Analysis and Preprocessing
- Data Ingestion and Exploration: Ingest, explore, and preprocess both structured and unstructured data with tools such as Azure Data Lake Storage, Azure Synapse Analytics, and Azure Data Factory.
- Exploratory Data Analysis (EDA): Perform exploratory data analysis in notebook environments such as Azure Machine Learning Notebooks or Azure Databricks to derive actionable insights.
- Data Quality Assessments: Identify data anomalies, evaluate data quality, and recommend appropriate data cleansing or remediation strategies.
General Responsibilities
- Pipeline Monitoring and Optimization: Continuously monitor the performance of data pipelines and workloads, identifying opportunities for optimization and improvement.
- Collaboration and Communication: Communicate findings and technical requirements effectively with cross-functional teams, including data scientists, software engineers, and business stakeholders.
- Documentation: Document all data workflows, experiments, and model implementations to facilitate knowledge sharing and maintain continuity of operations.
Requirements
- Proven experience in building and managing data pipelines using Azure Data Factory, Azure Synapse Analytics, or Azure Databricks.
- Strong knowledge of Azure storage solutions, including Azure Data Lake and Blob Storage.
- Familiarity with data transformation, ingestion techniques, and data quality methodologies.
- Proficiency in programming languages such as Python or Scala for data processing and ML integration.
- Experience in exploratory data analysis and working in notebook environments such as Jupyter, Azure Machine Learning Notebooks, or Azure Databricks.
- Solid understanding of machine learning lifecycle management and model deployment in production environments.
- Strong problem-solving skills with experience detecting and addressing data anomalies.