Data Scientist

Capgemini

Medina, United States of America

5 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Intermediate

Job location

Medina, United States of America

Tech stack

Java

Artificial Intelligence

Amazon Web Services (AWS)

Azure

Big Data

Data Architecture

Information Engineering

ETL

Data Manipulation Languages

Data Stores

Data Warehousing

Hadoop

Python

Machine Learning

Systems Integration

Unstructured Data

Data Storage Technologies

Large Language Models

Snowflake

Prompt Engineering

Spark

Generative AI

Indexer

Data Lake

PySpark

Information Technology

Machine Learning Operations

Data Pipelines

API Management

Serverless Computing

Redshift

Databricks

Job description

As a Data Scientist, you will lead the development and implementation of advanced data engineering solutions to support the deployment and optimization of Generative AI models. Your role will involve leveraging your extensive experience to design robust, scalable, and innovative data architectures that align with the unique requirements of Generative AI (GenAI) applications.

The Machine Learning Engineer will be responsible for architectural design and planning, advanced data pipelines, model integration and optimization, scalability and performance, and research and innovation supporting production Generative AI systems.

Deliver production-level ML workloads for customers on the Databricks platform, including end-to-end ML pipelines, training/inference optimization, integration with cloud-native services, and MLOps.
Build and maintain data engineering solutions on cloud platforms using hyperscaler services.
Develop production-grade cloud (AWS/Azure/GCP) infrastructure that supports the deployment of ML applications, including drift monitoring.
Design, develop, and maintain data pipelines to efficiently collect, process, and load data from various sources into data storage systems (e.g., data warehouses, data lakes).
Understand indexing and vectorization for use with Generative AI prompt engineering.
Strong understanding of fundamental data science concepts in NLP, including the selection and understanding of embedding models.
Use hyperscaler technologies to support the data needs of expanding Machine Learning/Data Science capabilities, including Generative AI.
Design, develop, and implement scalable data pipelines and ETL/ELT processes using Python, PySpark and API integrations.

Requirements

Bachelor's degree in computer science, data engineering, or a related field (Master's preferred), with 3+ years of experience.
Proven experience in data engineering, MLOps, ETL, and database management, including SQL and other data manipulation languages.
Experience with Azure and with Python, Java, or Scala.
Experience with data warehousing platforms (e.g., Databricks, Amazon Redshift, Snowflake) and big data technologies (e.g., Hadoop, Spark).
Experience with highly scalable data stores, data lakes, data warehouses, lakehouses, and unstructured datasets.

About the company

Capgemini is one of the world's leading providers of management and IT consulting, technology services, and digital transformation. As a pioneer of innovation, the company supports its clients with their complex challenges around cloud, digital, and platforms.