Data Scientist Lead

JPMorgan Chase & Co.

Tampa, United States of America

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Tampa, United States of America

Tech stack

Training Data

Java

A/B testing

Artificial Intelligence

Amazon Web Services (AWS)

Data analysis

Computer Vision

Computer Programming

Databases

Data Governance

Data Transformation

Data Mining

Groovy

Statistical Hypothesis Testing

Python

Natural Language Processing

Named Entity Recognition

NumPy

Object Detection

OpenCV

Performance Tuning

TensorFlow

Software Deployment

SQL Databases

Supervised Learning

Enterprise Software Applications

Document Metadata

PyTorch

Transfer Learning

Large Language Models

Deep Learning

Pandas

Matplotlib

Containerization

Scikit Learn

Information Technology

HuggingFace

Codebase

Machine Learning Operations

Feature Extraction

Document Classification

Job description

As a Data Scientist Lead in the Commercial & Investment Bank's Healthcare Provider team, you will lead a team building advanced solutions for image classification, text categorization, and intelligent data extraction from scanned documents. You will bring deep proficiency in Python, PyTorch, TensorFlow, Hugging Face Transformers, and AWS SageMaker/Bedrock, along with hands-on experience with CNN and transformer architectures, OCR technologies, and multimodal document understanding models. The role covers the full ML lifecycle, from prototyping to production deployment on AWS EKS.

Lead and mentor a team of data scientists in designing and executing advanced analytics and modeling projects focused on image classification, text categorization, and intelligent data extraction from scanned document images. Foster a culture of curiosity, analytical rigor, and continuous learning by developing team members in deep learning, computer vision, NLP, and document AI techniques.
Define and drive the analytical strategy for document understanding use cases, identifying the optimal combination of computer vision, NLP, and multimodal approaches.
Build and fine-tune multimodal document understanding and text categorization models. Leverage the interplay of textual content, spatial layout, and visual features to extract structured fields and key-value pairs from complex scanned documents, while enabling automated categorization, routing, metadata tagging, and entity extraction.
Design rigorous experimentation and data quality frameworks, including A/B testing, cross-validation strategies, and statistical significance testing, to evaluate model performance and guide hyperparameter tuning. Establish best practices for annotation quality management, training data curation, active learning strategies, and ground truth validation to ensure high-quality labeled datasets.
Design, manage, and optimize the workflows that prepare data for machine learning model training, and select the statistical or deep learning models best positioned to achieve business results.
Develop and deploy models using Python and AWS SageMaker, managing the full lifecycle from exploratory data analysis and prototyping through production deployment, monitoring, and performance tracking. Collaborate with data engineers and ML engineers to ensure seamless integration of analytical models into production document processing pipelines and data workflows.

Requirements

Bachelor's, MS, or PhD degree in a quantitative discipline, e.g., Computer Science, Mathematics, Operations Research, or Data Science.
7+ years of experience in data science or quantitative analytics, including at least 2 years in the document AI, computer vision, or NLP domains.
Strong foundation in statistics, mathematics, and programming, including probability, mathematical modeling, and experimental design, with the ability to rigorously evaluate model performance. Advanced proficiency in Python for data analysis, modeling, and visualization, and deep experience with PyTorch, TensorFlow, Hugging Face Transformers, scikit-learn, OpenCV, pandas, NumPy, matplotlib, and seaborn.
Hands-on experience with CNN and transformer architectures for document AI, covering image classification, transfer learning, and feature extraction; multimodal document understanding that combines textual, visual, and layout features; and NLP models for text categorization, sequence labeling, named entity recognition, and semantic analysis. Familiarity with additional computer vision models, including object detection, image segmentation, and Vision Transformers.
Working experience with OCR technologies and image preprocessing for text extraction from scanned documents, including an understanding of OCR accuracy metrics, preprocessing optimization, and error analysis. Proficiency in image preprocessing techniques for scanned documents in TIF/PNG format, including deskewing, binarization, resolution enhancement, noise removal, and multi-page document handling.
Hands-on experience with AWS SageMaker and Amazon Bedrock, including building, training, tuning, and deploying ML models in cloud-based production environments (notebook instances, training jobs, inference endpoints) and exploring foundation models and generative AI capabilities to augment document understanding and classification workflows. Experience with containerized deployments on AWS EKS for productionizing data science models and analytical services at scale.
Proficiency in SQL, with strong working knowledge of Oracle databases for complex data extraction, transformation, and analysis of document metadata and extracted content. Working knowledge of Java and Groovy for collaborating with engineering teams and understanding enterprise application codebases. Strong understanding of annotation tools, active learning strategies, and training data management for supervised learning in document AI use cases.

Preferred qualifications, capabilities, and skills

Domain expertise in the healthcare industry

Benefits & conditions

We offer a competitive total rewards package, including a base salary determined by the role, experience, skill set, and location. Those in eligible roles may receive commission-based pay and/or discretionary incentive compensation, paid in the form of cash and/or forfeitable equity, awarded in recognition of individual achievements and contributions. We also offer a range of benefits and programs to meet employee needs, based on eligibility. These benefits include comprehensive health care coverage, on-site health and wellness centers, a retirement savings plan, backup childcare, tuition reimbursement, mental health support, financial coaching, and more. Additional details about total compensation and benefits will be provided during the hiring process.

About the company

JPMorganChase, one of the oldest financial institutions, offers innovative financial solutions to millions of consumers, small businesses and many of the world's most prominent corporate, institutional and government clients under the J.P. Morgan and Chase brands. Our history spans over 200 years and today we are a leader in investment banking, consumer and small business banking, commercial banking, financial transaction processing and asset management.