Senior Data Scientist - Artificial Intelligence R&D

Caterpillar

Peoria, United States of America

4 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

$183K

Job location

Peoria, United States of America

Tech stack

Microsoft Access

Agile Methodologies

Artificial Intelligence

Amazon Web Services (AWS)

Data analysis

Artificial Neural Networks

Computer Vision

Automated Storage and Retrieval Systems

Azure

Computer Programming

Data Cleansing

Information Engineering

Python

Machine Learning

Natural Language Processing

NumPy

Open Source Technology

TensorFlow

AI Infrastructure

Data Storage Technologies

PyTorch

Large Language Models

Deep Learning

Model Validation

Generative AI

Pandas

Git Flow

Information Technology

Low Latency

Machine Learning Operations

Software Version Control

Data Generation

Job description

Join the AI Research & Development team of Cat Digital and play a central role in advancing the frontier of applied AI for one of the world's largest industrial enterprises. As a Senior Data Scientist, you will design, build, and evaluate cutting-edge AI systems, spanning generative AI, large language models (LLMs), multimodal intelligence, retrieval-augmented generation (RAG), and autonomous agents, delivering high-impact Proofs of Concept (POCs) with clear production intent while exploring longer-horizon research opportunities.

Design and execute AI experiments across the full model lifecycle: hypothesis formulation, data preparation, model development, evaluation, and iteration, maintaining research rigor in an ambiguous, fast-moving environment.
Develop, fine-tune, and benchmark LLMs and multimodal AI models (text, vision, speech), including systematic evaluation of quality, latency, cost, and safety tradeoffs across model variants and providers.
Explore and optimize knowledge retrieval systems (RAG pipelines, vector databases, hybrid search) and agentic workflows, ensuring relevance, accuracy, and scalability for enterprise use cases.
Lead data preparation workstreams for model training, fine-tuning, and validation, including dataset curation, labeling strategy, synthetic data generation, and quality assurance.
Instrument AI systems for observability and reproducibility using experiment tracking frameworks (e.g., Langfuse, MLflow), maintaining clear documentation of model versions, evaluation datasets, and performance baselines.
Translate research findings into production-ready prototypes, collaborating with Engineering and Product teams to define technical requirements, integration paths, and deployment readiness criteria.
Evaluate emerging AI capabilities and tools (open-source and commercial), providing structured assessments and recommendations to inform the team's technology strategy.
Mentor and coach junior Data Scientists, establishing best practices for experimentation, model evaluation, and responsible AI development across the team.
Communicate insights and results to technical and non-technical stakeholders, including product managers, engineers, and senior leadership, with clarity and business impact framing.

Requirements

Applied Statistics & Quantitative Methods: Experience applying statistical thinking to experimentation, evaluation, and decision-making in ambiguous, research-driven environments.
Analytical Rigor & Attention to Detail: Proven ability to design precise experiments, validate assumptions, and ensure accuracy and reproducibility of results.
Advanced Machine Learning & AI: Knowledge of modern ML techniques, including deep learning, generative AI, NLP, computer vision, and multimodal systems, with hands-on implementation experience.
Model Evaluation & Optimization: Strong experience evaluating model quality and system-level tradeoffs across accuracy, latency, cost, and scalability dimensions.
Programming Expertise: Proficiency in Python for AI and ML development, including use of modern AI frameworks and tooling.
Data Engineering & Access: Strong understanding of data storage, retrieval, and processing systems required to support large-scale training and experimentation workflows.
Requirements & Systems Thinking: Ability to define technical and non-functional requirements that bridge research, engineering, and production concerns.

Considerations for Top Candidates:

Bachelor's, Master's, or PhD degree in Data Science, Computer Science, Machine Learning, Statistics, Applied Mathematics, Engineering, or a closely related technical field.
Proven experience building and deploying advanced ML models beyond traditional analytics use cases.
Extensive proficiency in Python (NumPy, Pandas, PyTorch, LangChain, etc.); ability to write clean, maintainable, production-oriented code and contribute to shared AI infrastructure.
Strong hands-on experience with generative AI, large language models, deep neural networks, and modern ML frameworks.
Demonstrated experience designing evaluation frameworks and benchmarks for AI systems.
Familiarity with AI infrastructure, cloud platforms (AWS, Azure), and scalable experimentation environments.
Advanced experience with version control, experiment tracking, and collaborative development (e.g., Git-based workflows).
Experience working in Agile, cross-functional product development environments.
Prior exposure to industrial, manufacturing, heavy equipment, or complex physical systems is a strong plus, but not required.

Benefits & conditions

*Subject to plan eligibility, terms, and guidelines. This is a summary list of benefits.

Medical, dental, and vision benefits*
Paid time off plan (Vacation, Holidays, Volunteer, etc.)*
401(k) savings plans*
Health Savings Account (HSA)*
Flexible Spending Accounts (FSAs)*
Health Lifestyle Programs*
Employee Assistance Program*
Voluntary Benefits and Employee Discounts*
Career Development*
Incentive bonus*
Disability benefits
Life Insurance
Parental leave
Adoption benefits
Tuition Reimbursement

About the company

Your Work Shapes the World at Caterpillar Inc. When you join Caterpillar, you're joining a global team who cares not just about the work we do - but also about each other. We are the makers, problem solvers, and future world builders who are creating stronger, more sustainable communities. We don't just talk about progress and innovation here - we make it happen, with our customers, where we work and live. Together, we are building a better world, so we can all enjoy living in it.

Cat Digital is the digital and technology arm of Caterpillar Inc., leveraging the latest technologies to build industry-leading digital solutions for our customers and dealers. With over 1.5 million connected assets worldwide, our teams use data, technology, advanced analytics, telematics and AI capabilities to help our customers build a better, more sustainable world.