Sr Data Scientist GenAI

Select Minds LLC

Dallas, United States of America

1 month ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Shift work

Languages

English

Experience level

Senior

Remote

Dallas, United States of America

Artificial Intelligence

Computational Linguistics

Databases

Information Retrieval

Python

Machine Learning

Open Source Technology

TensorFlow

Software Engineering

Data Streaming

Data Ingestion

PyTorch

Large Language Models

Prompt Engineering

Generative AI

Information Technology

HuggingFace

Machine Learning Operations

Software Version Control

Sr Data Scientist (NLP / LLM / Generative AI)

Location: Dallas, TX

Roles & Responsibilities:

Design, build, fine-tune, and deploy LLMs, transformer-based NLP models, and GenAI solutions for both batch and real-time/streaming contexts.
Own all major components of ML pipelines: data ingestion, cleaning, pre-processing (structured & unstructured), embedding, search & retrieval, prompt engineering, and RAG (Retrieval-Augmented Generation).
Collaborate closely with ML engineers, MLOps, software engineering, product, compliance, legal, and other teams to move models from prototype to production, ensuring reliability, scalability, monitoring, and maintainability.
Define and implement evaluation frameworks covering accuracy, bias, fairness, hallucination, consistency, and latency; run UAT and stress tests, and set up drift detection.
Optimize models and pipelines for performance, cost, and efficiency.
Ensure best practices in model development: version control, repeatability, documentation, governance, and ethical AI use.
Mentor junior data scientists and help build team skills in NLP, GenAI practices, prompt engineering, and fine-tuning.
Identify new use cases, prototype innovations in GenAI/NLP, and keep up with the latest research and open-source developments to decide what to adopt.

Requirements:

10+ years of experience in data science / ML, with substantial work in NLP, LLMs, or Generative AI.
Deep hands-on experience in Python with frameworks such as PyTorch, TensorFlow, and HuggingFace.
Proven track record building transformer-based NLP and LLM models, including fine-tuning and prompt engineering.
Solid experience with information retrieval and search: keyword and semantic search, embeddings, and vector databases.
Experience deploying models to production (batch and streaming) and applying MLOps practices.
Strong algorithmic, statistical, and mathematical fundamentals, with the ability to reason about model behaviour, bias, and uncertainty.
Strong communicator, able to translate complex technical detail for business and non-technical stakeholders.

Nice to Have:
Master's in Computer Science, Computational Linguistics, Statistics, Machine Learning, or a related field.
Experience with multimodal models (vision + text) or emerging LLMs and agent-based systems.
Experience with open-source LLMs and toolkits; familiarity with LangChain or similar frameworks.
Prior experience in regulated environments (finance, risk, legal, compliance) with strict governance and privacy requirements.

Work will be remote temporarily due to COVID-19.