Data Scientist (GenAI, LLM & Machine Learning)

Lorven Technologies Inc
Raleigh, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Raleigh, United States of America

Tech stack

Artificial Intelligence
Amazon Web Services (AWS)
Automated Storage and Retrieval Systems
Azure
Cloud Computing
Relational Databases
Distributed Systems
Elasticsearch
Python
PostgreSQL
Machine Learning
MongoDB
Natural Language Processing
Named Entity Recognition
NoSQL
Performance Tuning
TensorFlow
Search Technologies
Systems Integration
Google Cloud Platform
PyTorch
Large Language Models
Prompt Engineering
Spark
Deep Learning
Model Validation
Generative AI
Keras
Containerization
Kubernetes
Information Technology
Hugging Face
Cosmos DB
Machine Learning Operations
API Design
spaCy
Document Classification
Natural Language Understanding
Data Pipelines
Serverless Computing
Docker

Job description

  • Design, develop, and deploy enterprise-grade AI/ML and Generative AI solutions leveraging Large Language Models (LLMs), NLP techniques, and advanced machine learning methodologies.
  • Build and optimize Retrieval-Augmented Generation (RAG) pipelines, prompt engineering frameworks, vector embedding solutions, and knowledge retrieval systems.
  • Develop AI applications tailored for legal document intelligence, document processing, search, summarization, and classification use cases.
  • Design and implement data pipelines for ingestion, preprocessing, annotation, enrichment, and management of structured and unstructured datasets.
  • Collaborate closely with legal domain experts, business stakeholders, and engineering teams to understand requirements and translate them into scalable AI solutions.
  • Conduct model experimentation, benchmarking, evaluation, and performance optimization to improve accuracy, reliability, and business outcomes.
  • Develop and maintain machine learning models using PyTorch, TensorFlow, Keras, Hugging Face Transformers, and other modern AI frameworks.
  • Implement NLP solutions involving entity extraction, semantic search, document classification, embeddings, and language understanding tasks.
  • Build and optimize integrations with vector databases, search platforms, relational databases, and cloud-native services.
  • Work with AWS, Azure, or Google Cloud Platform services to deploy, monitor, and scale AI/ML workloads in production environments.

Requirements

  • Bachelor's or Master's degree in Computer Science, Data Science, Artificial Intelligence, Machine Learning, Statistics, or a related field, with 12-14+ years of overall experience, including 6-8+ years in Data Science, Machine Learning, and AI solution development.
  • 6+ years of hands-on experience designing, developing, and deploying machine learning models and advanced analytics solutions in enterprise environments.
  • Strong experience with Large Language Models (LLMs), Generative AI, Prompt Engineering, Retrieval-Augmented Generation (RAG), and model evaluation frameworks.
  • Advanced proficiency in Python with experience developing scalable AI/ML applications and data processing pipelines.
  • Hands-on experience with deep learning frameworks including PyTorch, TensorFlow, Keras, and Hugging Face Transformers.
  • Strong expertise in Natural Language Processing (NLP) techniques and tools such as spaCy, BERT, Word2Vec, Transformers, Flair, and text classification models.
  • Experience building and maintaining training, validation, benchmarking, and evaluation datasets for AI/ML initiatives.
  • Knowledge of vector databases and search technologies including ChromaDB, Elasticsearch, OpenSearch, or similar platforms.
  • Experience working with relational and NoSQL databases such as PostgreSQL, MongoDB, Cosmos DB, or equivalent.
  • Experience with cloud platforms including AWS, Azure, or Google Cloud Platform for model development, deployment, and scaling.
  • Understanding of data modeling principles, embeddings, clustering, dimensionality reduction, sequence classification, and predictive analytics.
  • Exposure to distributed computing frameworks such as Spark or Ray, or to Scala, is highly preferred.
  • Experience with API development, containerization and orchestration (Docker/Kubernetes), and MLOps/AIOps practices is highly preferred.
  • Strong analytical, problem-solving, communication, and stakeholder collaboration skills.
