Data Engineer - GenAI / RAG / LangChain / LangGraph

Ravh IT Solutions

Irvine, United States of America

yesterday

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Irvine, United States of America

Tech stack

API

Artificial Intelligence

Airflow

Amazon Web Services (AWS)

Software Applications

Azure

Google BigQuery

Cloud Computing

Continuous Integration

Information Engineering

ETL

Data Warehousing

Data Flow Control

GitHub

Graph Database

Python

Machine Learning

Standard SQL

Search Technologies

Software Deployment

Workflow Management Systems

Google Cloud Platform

Enterprise Software Applications

Retrieval-Augmented Generation

Large Language Models

Snowflake

Multi-Agent Systems

Prompt Engineering

Spark

Generative AI

Git

FastAPI

Microsoft Fabric

Data Lake

PySpark

Kubernetes

HuggingFace

Kafka

Data Management

Machine Learning Operations

Virtual Agents

REST

Data Pipelines

Docker

Databricks

Job description

Machine Learning fundamentals
Embedding Models
Semantic Search
Document Processing
NLP
Model Deployment (Preferred)

Additional Skills

REST APIs / FastAPI
Docker
Kubernetes (Preferred)
MLflow
Kafka (Preferred)

Responsibilities

Design and develop scalable enterprise data pipelines.
Build Retrieval-Augmented Generation (RAG) applications.
Develop AI Agents using LangChain and LangGraph.
Integrate enterprise data sources with LLMs.
Build semantic search solutions using vector databases.
Optimize prompt engineering and LLM performance.
Work with structured and unstructured data sources.
Collaborate with Data Scientists, ML Engineers, and Business stakeholders.
Ensure data quality, governance, scalability, and security.

Requirements

We are seeking a highly experienced Senior Data Engineer with expertise in modern data engineering and Generative AI technologies. The ideal candidate has hands-on experience both designing scalable data platforms and building AI-powered applications using RAG (Retrieval-Augmented Generation), LangChain, LangGraph, LLMs, and vector databases.

The candidate should also have strong cloud data engineering expertise and practical experience integrating Large Language Models into enterprise applications.

Mandatory Skills

Data Engineering

8+ years of experience in Data Engineering
Strong expertise in Python and SQL
Apache Spark / PySpark
Databricks
ETL/ELT Pipeline Development
Delta Lake
Data Warehousing & Data Lake Architecture
Apache Airflow or equivalent orchestration tools
CI/CD for Data Pipelines
Git / Azure DevOps / GitHub

Cloud Platforms (Any One)

Microsoft Azure (ADF, Synapse, ADLS)
AWS (Glue, EMR, Lambda, S3, Athena)
Google Cloud Platform (BigQuery, Dataflow)

Generative AI / LLM

Hands-on experience building RAG (Retrieval-Augmented Generation) solutions
LangChain
LangGraph
OpenAI / Azure OpenAI / Anthropic Claude / Gemini APIs
Prompt Engineering
AI Agents / Multi-Agent Workflows
LLM Orchestration
Function Calling / Tool Calling
LLM Evaluation and Optimization

Vector Databases

Experience with one or more:

Pinecone
ChromaDB
FAISS
Weaviate
Milvus
Azure AI Search

Financial Services / Asset Management
Banking
Healthcare
Insurance
Retail
Manufacturing

Nice to Have

Microsoft Fabric
Snowflake
dbt
MLOps
Hugging Face
LlamaIndex
CrewAI / AutoGen
MCP (Model Context Protocol)
Knowledge Graphs
GraphRAG

Recruiter Screening Checklist

Candidates must have:

8+ years of Data Engineering experience
Strong Python & SQL
Databricks / Spark
Azure or AWS
RAG implementation experience
LangChain
LangGraph
OpenAI / Azure OpenAI
Vector Database experience
AI Agent development
Production deployment of LLM applications
Strong communication skills
