Data Engineer - GenAI / RAG / LangChain / LangGraph
Job description
- Machine Learning fundamentals
- Embedding Models
- Semantic Search
- Document Processing
- NLP
- Model Deployment (Preferred)
Additional Skills
- REST APIs / FastAPI
- Docker
- Kubernetes (Preferred)
- MLflow
- Kafka (Preferred)
Responsibilities
- Design and develop scalable enterprise data pipelines.
- Build Retrieval-Augmented Generation (RAG) applications.
- Develop AI Agents using LangChain and LangGraph.
- Integrate enterprise data sources with LLMs.
- Build semantic search solutions using vector databases.
- Refine prompts and optimize LLM performance.
- Work with structured and unstructured data sources.
- Collaborate with Data Scientists, ML Engineers, and business stakeholders.
- Ensure data quality, governance, scalability, and security.
Requirements
We are seeking a highly experienced Senior Data Engineer with expertise in modern data engineering and Generative AI technologies. The ideal candidate has hands-on experience designing scalable data platforms and building AI-powered applications with RAG (Retrieval-Augmented Generation), LangChain, LangGraph, LLMs, and vector databases.
The candidate should have strong cloud data engineering expertise along with practical experience integrating large language models (LLMs) into enterprise applications.
Mandatory Skills
Data Engineering
- 8+ years of experience in Data Engineering
- Strong expertise in Python and SQL
- Apache Spark / PySpark
- Databricks
- ETL/ELT Pipeline Development
- Delta Lake
- Data Warehousing & Data Lake Architecture
- Apache Airflow or equivalent orchestration tools
- CI/CD for Data Pipelines
- Git / Azure DevOps / GitHub
Cloud Platforms (Any One)
- Microsoft Azure (ADF, Synapse, ADLS)
- AWS (Glue, EMR, Lambda, S3, Athena)
- Google Cloud Platform (BigQuery, Dataflow)
Generative AI / LLM
- Hands-on experience building RAG (Retrieval-Augmented Generation) solutions
- LangChain
- LangGraph
- OpenAI / Azure OpenAI / Anthropic Claude / Gemini APIs
- Prompt Engineering
- AI Agents / Multi-Agent Workflows
- LLM Orchestration
- Function Calling / Tool Calling
- LLM Evaluation and Optimization
Vector Databases
Experience with one or more:
- Pinecone
- ChromaDB
- FAISS
- Weaviate
- Milvus
- Azure AI Search
Industry Experience
- Financial Services / Asset Management
- Banking
- Healthcare
- Insurance
- Retail
- Manufacturing
Nice to Have
- Microsoft Fabric
- Snowflake
- DBT
- MLOps
- Hugging Face
- LlamaIndex
- CrewAI / AutoGen
- MCP (Model Context Protocol)
- Knowledge Graphs
- GraphRAG
Recruiter Screening Checklist
Candidates must have:
- 8+ years of Data Engineering experience
- Strong Python & SQL
- Databricks / Spark
- Azure or AWS
- RAG implementation experience
- LangChain
- LangGraph
- OpenAI / Azure OpenAI
- Vector Database experience
- AI Agent development
- Production deployment of LLM applications
- Strong communication skills