Software Engineer II - Retrieval-Augmented Generation (RAG) System
Job description
About the role: We are seeking an experienced engineer to join a team building and supporting a healthcare-centered, production-scale RAG system that combines document retrieval with response generation to deliver accurate, context-aware answers. This engineer will be expected to design, implement, and operate end-to-end RAG pipelines, covering LLM interaction, API creation, and the high-performance, secure delivery of knowledge-grounded capabilities. You will collaborate with data engineers, platform teams, and product partners to ship reliable, scalable, and observable systems.
About the team: This collaborative team is entrusted with building Next Generation Health Solutions using cutting-edge technology.
Role and responsibilities
- Architecting, implementing, testing, and operating end-to-end RAG workflows:
  - Ingesting and normalizing documents from diverse sources
  - Generating and managing embeddings; indexing and querying vector databases
  - Retrieving relevant passages, applying reranking or fusion strategies, and feeding prompts to LLMs
- Building scalable, low-latency services and APIs (Python preferred; other languages acceptable) and ensuring production-grade reliability (monitoring, tracing, alerting)
- Integrating with vector databases and embedding pipelines and optimizing for latency, throughput, and cost
- Designing and implementing MLOps workflows: model/version management, experiments, feature stores, CI/CD for ML-enabled services, rollback plans
- Developing robust data pipelines and governance around ingestion, provenance, quality checks, and access controls
- Collaborating with data engineers to improve retrieval quality (embedding strategies, reranking, cross-encoder models, prompt engineering) and implementing evaluation metrics (precision/recall, MRR, QA accuracy, user-centric metrics)
- Implementing monitoring and observability for RAG components (latency, success rate, cache hit rate, retrieval quality, data drift)
- Ensuring security, privacy, and compliance (authentication, authorization, data masking, PII handling, audit logging)
Requirements
- 5+ years of professional software engineering experience designing and delivering production systems
- Strong programming skills (Python required; Node.js a plus)
- Deep understanding of retrieval-augmented or application-scale NLP systems and practical experience building RAG-like pipelines
- Hands-on experience with ML workflow tooling and MLOps concepts (model serving, versioning, experiments, feature stores, reproducibility)
- Proficiency with cloud infrastructure and modern software practices (AWS/Google Cloud Platform/Azure; Docker; Kubernetes; CI/CD)
- Strong problem-solving skills, excellent communication, and ability to work with cross-functional teams
- Familiarity with data governance, privacy, and security best practices
- Experience with agentic workflow tools (LangGraph) and familiarity with prompt engineering for LLMs
- Exposure to working with and evaluating different LLMs
- Knowledge of evaluation methodologies for retrieval and QA systems and the ability to set up A/B tests and dashboards
- Experience with data processing frameworks (SQL, Pandas, Spark) and working with large-scale data pipelines
- Background in performance optimization for low-latency AI services (MLflow)
- Experience with monitoring and logging via New Relic, K9s, Portkey, etc.
- Experience minimizing token usage and optimizing cost
- Comfortable designing and implementing security controls for data-intensive AI systems