Software Engineer II - Retrieval-Augmented Generation (RAG) System

RELX Group plc
Philadelphia, United States of America
5 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$159K

Job location

Philadelphia, United States of America

Tech stack

A/B testing
API
Amazon Web Services (AWS)
Audit Trail
Azure
Cloud Computing
Computer Programming
Continuous Integration
Data Governance
Data Masking
Document Retrieval
Python
Node.js
Performance Tuning
Software Engineering
SQL Databases
Systems Integration
Management of Software Versions
Workflow Management Systems
Data Logging
Google Cloud Platform
Large Language Models
Prompt Engineering
Spark
Pandas
AI Platforms
Kubernetes
Low Latency
Integration Frameworks
Machine Learning Operations
New Relic (SaaS)
Software Version Control
Data Pipelines
Docker

Job description

About the role: We are seeking an experienced engineer to work with a team to build and support a healthcare-centered, production-scale RAG system that combines document retrieval with response generation to deliver accurate, context-aware answers. This engineer will be expected to design, implement, and operate end-to-end RAG pipelines, including LLM interaction, API creation, and high-performance, secure delivery of knowledge-grounded capabilities. You will collaborate with data engineers, platform teams, and product partners to ship reliable, scalable, and observable systems.

About the team: This collaborative team is entrusted with building the Next Generation Health Solutions using cutting-edge technology.

Role and responsibilities

  • Architecting, implementing, testing, and operating end-to-end RAG workflows:

  • Ingesting and normalizing documents from diverse sources

  • Generating and managing embeddings; indexing and querying vector databases

  • Retrieving relevant passages, applying reranking or fusion strategies, and feeding prompts to LLMs

  • Building scalable, low-latency services and APIs (Python preferred; other languages acceptable) and ensuring production-grade reliability (monitoring, tracing, alerting)

  • Integrating with vector databases and embedding pipelines and optimizing for latency, throughput, and cost

  • Designing and implementing MLOps workflows: model/version management, experiments, feature stores, CI/CD for ML-enabled services, rollback plans

  • Developing robust data pipelines and governance around ingestion, provenance, quality checks, and access controls

  • Collaborating with data engineers to improve retrieval quality (embedding strategies, reranking, cross-encoder models, prompt engineering) and implementing evaluation metrics (precision/recall, MRR, QA accuracy, user-centric metrics)

  • Implementing monitoring and observability for RAG components (latency, success rate, cache hit rate, retrieval quality, data drift)

  • Ensuring security, privacy, and compliance (authentication, authorization, data masking, PII handling, audit logging)

Requirements

  • 5+ years of professional software engineering experience designing and delivering production systems
  • Strong programming skills (Python required; Node.js a plus)
  • Deep understanding of retrieval-augmented or application-scale NLP systems and practical experience building RAG-like pipelines
  • Hands-on experience with ML workflow tooling and MLOps concepts (model serving, versioning, experiments, feature stores, reproducibility)
  • Proficiency with cloud infrastructure and modern software practices (AWS/Google Cloud Platform/Azure; Docker; Kubernetes; CI/CD)
  • Strong problem-solving skills, excellent communication, and ability to work with cross-functional teams
  • Familiarity with data governance, privacy, and security best practices
  • Experience with agentic workflow tools (LangGraph) and familiarity with prompt engineering for LLMs
  • Exposure to working with and evaluating different LLMs
  • Knowledge of evaluation methodologies for retrieval and QA systems and the ability to set up A/B tests and dashboards
  • Experience with data processing frameworks (SQL, Pandas, Spark) and working with large-scale data pipelines
  • Background in performance optimization for low-latency AI services (MLflow)
  • Experience with monitoring and logging via New Relic, K9s, Portkey, etc.
  • Experience with minimizing token usage and cost optimization
  • Comfort with designing and implementing security controls for data-intensive AI systems

About the company

Elsevier is a renowned global information analytics company that primarily focuses on providing scientific, technical, and medical (STM) research content, tools, and services. It is one of the largest publishers of academic journals and scholarly literature in the world. Elsevier operates in various domains, including science, technology, medicine, social sciences, and more. They publish a vast number of peer-reviewed journals covering a wide range of disciplines. These journals act as platforms for researchers and academics to share their findings and contribute to the advancement of knowledge in their respective fields.
