Gen AI Engineer

Skywaves MP LLC
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Remote

Tech stack

Java
API
Amazon Web Services (AWS)
Artificial Neural Networks
Azure
Signals Intelligence
Computational Linguistics
Data Deduplication
Data Fusion
Web Scraping
Relational Databases
Document Retrieval
Entity Relationship Models
Graph Database
Information Sciences
JSON
Python
Link Analysis
Neo4j
Open Source Technology
Open Source Intelligence
Software Engineering
SPARQL
Systems Integration
Unstructured Data
Datadog
Google Cloud Platform
Cloud Platform System
Data Ingestion
Large Language Models
Multi-Agent Systems
Prompt Engineering
Generative AI
Kotlin
Knowledge Representation
Containerization
Kubernetes
Information Technology
Low Latency
Playwright
Kafka
Search Engines
GraphQL
Machine Learning Operations
Virtual Agents
Docker
Microservices

Job description

Knowledge Graph Engineering

  • Design, build and maintain large-scale property graphs and RDF triplestores (Neo4j, Amazon Neptune, Stardog, or equivalent).

  • Develop and govern ontologies, taxonomies, and entity-relationship schemas that reflect real-world domain semantics.

  • Implement graph ingestion pipelines that extract, transform, and link entities from structured, semi-structured, and unstructured data.

  • Optimise graph traversal queries (Cypher, SPARQL, Gremlin) for sub-second response at production scale.

  • Train and deploy graph neural networks (GNNs) for node classification, link prediction, and subgraph retrieval.

  • Maintain model retraining workflows triggered by graph drift or coverage degradation.

Agentic AI Systems

  • Architect and implement autonomous agents that plan multi-step reasoning chains over knowledge graph data using LLMs (GPT-4o, Claude, Gemini, or open-source equivalents).
  • Build graph-aware Retrieval-Augmented Generation (RAG) pipelines that blend structured graph context with unstructured document retrieval.
  • Design tool-use and function-calling layers so agents can query live data sources - web search, REST/GraphQL APIs, relational databases - to extend or verify graph knowledge.
  • Implement agent memory, reflection, and self-correction loops to improve reliability over multi-hop tasks.

Context Enrichment & Data Fusion

  • Integrate web scraping, news feeds, and open-source intelligence (OSINT) sources to keep the knowledge graph current.
  • Build entity resolution and deduplication components that merge data from heterogeneous sources into a consistent graph.
  • Develop confidence-scoring and provenance-tracking mechanisms so downstream consumers understand the reliability of any piece of context.

MLOps & Production Readiness

  • Package agents as scalable microservices and instrument them with observability tooling (tracing, latency, token cost).
  • Collaborate with platform engineers to deploy workloads on cloud-native infrastructure (AWS / Google Cloud Platform / Azure).
  • Maintain evaluation harnesses that measure agent accuracy, hallucination rate, and graph coverage over time.

Requirements

  • 7-10 years of professional software engineering experience with strong Python (or Java / Kotlin) proficiency.
  • 2+ years of hands-on production experience with at least one major graph database - Neo4j, Amazon Neptune, TigerGraph, or comparable.
  • Demonstrated knowledge of graph query languages such as Cypher, SPARQL, or Gremlin at production query complexity.
  • Direct experience building LLM-powered agents or pipelines using frameworks such as LangChain, LangGraph, LlamaIndex, CrewAI, AutoGen, or Semantic Kernel.
  • Solid understanding of RAG architectures: chunking strategies, vector stores (Pinecone, Weaviate, pgvector), hybrid retrieval, and re-ranking.
  • Familiarity with prompt engineering, few-shot learning, and LLM evaluation techniques.
  • Experience integrating external data sources via APIs, web scraping (Playwright / Scrapy), or streaming pipelines (Kafka / Kinesis).
  • Working knowledge of containerisation (Docker, Kubernetes) and CI/CD pipelines.
  • Familiarity with at least one graph export format: GraphML, RDF/OWL, or JSON-LD.
  • Experience integrating GNN-derived features into vector stores or RAG pipelines.
  • Advanced degree (MS / PhD) in Computer Science, Information Science, Computational Linguistics, or a related field.
  • Experience in intelligence, defence, or trade-craft environments - working with OSINT, link analysis, entity disambiguation, or signals intelligence data.
  • Understanding of access-control models for sensitive graph data (need-to-know, compartmentalisation, provenance labelling).
  • Familiarity with knowledge representation standards such as OWL, SHACL, RDF-star, JSON-LD, and W3C PROV.
  • Experience with fine-tuning or instruction-tuning open-source LLMs (Llama, Mistral, Falcon) for domain-specific tasks.
  • Background in network-analysis algorithms: centrality, community detection, path-finding, anomaly detection on graphs.
  • Contributions to open-source graph or GenAI projects; published research or technical blog presence.
  • Active or adjudicatable security clearance (Secret or above) - strongly preferred for trade-craft assignments.
