Senior Graph Data Engineer (Cloud)

ESQlabs GmbH
11 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Tech stack

API
Airflow
Amazon Web Services (AWS)
Audit Trail
Azure
Cloud Computing
Databases
Continuous Integration
Information Engineering
ETL
Query Languages
DevOps
Disaster Recovery
Github
Graph Database
Identity and Access Management
Python
Metadata Standards
Neo4j
Node.js
Pattern Recognition
Markdown
Prometheus
SPARQL
Data Logging
Data Processing
PyTorch
Grafana
Spark
Jupyter
Cloudformation
Pandas
Gitlab-ci
Bicep
Plotly
Kafka
Cosmos DB
Machine Learning Operations
Cloudwatch
Terraform
Data Pipelines
Docker

Job description

We are seeking a talented and motivated Senior Graph Data Engineer (Cloud) (f/m/d) to help us design, deploy, and operate a production-grade graph database service. As part of the MPSlabs team and in close collaboration with multidisciplinary collaborators, you will help us establish a robust graph database service for execution within a project. The tasks associated with this role include, but are not limited to:

  • Schema design: A maintainable property-graph schema with clear node/relationship types, properties, constraints, and indexes.
  • Cloud deployment and integration: Infrastructure-as-Code procedures and CI/CD to provision and initialize the graph DB with the agreed schema; secure connectivity to relevant data sources so the database is ready to be populated.
  • Performance and operations: Database configuration, tuning, observability (metrics, logs, traces), usage monitoring, SLOs/alerts, backup/restore, and cost-aware scale-up/scale-out strategies.
  • Schema exploration: Reproducible introspection via built-in tools and notebooks to visualize/list labels, relationship types, property keys, constraints, and indexes.
  • Query development: Optimized queries for neighborhood discovery, shortest paths, recurring motifs/shapes, and structural introspection.
  • Graph algorithms and pattern detection: Workflows for triangles/stars/chains, community detection, centrality, and link prediction; encode domain patterns provided by collaborators.
  • Statistical analysis and graph ML: Descriptive stats (counts, degree and path length distributions, clustering coefficient, density), centrality reports, community summaries, embeddings, and simple ML pipelines (clustering/classification) using graph-derived features.
  • ML tools integration: Interfaces and containers to integrate analysis tools from other sources.
  • Iterative analysis and reporting: Versioned analytics and reports that update with data changes; documented assumptions and changelogs.

Data engineering and integration

  • Building secure ingestion/ELT pipelines from APIs, object storage, and databases with validation, schema evolution, and idempotent loads.

Cloud, security, and DevOps

  • Deploying managed graph services or self-managed clusters on AWS/Azure/GCP.
  • Infrastructure as Code (Terraform or CloudFormation/Bicep), containers (Docker), CI/CD (GitHub Actions/GitLab CI/Azure DevOps).
  • Observability (CloudWatch/Prometheus/Grafana), centralized logging, alerting, backup/restore, and disaster recovery.
  • Security best practices: IAM/roles, VPC design, secrets management, encryption in transit (TLS) and at rest, least-privilege access, and auditability.

Communication and documentation

  • Clear technical writing (runbooks, ADRs, user guides); stakeholder communication; ability to translate domain patterns into graph designs and queries.

Further qualities that will put you in the spotlight

  • Graph ML and embeddings: node2vec/DeepWalk/GraphSAGE; familiarity with PyTorch Geometric or DGL; evaluation and basic MLOps hygiene.
  • Visualization: Neo4j Bloom, yFiles, Graphistry, Cytoscape; lightweight app dashboards (Streamlit/Plotly Dash).
  • Orchestration and data movement: Airflow/Prefect, event streaming (Kafka), and batch scheduling.
  • Biomedical data familiarity: integrating outputs from biomedical data pipelines; awareness of FAIR data, metadata standards, and controlled access patterns.
  • Compliance and governance: experience with GDPR/HIPAA-aligned controls, data minimization, pseudonymization, and audit logging.
  • Python for data processing/automation; comfort with pandas and either NetworkX, graph-tool, or Spark GraphFrames; reproducible notebooks (Jupyter).
  • R/stats workflows: tidyverse and automated reporting (Quarto/Markdown) for stakeholder deliverables.

Requirements

Do you have experience in Terraform?

Graph databases and query languages

  • Production experience with at least one major platform: Neo4j (Cypher, APOC, GDS), AWS Neptune (Gremlin/SPARQL), Azure Cosmos DB for Gremlin, TigerGraph, or ArangoDB.
  • Strong schema/constraint/index design; query profiling and optimization; practical understanding of cardinality and selectivity.

Graph algorithms and analytics

  • Hands-on with shortest path, motif searches, centrality (PageRank, betweenness, closeness), community detection (e.g., Louvain/Leiden), and basic link prediction.
  • Ability to compute and interpret graph statistics (node/edge counts, degree and path length distributions, clustering coefficient, density).

Benefits & conditions

The dynamic team behind ESQlabs unites people with different backgrounds, spanning disciplines like pharmaceutical sciences, physics, bioinformatics, mathematics, data science, and software engineering. Our trust in each other, our open-mindedness, and our constant quest for excellence and innovation unite us. We value flat hierarchies, where everyone can contribute ideas and expertise to develop the best product. We are passionate, always willing to broaden our horizons, and love our start-up culture.

You will also benefit from:

  • Flexible work hours and a home office policy that focuses on people and not on numbers
  • An attractive remuneration package
  • A dedicated budget for education programs and conferences you can attend
  • A working environment in which your contribution will make a difference, and that allows you to take ownership of projects and processes
  • Responsibility, autonomy, participation, and career prospects

About the company

ESQlabs is an innovative, internationally active Contract Research Organization and a global leader in the development and application of the OSP Suite (www.open-systems-pharmacology.org). We are a research-focused provider of specialized computational analyses in the life sciences industry. In close collaboration with industry and academia, we are committed to software development to fulfil our own and our clients' needs and the needs of the open-source community. We collaborate with global corporations in the pharma and chemical industry, as well as start-ups in the life-science technology sector and academic and non-profit research institutions. We define new standards in pharmacometrics and in systems pharmacology and toxicology. Our software and model platforms help scientists understand mechanisms of diseases and chemical toxicity to optimize individualized treatment of patients and protect animal and human health. ESQlabs has attracted international players as sponsors for OSP software development. In addition, funding through national and EU-wide research grants allows us to develop our core technologies continuously.

MPSlabs is a dedicated research and business unit situated under the umbrella of ESQlabs. MPSlabs spearheads the development of digital twin platforms for organ-on-chip and micro-physiological systems, bringing invaluable expertise to projects that require advanced simulation and modeling capabilities.
