Data Engineer

Noxtua AG

Berlin, Germany

yesterday

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Shift work

Languages

English

Experience level

Intermediate

Job location

Remote

Berlin, Germany

Tech stack

API

Artificial Intelligence

Encodings

Computer Programming

Information Engineering

Web Scraping

Data Infrastructure

Data Integration

ETL

Data Structures

Data Systems

DevOps

Graph Database

Python

Parsing

Software Construction

XML

Large Language Models

Backend

GIT

Containerization

Software Version Control

Data Pipelines

Docker

Job description

As a Data Engineer (f/d/m), you will play a key role in our Data Expansion Squad, which is responsible for integrating and operationalizing legal data from multiple jurisdictions. The team transforms heterogeneous source data into a unified, high-quality foundation that powers search, retrieval, and AI-supported workflows across our products.

Felix, our VP AI & Data Engineering, will guide you through your journey at Noxtua. With deep expertise in AI systems, Felix leads with a passion for innovation and a collaborative approach, ensuring every team member thrives.

You will work closely with AI, engineering, and legal domain experts to adapt and extend existing data workflows for new customer datasets and source formats. Your work will focus on understanding source structures, defining robust mappings, standardizing and enriching content, and ensuring that data is integrated in a way that is reliable, scalable, and easy to use in downstream systems.

Our Tech Team consists of around 32 people, including UI Engineers, UI Designers, AI Engineers, Data Engineers, as well as Fullstack, Backend, and DevOps Engineers. Within that team, the Data Expansion Team provides the data foundation, structure, and metadata needed for our agent-based systems to retrieve relevant legal information efficiently and reliably across jurisdictions.

Design, build, and optimize end-to-end ETL pipelines for legal data from multiple jurisdictions, including cleaning, transformation, chunking, validation, embedding, and ingestion into vector databases
Work extensively with XML-based legal data feeds: parse, validate, normalize, and transform XML structures into scalable internal schemas and unified document formats
Develop and maintain data models and storage schemas that support continuously updated datasets while ensuring consistency, scalability, and accuracy across diverse datasets and large amounts of data
Coordinate data handover and integration from multiple internal and external data providers, including official sources, APIs, and web scraping pipelines, ensuring reliable and timely updates
Implement and continuously refine metadata enrichment strategies to maximize searchability, ranking quality, and relevance of legal information in vector databases
Build and maintain a high-performance search and retrieval infrastructure enabling agent-based systems to call search functions and retrieve the most relevant legal information efficiently
Collaborate with product, AI, and legal domain experts to deliver high-quality, reliable data solutions
Own the data integration of one jurisdiction end-to-end

Requirements

Do you have experience in XML?

Experience: At least 2 years of professional experience in data engineering, including involvement in successfully deployed projects
Programming: Strong Python skills with experience in designing robust data pipelines
Technical Expertise: Experience in building and maintaining reliable ETL and RAG pipelines and a solid understanding of data modeling, quality, filtering, validation, and consistency
Infrastructure: Familiarity with containerization (Docker), CI/CD pipelines, and version control (Git)
Fundamentals: Strong grasp of data structures, algorithms, system design principles, and software engineering best practices
Expertise in working with graph databases and familiarity with developing and deploying NLP models is a bonus
Language: English proficiency at the C2 level

Benefits & conditions

Remote: 100% remote work possible for residents of Germany; other countries upon request
Working hours: Flexible working hours
Vacation: 26 days plus December 24th and 31st off, and 1 additional vacation day per year of employment (up to 30 days)
Discounts: e.g., Urban Sports Club Membership, depending on location
Equipment: Laptop (Lenovo or Mac), plus €1,000 net home office setup budget (paid with your first salary)

About the company

Noxtua is Europe's sovereign Legal AI. This legally competent AI covers the entire spectrum of legal text work - from information gathering (research) and analysis of complex issues (understanding) to document creation (drafting). The legally compliant AI meets the professional, criminal, and data protection requirements for lawyers (e.g. Section 203 German Criminal Code, Section 43e German Federal Code for Lawyers) and is certified according to BSI C5, TISAX, ISO 27001, 9001, 27018, 27017, and 42001. Noxtua has formed exclusive partnerships with leading European publishing houses from Germany, Austria, Switzerland, Poland, the Czech Republic, and Slovakia for the Legal AI Workspaces Beck-Noxtua, MANZ-Noxtua, Swiss-Noxtua, Beck-Noxtua Poland, Beck-Noxtua Czech Republic, and Beck-Noxtua Slovakia.

Founded in 2017 in the German capital as a result of a research project by Dr. Leif-Nissen Lundbæk and Professor Dr. Michael Huth at Oxford University and Imperial College London, the European legal tech company has many years of experience in developing GDPR-compliant AI solutions and now has offices in Paris, Berlin, Zagreb, and Munich. Strategic partners, including Germany's leading legal publisher C.H.BECK and the leading law firms CMS and Dentons, have invested around EUR 81 million in the European scaleup as part of its Series B.
