Data Engineer - AI Compliance

All Cares

5 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Intermediate

Job location

Remote

Tech stack

Java

Artificial Intelligence

Big Data

Computer Programming

Information Engineering

Data Governance

Data Infrastructure

ETL

Data Visualization

Fault Tolerance

Python

Machine Learning

TensorFlow

Data Streaming

Management of Software Versions

Data Logging

Data Ingestion

PyTorch

Spark

Scikit Learn

Information Technology

Kafka

Feature Extraction

Text Analysis

Document Classification

Data Pipelines

Job description

We are seeking a Data Engineer to build and scale systems that support text and voice analysis, risk detection, and classifier-training workflows. You will be responsible for building production-grade machine learning pipelines from the ground up (0 to 1) and will collaborate closely with data scientists and AI engineers to deliver compliant, reliable data infrastructure and services.

Build and maintain end-to-end ML pipelines: data ingestion, preprocessing, feature extraction, model training, evaluation and deployment.
Develop reliable workflows specifically for voice and text analysis models.

Data Infrastructure

Design and maintain data storage, ETL workflows, and streaming/batch systems.
Implement data-quality, data-labeling, versioning and governance practices.

ML Collaboration

Work with data scientists and AI engineers to productionize models (e.g., text classifiers, anomaly-detection models, compliance-scoring models).
Support model monitoring and performance tracking once models are live.

Scalability & Reliability

Build robust, scalable, fault-tolerant pipelines.
Add observability layers (logging, monitoring and alerting) for data and model pipelines.

Documentation & Governance

Document ETL processes, schemas, architecture and workflows.
Support compliance, data governance, and security standards in data pipelines and infrastructure.

Requirements

3+ years of experience in data engineering or ML engineering roles.
Proven experience building ML pipelines from scratch.
Experience with text classification, voice analysis or similar ML tasks is a strong plus.

Technical Skills

Strong programming skills (Python, Scala or Java).
Experience with big-data/streaming frameworks (Spark, Beam, Kafka or similar).
Familiarity with ML frameworks (PyTorch, TensorFlow, scikit-learn).
Experience with cloud data infrastructure and production deployment.

Soft Skills

Strong analytical and problem-solving skills.
Excellent collaborator and communicator, capable of working with data scientists, engineers and product/compliance stakeholders.
Detail-oriented, documentation-focused and comfortable in a fast-paced environment.

Degree in Data Engineering, Computer Science, Machine Learning or a related field (or equivalent experience).

Benefits & conditions

Be at the intersection of cutting-edge AI/voice technology and compliance.
Make an impact by shaping a growing brand in a high-growth market.
Work with a collaborative, high-energy remote team driving forward-thinking solutions.
Grow your career and influence across product, marketing and business domains.

About the company

Cephalgo is a Strasbourg-based technology company founded in 2020, focused on developing AI solutions that ensure safety, compliance, and trust in human-AI interactions. Originally rooted in healthcare innovation, Cephalgo's platform helps organizations securely analyze and monitor voice and emotion data while meeting privacy, security, and regulatory standards. Backed by over €3 million in funding, Cephalgo combines deep expertise in voice AI, data protection, and compliance frameworks to help enterprises build and deploy responsible AI systems. The company collaborates with leading European partners in AI ethics, healthcare, and regulatory technology.