Data Engineer - AI Compliance
All Cares
5 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Intermediate
Job location: Remote
Tech stack
Java
Artificial Intelligence
Big Data
Computer Programming
Information Engineering
Data Governance
Data Infrastructure
ETL
Data Visualization
Fault Tolerance
Python
Machine Learning
TensorFlow
Data Streaming
Management of Software Versions
Data Logging
Data Ingestion
PyTorch
Spark
Scikit Learn
Information Technology
Kafka
Feature Extraction
Text Analysis
Document Classification
Data Pipelines
Job description
We are seeking a Data Engineer to build and scale systems that support text and voice analysis, risk detection, and classifier training workflows. You will be responsible for production-grade machine learning pipelines (0 to 1) and will collaborate closely with data scientists and AI engineers to deliver compliant, reliable data infrastructure and services.
- Build and maintain end-to-end ML pipelines: data ingestion, preprocessing, feature extraction, model training, evaluation and deployment.
- Develop reliable workflows specifically for voice and text analysis models.
Data Infrastructure
- Design and maintain data storage, ETL workflows, and streaming/batch systems.
- Implement data-quality, data-labeling, versioning and governance practices.
ML Collaboration
- Work with data scientists and AI engineers to productionize models (e.g., text classifiers, anomaly-detection models, compliance-scoring models).
- Support model monitoring and performance tracking once models are live.
Scalability & Reliability
- Build robust, scalable, fault-tolerant pipelines.
- Add observability layers: logging, monitoring, alerting for data and model pipelines.
Documentation & Governance
- Document ETL processes, schemas, architecture and workflows.
- Support compliance, data governance, and security standards in data pipelines and infrastructure.
Requirements
- 3+ years in data engineering or ML engineering roles.
- Proven experience building ML pipelines from scratch.
- Experience with text classification, voice analysis or similar ML tasks is a strong plus.
Technical Skills
- Strong programming skills (Python, Scala or Java).
- Experience with big-data/streaming frameworks (Spark, Beam, Kafka or similar).
- Familiarity with ML frameworks (PyTorch, TensorFlow, scikit-learn).
- Experience with cloud data infrastructure and production deployment.
Soft Skills
- Strong analytical and problem-solving skills.
- Excellent collaborator and communicator, capable of working with data scientists, engineers and product/compliance stakeholders.
- Detail-oriented, documentation-focused and comfortable in a fast-paced environment.
- Degree in Data Engineering, Computer Science, Machine Learning or related field (or equivalent experience).
Benefits & conditions
- Be at the intersection of cutting-edge AI/voice technology and compliance.
- Make an impact by shaping a growing brand in a high-growth market.
- Work with a collaborative, high-energy remote team driving forward-thinking solutions.
- Grow your career and influence across product, marketing and business domains.
About the company
Cephalgo is a Strasbourg-based technology company founded in 2020, focused on developing AI solutions that ensure safety, compliance, and trust in human-AI interactions. Originally rooted in healthcare innovation, Cephalgo's platform helps organizations securely analyze and monitor voice and emotion data while meeting privacy, security, and regulatory standards.
Backed by over €3 million in funding, Cephalgo combines deep expertise in voice AI, data protection, and compliance frameworks to help enterprises build and deploy responsible AI systems. The company collaborates with leading European partners in AI ethics, healthcare, and regulatory technology.