Data Engineer wanted in Leipzig
Cyber Insight GmbH
2 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Shift work
Languages: English
Experience level: Intermediate
Job location: Leipzig
Tech stack
API
Artificial Intelligence
Airflow
Amazon Web Services (AWS)
Azure
Google BigQuery
Cloud Computing
Cloud Storage
Databases
Data Validation
Information Engineering
ETL
Data Structures
Data Systems
Data Warehousing
Data Flow Control
GitHub
Graph Database
JSON
Python
PostgreSQL
Neo4j
Prometheus
Tripwire
XML
Pulumi
Data Processing
Data Storage Technologies
SQL Optimization
Grafana
Backend
Pandas
PySpark
Kubernetes
Google Cloud Functions
Dask
Kafka
Machine Learning Operations
Terraform
Docker
Jenkins
Vulnerability Analysis
Job description
- Design, build, and maintain data pipelines and ETL/ELT workflows across GCP and on-prem environments.
- Ingest and process cybersecurity-relevant data sources such as CVE feeds, software inventories, vulnerability databases, and event logs.
- Develop and maintain transformation logic and data models linking vulnerabilities (CVEs) to affected software and assets.
- Implement and automate data validation, consistency checks, and quality assurance using tools like Great Expectations or Deequ.
- Collaborate with AI and graph modeling teams to structure and prepare data for threat intelligence and risk quantification models.
- Manage and optimize data storage using BigQuery, PostgreSQL, and Cloud Storage, ensuring scalability and performance.
- Automate data workflows and testing through CI/CD pipelines (GitHub Actions, GCP Cloud Build, Jenkins).
- Implement monitoring and observability for pipelines using Prometheus, Grafana, and OpenTelemetry.
- Apply a security-focused mindset in data handling, ensuring safe ingestion, processing, and access control of sensitive datasets.
Requirements
- 3+ years of experience in data engineering, backend data systems, or cybersecurity data processing.
- Strong Python skills and experience with pandas, PySpark, or Dask for large-scale data manipulation.
- Proven experience with data orchestration and transformation frameworks (Airflow, dbt, or Dagster).
- Solid understanding of data modeling, data warehousing, SQL optimization, and ETL pipelines (Kafka).
- Familiarity with CVE data structures, vulnerability databases (e.g. NVD, CPE, CWE), or security telemetry.
- Experience integrating heterogeneous data sources (APIs, CSV, JSON, XML, or event streams).
- Knowledge of GCP data tools (BigQuery, Pub/Sub, Dataflow, Cloud Functions) or equivalent in Azure/AWS.
- Experience with containerized environments (Docker, Kubernetes) and infrastructure automation (Terraform or Pulumi).
- Understanding of data testing, validation, and observability practices in production pipelines.
- A structured and security-aware approach to building data products that support AI-driven risk analysis.
Nice to Have
- Experience working with graph databases (Neo4j, ArangoDB) or ontology-based data modeling.
- Familiarity with ML pipelines (Vertex AI Pipelines, MLflow, or Kubeflow).
- Understanding of software composition analysis (SCA) or vulnerability scanning outputs (e.g. Trivy, Syft).
- Background in threat intelligence, risk scoring, or cyber risk quantification.
- Experience in multi-cloud or hybrid setups (GCP, Azure, on-prem).
Benefits & conditions
- Freedom to design and shape a modern, secure data platform from the ground up.
- A collaborative startup environment where your work directly supports AI and cybersecurity products.
- Flexible working hours and remote-friendly setup.
- Exposure to cutting-edge technologies in AI, data engineering, and cyber risk analytics.
- Competitive salary and benefits tailored to your experience.
About the company
At Cyber Insight, we are building the next generation of AI-driven platforms for IT security and risk management. Our mission is to empower companies to gain deep insights into their IT landscapes and proactively mitigate risks in an increasingly complex digital world.
As a fast-growing startup, we combine expertise in cybersecurity, data engineering, and artificial intelligence to deliver solutions that automate risk assessments, predict potential threats, and help organizations stay ahead of evolving cyber risks. Our team thrives on innovation, collaboration, and a shared passion for making a real impact in the cybersecurity space.