Junior Data Engineer (all genders)
Job description
We are looking for a curious and driven Junior Data Engineer to join our BI team. In this role, you will design and maintain data pipelines on AWS, support our analytical infrastructure, and contribute directly to making Cardmarket's data systems faster, more reliable, and more cost-efficient. This is not a passive reporting role. You will actively shape how data flows through our cloud architecture - with a strong emphasis on AWS S3 as our data lake, and pipeline engineering as your primary focus.
Tech Stack
You will work with the following technologies on a daily basis:
- AWS S3
- AWS Glue / Lambda
- Python
- SQL
- Microsoft Fabric
- Power BI
- GitLab
- Jira
Your Responsibilities
Cloud & Pipeline Engineering (AWS - Core Focus)
- Design, build, and maintain scalable data pipelines on AWS, with a strong emphasis on optimizing both computational performance and cloud cost efficiency.
- Work extensively with AWS S3 as the primary data lake: define partitioning strategies, select appropriate file formats (Parquet, Delta, JSON), and enforce data organization best practices.
- Leverage AWS-native services (Glue, Lambda, Step Functions, Athena, CloudWatch) to automate ingestion, transformation, and delivery workflows.
- Monitor pipeline performance, identify bottlenecks, and implement architectural improvements that reduce processing time and AWS spend.
- Apply cost-optimization principles: right-sizing compute resources, minimizing redundant data reads, leveraging S3 storage tiers, and avoiding unnecessary full-table scans.
- Implement automated data quality checks, validation rules, and alerting to ensure pipeline reliability and data consistency.
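As a small illustration of the S3 data-lake work described above — Hive-style partitioning of object keys, as used by Glue and Athena. The function name, bucket prefix, and layout here are hypothetical; real conventions are team-specific:

```python
from datetime import date


def partition_key(table: str, day: date, part: int, prefix: str = "datalake") -> str:
    """Build a Hive-style partitioned object key for a Parquet file on S3.

    Hypothetical layout: <prefix>/<table>/dt=<ISO date>/part-<NNNN>.parquet.
    Partitioning by date like this lets query engines such as Athena prune
    whole partitions instead of scanning the full table.
    """
    return f"{prefix}/{table}/dt={day.isoformat()}/part-{part:04d}.parquet"
```

For example, `partition_key("orders", date(2024, 5, 1), 3)` yields `"datalake/orders/dt=2024-05-01/part-0003.parquet"`.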
Data Engineering & BI Support
- Extract, transform, and load (ETL) data from multiple sources into our BI systems using SQL and Python.
- Clean and prepare raw data for analysis, ensuring quality, consistency, and readiness for downstream consumption.
- Build and maintain datasets and semantic models in Microsoft Fabric to support reporting layers.
- Develop and maintain dashboards and reports in Power BI for stakeholders across departments.
- Monitor key marketplace KPIs and surface proactive insights on trends, anomalies, and opportunities.
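A minimal sketch of the kind of KPI anomaly check mentioned above, using only the standard library (function name, threshold, and sample data are invented for illustration):

```python
import statistics


def flag_anomalies(series: list[float], threshold: float = 2.0) -> list[bool]:
    """Flag values more than `threshold` sample standard deviations from the mean.

    A deliberately simple z-score check; production monitoring would typically
    account for seasonality and trend before alerting.
    """
    mu = statistics.mean(series)
    sigma = statistics.stdev(series)
    return [abs(x - mu) > threshold * sigma for x in series]


# Hypothetical daily GMV figures: the final spike is flagged, the rest are not.
daily_gmv = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 250.0]
flags = flag_anomalies(daily_gmv)
```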
Collaboration & Process
- Collaborate with Marketing, Product, and Operations teams to understand data requirements and deliver actionable solutions.
- Maintain clean, well-documented, version-controlled code in GitLab following team standards.
- Manage tasks and priorities transparently via Jira within agile workflows.
- Conduct ad hoc analyses and provide data-driven recommendations to support business decision-making.
- Stay up to date with AWS services, data engineering patterns, and BI trends - and bring learnings back to the team.
Requirements
- Bachelor's degree in Computer Science, Data Science, Engineering, Statistics, or a related field.
- Solid SQL skills - including complex joins, aggregations, window functions, subqueries, and query optimization.
- Hands-on experience or strong foundational knowledge of AWS S3: partitioning, file formats, access patterns, and lifecycle policies.
- Familiarity with ETL concepts: data pipeline architecture, transformation layers, batch processing, and data quality practices.
- Basic Python proficiency for data manipulation and pipeline automation (pandas, boto3, or similar).
- Understanding of cloud resource and cost optimization principles - e.g., columnar formats, compression strategies, compute right-sizing.
- Experience with Git / GitLab for version control and collaborative development.
- Strong analytical mindset with the ability to detect and investigate data anomalies.
- Excellent written and verbal communication in English; able to present technical findings to non-technical audiences.
- Presence in Berlin is required.
- A valid work permit for Germany is preferred.
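To illustrate the window-function skills listed above — a `ROW_NUMBER() OVER (PARTITION BY ...)` query, sketched here with Python's built-in sqlite3 and invented sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (seller TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("a", 10.0), ("a", 30.0), ("b", 20.0)],
)

# Rank each seller's orders by amount, largest first.
rows = conn.execute("""
    SELECT seller, amount,
           ROW_NUMBER() OVER (PARTITION BY seller ORDER BY amount DESC) AS rn
    FROM orders
    ORDER BY seller, rn
""").fetchall()
```

Here `rows` comes back as `[("a", 30.0, 1), ("a", 10.0, 2), ("b", 20.0, 1)]` — the window restarts its numbering for each seller.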
Nice to Have
- AWS Services: Hands-on experience with AWS Glue, Lambda, Step Functions, Athena, or CloudWatch for data workflows.
- Microsoft Fabric: Familiarity with Microsoft Fabric, Lakehouses, Dataflows Gen2, or Synapse Analytics.
- Power BI / DAX: Experience building reports and dashboards in Power BI, including basic DAX measures and calculated columns.
- Pipeline Optimization: Documented experience reducing pipeline runtime or cloud costs through architectural decisions.
- Marketplace Analytics: Familiarity with e-commerce or marketplace KPIs such as conversion rates, GMV, seller performance, and user acquisition.
- Agile: Practical experience working with Jira in a team or project setting.