Senior Data Engineer
Job description
Your day-to-day will include designing ELT/ETL processes on BigQuery and ClickHouse; building real-time pipelines on Pub/Sub and Kafka with Dataflow (and, where it fits, Flink or Spark); orchestrating workflows with Airflow; and ensuring data is properly cleaned, modelled, and served for analytics, ML training, and online inference. You'll partner with ML engineers on feature pipelines and data drift monitoring, keeping models well-fed and retrained as needed. You'll consume and build REST APIs, integrate with third-party SaaS sources, and treat infrastructure as code.
You will be part of the Data, Analytics & AI team, collaborating closely with Infrastructure, Software Engineering, Product, and ML/AI engineers. We're in the middle of a GCP-native modernisation (migrating away from Snowflake toward BigQuery, Bigtable, Pub/Sub, and Dataflow), so we're looking for someone who's opinionated about clean architecture, allergic to over-engineering, and comfortable owning systems end-to-end. If retiring a legacy warehouse and standing up its replacement sounds like a good time, you'll fit right in.
Technology tools & weapons you'll be using:
- Cloud & warehouse: GCP, BigQuery, Bigtable, Cloud Storage
- Streaming & messaging: Pub/Sub, Kafka
- Processing: Dataflow (Apache Beam), with Flink/Spark where appropriate
- Orchestration: Airflow (Cloud Composer)
- Analytical store: ClickHouse
- Languages: Python, SQL
- Modelling & quality: dbt, data quality gates
- Containers & CI/CD: Docker, Kubernetes, GitHub Actions / equivalent
- Legacy (being retired): Snowflake
The adventures that await you after becoming Senior Data Engineer at Hack The Box:
- Design and build batch and streaming pipelines on Dataflow, Pub/Sub, and Kafka that feed BigQuery, Bigtable, and ClickHouse
- Help drive the migration off Snowflake onto our GCP-native stack, retiring shadow pipelines along the way
- Own the orchestration layer in Airflow, including SLAs, retries, and data quality gates
- Model data for analytics and for ML, including feature pipelines that serve both training and low-latency online inference
- Partner with ML engineers on feature stores, drift monitoring, and retraining workflows
- Capture requirements from stakeholders and translate them into pragmatic, well-scoped data products
- Continuously improve data quality, reliability, observability, and cost efficiency
- Identify new data sources worth acquiring and integrate them cleanly

- You'll have the exhilarating opportunity to contribute to a product that is highly appreciated by users and the cybersecurity community at large
- You'll experience a highly supportive and caring environment that fosters growth, flexibility, and autonomy
- You'll embark on an exciting journey of continuous learning and problem-solving, leveling up as our organization grows
- Most importantly, you'll have a blast at HTB because fun is an essential ingredient in our recipe for success! Just wait until you see our global meet-ups!
Our benefits package is designed to provide strong support to our team, but it may vary depending on location and type of employment (e.g., UK, Greece, or engagement through an Employer of Record).
The Quest of Becoming Hack The Box's Senior Data Engineer:
- Level 1: To complete level one's objective, submit your application.
- Level 2: Meet the Talent Acquisition team. Level's objective: highlight your past achievements, ambitions, and values.
- Level 3: Meet the hiring team. Level's objective: connect with the hiring team and share your achievements with them.
- Level 4: Complete two assignments that reflect day-to-day tasks and responsibilities of the role. Part of the assignment is a debriefing session with the hiring team, where you walk them through your thinking process.
- Level 5: Congratulations! Not many reach this level. Level's objective: have a constructive, final conversation with senior leadership to explore the role and your future at HTB.
- Level 6: You've officially received an offer from HTB! To complete the last level and the Quest, all you need to do is accept the offer. Quest complete. Congratulations, you're officially one of us! Your next quest: complete the onboarding.
Requirements
- Strong data modelling and warehouse architecture skills (dimensional modelling, event-driven, lakehouse patterns)
- Hands-on experience with GCP data services - BigQuery is a must; Pub/Sub, Dataflow, Bigtable, Cloud Composer are strong pluses
- Production experience with streaming pipelines on Dataflow/Beam, Flink, or Spark Structured Streaming, ingesting from Kafka and/or Pub/Sub
- Solid SQL and strong Python - you write production-quality code, not just notebooks
- Experience with ClickHouse or another columnar OLAP engine in production
- Workflow orchestration experience with Airflow (or Prefect/Dagster)
- Comfortable with dbt or equivalent transformation frameworks
- Experience migrating off legacy warehouses (Snowflake, Redshift, Synapse) onto cloud-native stacks is a plus
- Working knowledge of ML in production - feature engineering, feature stores, model deployment, drift monitoring, retraining
- Docker & Kubernetes experience
- CI/CD mindset, infrastructure-as-code sensibility, and a bias for simple, observable systems
- Bonus: CDC tooling (Datastream, Debezium), Vertex AI / Feature Store
Benefits & conditions
- Private health care
- Paid paternity leave
- 25 annual leave days
- Free lunch & snacks at the office
- 120€ Ticket Restaurant by Edenred
- Dedicated budget for training and professional development, participation in conferences
- Full access to the Hack The Box lab offerings, so you can learn how to hack
- State-of-the-art equipment (Mac, iPhone, and mobile plan)
- Flexible WFH (Hybrid Model) - Fully Remote is also an option if you're not an Attica resident