Data Engineer
Job description
As a Data Engineer at Lunio, you will play a critical role in building, optimising, and maintaining the robust data infrastructure and scalable data pipelines that power machine learning, MLOps, analytics, and product intelligence. You will be responsible for building reliable, scalable, and observable production-grade data systems with clear SLAs, data freshness guarantees, and monitoring standards.
You will design and operate batch and streaming pipelines that produce reliable, ML-ready datasets for product, analytics, and inference use cases. You will own end-to-end data pipeline development, the design of clean and well-governed data assets that accelerate ML development, and the reliability, observability, and cost-efficiency of the platform in production.
Responsibilities
- Data Pipeline Engineering: Own the development and optimisation of data pipelines (batch and streaming) that reliably ingest and transform high-volume clickstream and external data.
- Data Warehouse Ownership: Own the implementation and performance optimisation of our multi-node, TB-scale Redshift data warehouse, including data modelling, storage design, and cost-efficient query performance.
- ML-Ready Data Asset Delivery: Own the delivery of curated, versioned, ML-ready data assets, ensuring consistency, usability, and alignment with downstream use cases.
- Reliability & Observability: Own pipeline reliability by defining SLAs, ensuring data freshness, and implementing monitoring, data quality validation, observability, alerting, and structured incident response processes.
- ML Pipelines: Implement and operationalise feature computation workflows, including the development and maintenance of feature store infrastructure, to support model training and inference in collaboration with Data Science and Cloud/Platform teams.
Requirements
- Proven track record of building and operating production-grade data pipelines and data warehouse solutions in high-scale environments.
- Proven ownership of pipeline reliability, SLAs, monitoring, and incident debugging in production systems.
- Deep proficiency in Python and SQL, including experience working with large-scale event data.
- Strong hands-on experience with AWS data infrastructure (S3, Redshift, Glue, Kinesis, Lambda, etc.), including performance and cost optimisation.
- Experience building streaming/event-driven data pipelines (e.g., Kinesis or similar technologies).
You'll really thrive in this role if you have:
- Experience in high-volume event data environments (adtech, fraud, cybersecurity).
- Experience collaborating with infrastructure or platform teams and working with infrastructure-as-code tools (e.g., Terraform).
- Familiarity with operating in security- and compliance-aware environments (e.g., SOC2, ISO, IAM best practices, data classification).