Data Engineer (NL26)
Role details
Job location
Tech stack
Job description
BBD is looking for a skilled Data Engineer to design, build and maintain scalable data pipelines and architectures. You will play a pivotal role in enabling data-driven decision-making by ensuring our data infrastructure is robust, secure and efficient. You will work with modern tools and cloud platforms (AWS, Azure, Databricks) to transform raw data into actionable insights, supporting both traditional analytics and emerging AI/ML workloads.
Responsibilities
- Pipeline development: Design, build and maintain efficient, reliable and scalable ETL/ELT pipelines using Python, SQL, and Spark (a minimal sketch follows this list)
- Architecture & modelling: Implement modern data architectures (e.g., Data Lakehouse, Medallion Architecture) and data models to support business reporting and advanced analytics
- Cloud infrastructure: Manage and optimise cloud-based data infrastructure on AWS and Azure, ensuring cost-effectiveness and performance
- Data governance: Implement data governance, security and quality standards (e.g., using Great Expectations, Unity Catalog) to ensure data integrity and compliance
- Collaboration: Work closely with Data Scientists, AI Engineers and Business Analysts to understand data requirements and deliver high-quality datasets
- MLOps support: Collaborate on MLOps practices, supporting model deployment and monitoring through robust data foundations
- Continuous improvement: Monitor pipeline performance, troubleshoot issues, and drive automation using CI/CD practices
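To give a concrete flavour of the pipeline development responsibility above, here is a minimal sketch of a PySpark batch job. The bucket paths, column names and table layout are illustrative assumptions, not BBD systems.

```python
from pyspark.sql import SparkSession, functions as F

# Illustrative locations only; a real pipeline would read these from configuration.
RAW_PATH = "s3://example-bucket/raw/orders/"
CURATED_PATH = "s3://example-bucket/curated/orders/"

spark = SparkSession.builder.appName("orders-batch-etl").getOrCreate()

# Extract: raw JSON files landed by an upstream ingestion process.
raw = spark.read.json(RAW_PATH)

# Transform: deduplicate, enforce types and drop records without a key.
cleaned = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
       .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
       .filter(F.col("order_id").isNotNull())
)

# Load: write partitioned Parquet for downstream analytics and BI.
cleaned.write.mode("overwrite").partitionBy("order_date").parquet(CURATED_PATH)
```

In practice a job like this would be parameterised, tested and scheduled rather than run ad hoc, which is where the orchestration and CI/CD requirements below come in.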
Requirements
- A minimum of 5 years of professional experience, including at least 2 years with Databricks
- Programming & scripting: Strong proficiency in Python for data manipulation and scripting. Experience with Scala or Java is a plus
- Big Data processing: Extensive experience with Apache Spark (PySpark) for batch and streaming data processing
- Workflow orchestration: Proficiency with Apache Airflow or similar tools (e.g., Prefect, Dagster, Azure Data Factory) for scheduling and managing complex workflows (a minimal orchestration sketch follows this list)
- Data warehousing: Proficiency in modern cloud data warehouses such as Snowflake, including designing, modelling and optimising analytical data structures to support reporting, BI and downstream analytics
- Expert SQL skills for analysis and transformation
- Deep understanding of Big Data file formats (Parquet, Avro, Delta Lake)
- Experience designing Data Lakes and implementing patterns like the Medallion Architecture (Bronze/Silver/Gold layers)
- Streaming: Experience with real-time data processing using Kafka or similar streaming platforms
- DevOps & CI/CD:
- Proficiency with Git for version control
- Experience implementing CI/CD pipelines for data infrastructure (e.g., GitHub Actions, GitLab CI, Azure DevOps)
- Familiarity with data quality frameworks like Great Expectations or Soda
- Understanding of data governance principles, security, and lineage
- Reporting & visualisation: Experience serving data to BI tools like Power BI, Tableau, or Looker
- AI/ML familiarity: Exposure to Generative AI concepts (LLMs, RAG, Vector Search) and how data engineering supports them
- Storage: Deep knowledge of Amazon S3 for data lake storage, including lifecycle policies and security configurations
- ETL & orchestration: Hands-on experience with AWS Glue (Crawlers, Jobs, Workflows, Data Catalog) for serverless data integration
- Governance: Experience with AWS Lake Formation for centrally managing security and access controls
- Streaming: Proficiency with Amazon Kinesis (Data Streams, Firehose) for collecting and processing real-time data
- Core services: Solid understanding of core AWS services (IAM, Lambda, EC2, CloudWatch) relevant to data engineering
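As referenced in the workflow orchestration item above, the sketch below shows how such a pipeline might be scheduled with Apache Airflow, stepping through the Bronze/Silver/Gold layers of a Medallion Architecture. The DAG id, script paths and use of spark-submit are assumptions for illustration (Airflow 2.4+ syntax).

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical daily DAG; the job scripts referenced here are placeholders.
with DAG(
    dag_id="orders_medallion_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    bronze = BashOperator(
        task_id="ingest_bronze",
        bash_command="spark-submit /opt/jobs/ingest_bronze.py",
    )
    silver = BashOperator(
        task_id="refine_silver",
        bash_command="spark-submit /opt/jobs/refine_silver.py",
    )
    gold = BashOperator(
        task_id="aggregate_gold",
        bash_command="spark-submit /opt/jobs/aggregate_gold.py",
    )

    # Bronze -> Silver -> Gold, matching the Medallion flow listed above.
    bronze >> silver >> gold
```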
Other
- Storage: Deep knowledge of Azure Data Lake Storage (ADLS) Gen2 and Blob Storage
- ETL & orchestration: Experience with Azure Data Factory (ADF) or Azure Synapse Analytics pipelines for data integration and orchestration
- Governance: Familiarity with Microsoft Purview for unified data governance and Microsoft Entra ID (formerly Azure AD) for access management
- Streaming: Proficiency with Azure Event Hubs or Azure Stream Analytics for real-time data ingestion
- Core Services: Understanding of core Azure services (Resource Groups, VNets, Azure Monitor) relevant to data solutions
- Platform management: Experience managing Databricks Workspaces, clusters, and compute resources
- Governance: Proficiency with Unity Catalog for centralised access control, auditing, and data lineage
- Development:
- Building and orchestrating Databricks Jobs and Delta Live Tables (DLT) pipelines
- Deep knowledge of Delta Lake features (time travel, schema enforcement, optimisation)
- AI & ML integration:
- Experience with MLflow for experiment tracking and model registry (see the sketch after this list)
- Exposure to Mosaic AI features (Model Serving, Vector Search, AI Gateway) and managing LLM workloads on Databricks
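The sketch below shows the basic MLflow tracking pattern referenced in the AI & ML integration items above. The experiment name, dataset and model are purely illustrative; on Databricks, runs would typically be tracked against a workspace experiment and models registered via Unity Catalog.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical experiment; on Databricks this would live under a workspace path.
mlflow.set_experiment("demo-orders-churn")

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    params = {"n_estimators": 100, "max_depth": 5}
    model = RandomForestClassifier(**params, random_state=42).fit(X_train, y_train)

    # Log parameters and metrics so runs are comparable in the MLflow UI.
    mlflow.log_params(params)
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))

    # Log the fitted model; registering it in the model registry is a separate, optional step.
    mlflow.sklearn.log_model(model, "model")
```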
Required certifications
- AWS: AWS Certified Solutions Architect - Associate
- Microsoft: Microsoft Certified: Azure Solutions Architect Expert
- Databricks: (details omitted)
Internal candidate profile
We are open to training internal candidates who demonstrate strong engineering fundamentals and a passion for data. Ideal internal candidates might currently be in the following roles:
- Python Back-end Engineer: Strong coding skills (Python) and experience with APIs / back-end systems, looking to specialise in big data processing and distributed systems
- DevOps Engineer: Coding background with strong infrastructure-as-code and CI/CD skills, interested in applying those practices specifically to data pipelines and MLOps