Data Engineer
Job description
- Design, build, and maintain scalable data pipelines using Databricks, PySpark, and Azure Data Factory.
- Develop and optimize data ingestion, transformation, and loading processes for structured and unstructured data sources.
- Implement data governance frameworks to ensure proper data ownership, compliance, and security standards.
- Build and maintain reliable data quality validation processes to ensure accuracy, consistency, and completeness of enterprise data.
- Establish data lineage tracking mechanisms to provide transparency on how data flows across systems and pipelines.
- Develop and maintain Python scripts and automation tools to streamline data engineering workflows.
- Collaborate with data scientists, analysts, and business stakeholders to deliver high-quality datasets for analytics and AI initiatives.
- Troubleshoot and optimize data pipeline performance, reliability, and cost efficiency in Azure environments.
What We Bring to the Table:
- Opportunity to work on large-scale cloud data platforms and advanced data engineering solutions.
- Exposure to modern data stack technologies including Databricks, Azure Data Factory, and distributed processing frameworks.
- Collaborative environment working with data scientists, analytics teams, and engineering professionals.
- Challenging projects focused on building scalable and enterprise-grade data platforms.
- Opportunities to enhance expertise in data governance, data quality, and lineage frameworks.
Requirements
- 8+ years of experience in data engineering, data platform development, or big data environments.
- Strong hands-on experience with Databricks and PySpark for distributed data processing.
- Practical experience working with Azure Data Factory for orchestration and data pipeline development.
- Advanced Python scripting skills for automation, transformation, and integration tasks.
- Solid understanding of data governance principles, including metadata management, data cataloging, and compliance.
- Experience implementing data quality frameworks and validation rules within data pipelines.
- Hands-on experience with data lineage tracking and documentation practices.
- Experience working with large-scale data processing frameworks and cloud-based data platforms.
You should possess the ability to:
- Architect and implement end-to-end data pipelines that support large-scale data processing.
- Design efficient data models and transformation logic using PySpark and Databricks.
- Implement data governance and compliance practices within enterprise data ecosystems.
- Identify and resolve data quality issues proactively through automated validation and monitoring.
- Track and document data lineage across multiple data sources and transformation layers.
- Write clean, scalable, and maintainable Python code for data engineering workflows.