Data Engineer (Collibra) | Insurance Domain
Job description
- Design and implement scalable data pipelines using Azure Data Factory, Databricks, Synapse Analytics, and Azure Data Lake.
- Develop and optimize data transformation workflows in Python, R, or Scala on Azure Databricks or Apache Spark.
- Integrate and manage metadata in Collibra for effective data governance and cataloging.
- Handle structured, semi-structured, and unstructured data to extract insights and identify linkages across datasets.
- Lead technical delivery and mentor junior engineers on data engineering best practices.
- Optimize Spark jobs and debug performance issues using tools such as the Ganglia UI.
- Design efficient data structures for storage and querying, including columnar formats such as Parquet and Delta Lake.
- Work across multiple database technologies: RDBMS (MS SQL Server, Oracle), MPP (Teradata, Netezza), and NoSQL (MongoDB, Cassandra, Neo4j, Azure Cosmos DB, Gremlin).
- Ensure secure and compliant data handling aligned with information security principles.
- Collaborate in Agile teams and use Git-based workflows for version control and code management.
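The responsibilities above include handling semi-structured data and shaping it for columnar storage such as Parquet. As one minimal, self-contained sketch of that kind of transformation (the record shape and field names here are hypothetical, invented for illustration, not taken from the posting):

```python
import json

# Hypothetical semi-structured policy records, as a pipeline might
# receive them from an upstream insurance system (illustrative only).
raw = """
[
  {"policy_id": "P-100", "holder": {"name": "A. Smith", "region": "EU"},
   "claims": [{"id": "C-1", "amount": 1200.0}]},
  {"policy_id": "P-101", "holder": {"name": "B. Jones", "region": "US"},
   "claims": []}
]
"""

def flatten(record: dict) -> list[dict]:
    """Flatten one nested policy record into one row per claim --
    a flat shape that maps cleanly onto columnar formats like Parquet."""
    base = {
        "policy_id": record["policy_id"],
        "holder_name": record["holder"]["name"],
        "region": record["holder"]["region"],
    }
    rows = []
    # Emit one row per claim; a claimless policy still yields one row
    # with null claim columns, so no policy is silently dropped.
    for claim in record["claims"] or [None]:
        row = dict(base)
        row["claim_id"] = claim["id"] if claim else None
        row["claim_amount"] = claim["amount"] if claim else None
        rows.append(row)
    return rows

rows = [r for rec in json.loads(raw) for r in flatten(rec)]
print(rows[0]["policy_id"], rows[0]["claim_amount"])  # P-100 1200.0
```

In a real Azure Databricks pipeline this flattening would typically be expressed with Spark operations (e.g. exploding nested arrays) rather than plain Python, but the shape of the transformation is the same.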
Requirements
- Minimum of 7 years of hands-on experience in Azure data engineering.
- Strong working knowledge of Collibra for data governance and metadata management.
- Proven experience in the insurance domain is highly desirable.
- Proficiency in Python, R, or Scala for data transformation and analysis.
- Deep understanding of NoSQL databases and distributed data processing.
- Experience with traditional ETL tools such as Informatica, IBM DataStage, or Microsoft SSIS.
- Skilled in working with large, complex codebases using GitHub and Gitflow.
- Effective communicator with strong stakeholder management skills.
- Familiarity with Agile methodologies, including Scrum, XP, and Kanban.
- Preferred certifications: Microsoft Certified: Azure Data Engineer Associate and Collibra Ranger (or equivalent).