Senior Engineer - Data Engineering (Data and Records Management Engineer)
Job description
Leads the automation and optimization of metadata management and records management processes within the Data & AI Governance Operations team.
Responsible for scaling the discovery and classification of data assets across enterprise platforms, with a primary focus on GCP and BigQuery environments, while ensuring the organization meets its records retention obligations through systematic archival and deletion workflows. Reduces the manual burden on the Data Analyst by delivering automated, high-quality metadata at scale, and provides the Data & AI Risk Analyst with the structured compliance evidence needed to assess records management and data lifecycle controls.
Duties and Responsibilities:
- Automate Metadata Discovery and Classification: Design, build, and maintain automated pipelines for discovering and classifying data assets across cloud and on-premises environments, with a primary focus on GCP BigQuery. Leverage tools such as Dataplex and BigID to scale metadata population and sensitive data classification, feeding clean, structured outputs into the enterprise Data Catalog to reduce manual curation effort.
- Enable Records Retention Compliance: Implement and operationalize the organization's records retention schedule by building workflows that enforce automated archival, defensible deletion, and disposition of data assets in alignment with internal policy and applicable regulatory requirements. Maintain audit trails and documentation to support compliance validation.
- Manage O365 Records Management Enablement: Configure and maintain records management capabilities within Microsoft 365, including retention labels, retention policies, and disposition review workflows, ensuring enterprise content is governed consistently with the records retention schedule across email, SharePoint, and Teams environments.
- Integrate with Data Catalog Workflows: Partner closely with the Data Analyst to translate automated classification and discovery outputs into well-structured catalog entries, minimizing manual entry while improving metadata coverage, accuracy, and freshness across data domains.
- Build and Maintain Governance Pipelines: Develop reusable, monitored, and documented data pipelines and automation scripts supporting governance operations. Own the reliability, versioning, change management, and performance of these pipelines to ensure governance operations are resilient and scalable.
- Support Risk and Compliance Reporting: Provide the Data & AI Risk Analyst with structured data outputs and metrics on records management coverage, classification completeness, and retention compliance. Ensure that pipeline outputs are formatted to feed directly into governance dashboards and risk assessments, enabling measurable demonstration of control effectiveness.
- Evaluate and Evolve Governance Tooling: Stay current with the evolving landscape of data governance and privacy engineering tools. Evaluate, pilot, and operationalize new capabilities, such as enhanced sensitive data detection, automated lineage, or retention enforcement features, that improve the efficiency and maturity of governance operations.
Requirements
- Education Required: Bachelor's degree from an accredited institution in Computer Science, Data Engineering, Information Systems, or a related technical field.
- Experience Required: Three (3) or more years of experience in data engineering, data governance engineering, or a related technical discipline. Hands-on experience with Google Cloud Platform (GCP), with demonstrated proficiency in BigQuery and its associated metadata features.
- Demonstrated experience building and maintaining automated data pipelines or governance automation workflows.
- Experience with records management concepts and technologies, including implementing retention schedules in enterprise environments.
Technical Skills and Abilities:
- GCP & BigQuery Expertise: Solid working knowledge of BigQuery, including schema management, table-level metadata, and integration with GCP data governance services. Familiarity with Dataplex for metadata tagging, data quality, and governance is highly preferred.
- Sensitive Data Discovery and Classification Tools: Practical experience using BigID or comparable tools (e.g., Informatica DSPM, Microsoft Purview) for automated PII and sensitive data discovery and classification at scale.
- O365 Records Management (Preferred): Familiarity with Microsoft Purview Compliance Center, including configuring retention labels, policies, and disposition workflows for enterprise content governance.
- Pipeline Development: Proficiency in Python and SQL, with experience in orchestration tools such as Cloud Composer, Apache Airflow, or equivalent, for building and scheduling governance pipelines.
- Data Lifecycle Management: Sound understanding of data lifecycle principles (creation, use, retention, archival, and deletion) and how they map to regulatory obligations and organizational records schedules.
- Security and Privacy Fundamentals: Working knowledge of data classification schemas, PII handling requirements, and privacy engineering concepts relevant to records and metadata management.
Benefits & conditions
- US dollar-linked compensation
- Performance-based annual bonus
- Recognition and rewards programs
- Agile Benefits - special allowances for Health, Wellness & Academic purposes
- Paid birthday leave
- Team engagement allowance
- Comprehensive health & life insurance cover (extendable to parents and in-laws)
- Overseas travel opportunities and client environment exposure
- Hybrid work arrangement