Enterprise Data Platform (EDP) & Customer Data Platform (CDP)
Job description
- Define and enforce Client's Lakehouse architecture standards using Azure Databricks, aligned to scalability, security, and cost efficiency.
- Implement and operationalize the Medallion Architecture (Bronze / Silver / Gold) as the enterprise standard:
  - Bronze - raw, immutable, audit-ready ingestion
  - Silver - cleansed, conformed, validated, and privacy-compliant datasets
  - Gold - curated, analytics-ready, semantically aligned data products (including standardized current and history tables where required, e.g., composite + composite_hist / SCD Type 2)
- Establish reference architectures, design patterns, and guardrails that enable consistent adoption across Stores, Digital, Marketing, Supply Chain, and Corporate domains.
- Standardize on Unity Catalog for all new and migrated workloads (minimize/retire Hive Metastore usage), including consistent catalog/schema conventions, data ownership, and access controls.

- Build declarative data pipelines using Databricks Lakeflow and Delta Live Tables (DLT) as the preferred enterprise pattern.
- Define data quality expectations, freshness SLAs, and validation rules directly within pipelines.
- Leverage DLT capabilities for:
  - automated dependency management
  - data quality enforcement
  - lineage and observability
  - operational simplicity at scale

- Apache Kafka for event-driven and streaming ingestion (pub/sub, CDC fanout, operational events)
- Databricks Auto Loader for scalable, incremental file ingestion from cloud object storage with schema inference/evolution
- Lakeflow Connectors for managed ingestion from SaaS applications and databases (connector-based patterns with governed landing into Bronze)
- Databricks Structured Streaming (and streaming tables where applicable) for continuous ingestion and low-latency processing into Delta
- Design resilient ingestion frameworks that support high-volume customer, marketing, and operational data with schema evolution and fault tolerance.
- Apply consistent ingestion controls across internal systems and external vendors.

- Canonical data models for shared enterprise entities (customer, store, product, transaction, vendor)
- Dimensional modeling (Star / Snowflake schemas) for analytics and reporting
- Semantic modeling aligned to downstream BI and analytics tools
- Ensure conformed dimensions and consistent metric definitions across domains.
- Partner with analytics and business teams to validate business meaning and usability.

- MicroStrategy (MSTR)
- Power BI
- Leverage Unity Catalog as the system of record for:
  - data governance and access control
  - lineage and discovery
  - semantic consistency and certification
  - data classification via Unity Catalog tags (e.g., PII/sensitivity, domain, certification) to drive masking, policy enforcement, and controlled publishing
- Promote enterprise metric definitions via governed semantic models (e.g., Unity Catalog metrics/semantic layer where adopted) to ensure consistent KPIs across MSTR, Power BI, and downstream consumers.
- Ensure Gold-layer datasets are optimized, documented, and certified for enterprise consumption.

- Orchestrate end-to-end pipelines using Azure Data Factory (ADF) and/or Apache Airflow, integrated with Databricks Workflows.
- Define dependency management, retry patterns, alerting, and operational ownership for production workloads.

- Establish PII protection as a non-negotiable enterprise standard, including mandatory 3-layer encryption:
  - In-Transit Encryption
    - TLS-based secure transport for all internal and external transfers.
  - File-Level Encryption at Rest
    - Encrypted files and objects for vendor, marketing, and partner exchanges.
  - Record-Level / Element-Level Encryption & Hashing
    - Attribute-level protection for PII used in CDP, marketing, and analytics workflows.
    - Enforce protections using Unity Catalog controls where applicable (e.g., masking policies and fine-grained access controls) to ensure governed use across analytics and activation.
- Ensure full auditability, regulatory compliance (GDPR, CCPA), and consistent enforcement across platforms and vendors.

- Design and operate secure, high-volume data exchanges with advertising, marketing, and data partners.
- Validate keys, credentials, service accounts, and secure repositories (SFTP, cloud object storage).
- Provide technical direction to vendors to ensure compliant, end-to-end delivery under tight timelines.

- Contribute to production support, incident analysis, and continuous platform improvements.
- Implement production operational standards using the enterprise toolchain (e.g., New Relic monitoring, PagerDuty incident response/on-call, ServiceNow ticketing), including alerting, runbooks, and SLAs.

- Mentor engineers and lead architecture reviews across platform, analytics, and marketing teams.
- Drive adoption of enterprise patterns through documentation, reviews, and enablement sessions.
Requirements
- 8+ years in data engineering, solution architecture, or platform engineering.
- Deep experience with Azure Databricks, Spark, Delta Lake, Lakeflow, Delta Live Tables, Python/PySpark, SQL.
- Experience with Kafka, Auto Loader / Structured Streaming, ADF, and/or Airflow.
- Strong experience in enterprise data modeling, governance, and BI enablement.
- Proven delivery of secure, compliant, enterprise-scale data platforms.