Senior Data Engineer and Architect
Role details
Tech stack
Mentorship, Pipeline Repair, dbt, Standardization, Spark, AWS Ecosystem, SQL, Python, Refactoring, CI/CD
Job description
We are looking for a hands-on engineering heavy-hitter to join our Customer Data & Intelligence (CDI) function immediately. We have a rich dataset covering multiple global markets, but our legacy codebase (monolithic SQL and Python scripts) is fragile.
Your immediate mission is to triage, refactor, and stabilize our critical data pipelines. You will take "God Queries" and break them down into modular, testable, and performant dbt models.
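To make the mission concrete, here is a minimal sketch of what one extracted slice of a "God Query" might look like as a dbt staging model. All model, source, and column names are hypothetical:

```sql
-- models/staging/stg_orders.sql (hypothetical names throughout)
-- One small, testable slice of the former monolith: rename, cast,
-- and lightly clean raw orders, and nothing else. Downstream models
-- build on this via {{ ref('stg_orders') }} instead of re-reading raw data.

with source as (

    select * from {{ source('raw', 'orders') }}

),

renamed as (

    select
        order_id,
        customer_id,
        cast(order_ts as timestamp) as ordered_at,
        cast(amount as decimal(18, 2)) as amount_usd
    from source

)

select * from renamed
```

Each model does exactly one job, so it can be tested, reviewed, and reused in isolation.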
Immediate Deliverables (First 30-60 Days):
- The "Code Rescue": Audit and patch critical queries currently causing data corruptions. Fix logic errors.
- Modularization Pilot: Implement dbt (Data Build Tool) within our AWS/Databricks environment. Migrate the most critical reporting tables from stored procedures/scripts into dbt models.
- Automated Quality Gates: Deploy automated tests (using dbt tests or Great Expectations) that check identity uniqueness and other integrity rules on critical columns. Stop bad data before it hits the dashboard (a test sketch follows this list).
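As one sketch of such a quality gate, assuming the uniqueness check is written as a dbt "singular" test, i.e. a SQL file under tests/ that fails if it returns any rows; the model and column names are hypothetical:

```sql
-- tests/assert_customer_id_unique.sql (hypothetical names)
-- dbt singular test: `dbt test` fails this check if the query
-- returns any rows, i.e. if any customer_id appears more than once.

select
    customer_id,
    count(*) as n_rows
from {{ ref('dim_customers') }}
group by customer_id
having count(*) > 1
```

The same rule can also be declared with dbt's built-in `unique` and `not_null` generic tests in a schema YAML file, which is the more common form for simple column checks.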
What you will do:
- Refactoring: Rewrite inefficient legacy SQL to improve performance and readability (see the before/after SQL sketch below).
- Pipeline Repair: Fix error handling in existing AWS Glue/PySpark jobs.
- Standardization: Establish the "Gold Standard" for what good code looks like. Create the Pull Request template and SQL linting rules that the rest of the team must follow.
- Mentorship: Act as the "Bar Raiser" in code reviews, establishing standards and teaching the existing team how to write modular, defensive code.
Who you are:
- You hate "Toil." You refuse to check data manually; you write scripts to check it for you.
- You are not afraid of legacy code. You see a messy codebase as a puzzle to be solved, not a reason to run away.
- You care about Truth. You understand that "mostly correct" data is useless to a business.
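On the refactoring point above, a hypothetical before/after of the kind of rewrite involved; the table and column names are invented for illustration:

```sql
-- Before (hypothetical legacy pattern): a correlated subquery that
-- re-scans orders once per customer row.
select
    c.customer_id,
    (select max(o.ordered_at)
     from orders o
     where o.customer_id = c.customer_id) as last_order_at
from customers c;

-- After: one aggregate pass over orders, joined back in. Easier to
-- read, and on engines that do not decorrelate subqueries it avoids
-- executing the inner query once per customer.
with last_orders as (
    select
        customer_id,
        max(ordered_at) as last_order_at
    from orders
    group by customer_id
)
select
    c.customer_id,
    l.last_order_at
from customers c
left join last_orders l
    on l.customer_id = c.customer_id;
```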
Requirements
- dbt (Data Build Tool): Proven experience setting up dbt from scratch. You know how to structure a project (Staging -> Intermediate -> Marts; a marts-layer sketch follows this list).
- Python & Spark: Ability to read and fix PySpark syntax errors and optimize Spark execution plans (Databricks/AWS Glue).
- AWS Ecosystem: Comfortable with S3, Athena, and IAM permissions.
- CI/CD: Experience setting up pipelines that run tests automatically on every commit.
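On the Staging -> Intermediate -> Marts structure above, a minimal sketch of how the layering shows up in the code: marts models reference other models via {{ ref() }}, never raw sources. All names here are hypothetical:

```sql
-- models/marts/fct_orders.sql (hypothetical names)
-- Marts layer: a business-facing fact table assembled from the
-- layers below it. Note it never reads {{ source() }} directly;
-- raw access is confined to staging models.

select
    o.order_id,
    o.ordered_at,
    o.amount_usd,
    c.customer_id,
    c.customer_segment
from {{ ref('int_orders_enriched') }} as o
left join {{ ref('stg_customers') }} as c
    on o.customer_id = c.customer_id
```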