Senior Data Engineer
Job description
We are seeking a highly experienced and hands-on Senior Data Engineer to join our Data Engineering teams. You will play a key role in supplementing existing capacity, upgrading our data architecture, and ensuring the highest quality, performance, and cost-efficiency of our data platforms. The work is focused on critical deliverables for personal investment, personal wealth, and comprehensive data analytics, while preparing the platform for a larger strategic move in the future.
- Design, build, and maintain high-performance ETL/ELT data pipelines using Python and PySpark.
- Apply expert-level coding skills to develop and manage data processing jobs leveraging PySpark for distributed computing across large-scale datasets.
- Take full ownership of the data workflow, including ingesting data from multiple sources and cleansing and validating it to ensure the highest quality.
- Write and optimize complex, performant SQL queries for data extraction, integrity checks, and performance tuning.
- Contribute to platform modernization by exploring and increasing the adoption of AI/ML, including using tools like Copilot and Claude for acceleration, and building models to fill data gaps or improve systems.
- Collaborate with data architects by proposing ideas and asking incisive questions, taking ownership as the expert on data, pipelines, and systems.
- Implement DevOps practices for the automated deployment and orchestration of Python applications and data pipelines (e.g., using Docker, Jenkins, Terraform).
- Hands-on experience with SQL and complex performance tuning.
Requirements
Programming: Expert-level proficiency in Python, including libraries like Pandas and NumPy.
Design: Experience designing data pipelines that ingest data from multiple sources.
Data Processing: Solid hands-on experience with PySpark for building scalable data workflows.
Data Querying: Expert-level knowledge of writing complex SQL queries (Oracle or Snowflake), with proven ability to perform performance tuning on large datasets and complex database code.
Cloud Platform: Robust experience with AWS cloud services and associated data services, specifically:
- AWS Glue (ETL)
- S3
- Lambda
- Redshift
- DynamoDB, Athena, ECS, EventBridge, OpenSearch, RDS
ETL & Data Management: Robust proficiency in ETL/ELT methodologies and tools, as well as Data Quality, Data Validation, and Anomaly Detection techniques.
Scripting: Working experience with scripting and automation using Unix shell and Python.