Data Engineer
Job description
This is a remote, full-time consulting position responsible for designing, building, and maintaining the infrastructure required for data integration, storage, processing, and analytics (BI, visualization, and advanced analytics).
-
Architect, design, develop, test, and maintain high-performance, large-scale, complex data architectures that support data integration (batch and real-time, ETL and ELT patterns from heterogeneous data systems: APIs and platforms), storage (data lakes, warehouses, data lakehouses, etc.), processing, orchestration, and infrastructure. Ensure the scalability, reliability, and performance of data systems, focusing on Databricks and Azure.
-
Contribute to detailed design, architectural discussions, and customer requirements sessions.
-
Actively participate in the design, development, and testing of big data products.
-
Construct and fine-tune Apache Spark jobs and clusters within the Databricks platform.
-
Migrate workloads from Azure Synapse to Azure Data Lake or other technologies.
-
Assess best practices and design schemas that match business needs for delivering a modern analytics solution (descriptive, diagnostic, predictive, prescriptive).
-
Design and implement data models and schemas that support efficient data processing and analytics.
-
Design and develop clear, maintainable code with automated testing (pytest, unittest), including integration, performance, and regression tests.
-
Collaborate with cross-functional teams (Product, Engineering, Data Scientists, and Analysts) to understand data requirements and develop data solutions, including reusable components that meet product deliverables.
-
Evaluate and implement new technologies and tools to improve data integration, processing, storage, and analysis.
-
Evaluate, design, implement and maintain data governance solutions: cataloging, lineage, data quality and data governance frameworks that are suitable for a modern analytics solution, considering industry-standard best practices and patterns.
-
Continuously monitor and fine-tune workloads and clusters to achieve optimal performance.
-
Provide guidance and mentorship to junior team members, sharing knowledge and best practices.
-
Maintain clear and comprehensive documentation of the solutions, configurations, and best practices implemented.
-
Promote and enforce best practices in data engineering, data governance, and data quality.
-
Ensure data quality and accuracy.
-
Design, implement, and maintain data security and privacy measures.
-
Be an active member of an Agile team, participating in all ceremonies and continuous improvement activities, and work both independently and collaboratively.
Requirements
We are looking for a skilled Senior Data Engineer with a strong background in Python, SQL, PySpark, Azure, Databricks, Synapse, Azure Data Lake, DevOps, and cloud-based, large-scale data applications, and with a passion for data quality, performance, and cost optimization. The ideal candidate will develop in an Agile environment, contributing to the architecture, design, and implementation of data products, including migration from Synapse to Azure Data Lake. This role involves hands-on coding, mentoring junior staff, and collaborating with multidisciplinary teams to achieve project objectives.
Qualification & Experience
-
Must have a full-time Bachelor's degree in Computer Science or a similar field.
-
At least 3 years of experience as a data engineer, with strong expertise in Databricks, Azure (or another hyperscaler), and DevOps.
-
3+ years of experience with Azure DevOps and GitHub.
-
Proven experience as a data engineer delivering large-scale data and analytics projects and products, including migrations.
-
The following certifications:
-
Databricks Certified Associate Developer for Apache Spark
-
Databricks Certified Data Engineer Associate
-
Microsoft Certified: Azure Fundamentals
-
Microsoft Certified: Azure Data Engineer Associate
-
Microsoft Exam: Designing and Implementing Microsoft DevOps Solutions (nice to have)
Required skills/Competencies
-
Strong programming skills in one or more languages such as Python (must have) or Scala, with proficiency in writing efficient, optimized code for data integration, migration, storage, processing, and manipulation.
-
Strong understanding and experience with SQL and writing advanced SQL queries.
-
Thorough understanding of big data principles, techniques, and best practices.
-
Strong experience with scalable, distributed data processing technologies such as Spark/PySpark (must have: experience with Azure Databricks), dbt, and Kafka, to handle large volumes of data.
-
Solid Databricks development experience, with significant use of Python, PySpark, Spark SQL, pandas, and NumPy in an Azure environment.
-
Strong experience in designing and implementing efficient ELT/ETL processes in Azure and Databricks, and in using open-source solutions to develop custom integrations as needed.
-
Skilled in data integration from different sources such as APIs, databases, flat files, and event streams.
-
Expertise in data cleansing, transformation, and validation.
-
Proficiency with relational databases (Oracle, SQL Server, MySQL, Postgres, or similar) and NoSQL databases (MongoDB or Table).
-
Good understanding of data modeling and database design principles, with the ability to design and implement efficient database schemas that meet the requirements of the data architecture.
-
Strong experience in designing and implementing data warehouse, data lake, and data lakehouse solutions in Azure and Databricks.
-
Good experience with Delta Lake, Unity Catalog, Delta Sharing, Delta Live Tables (DLT).
-
Strong understanding of the software development lifecycle (SDLC), especially Agile methodologies.
-
Strong knowledge of SDLC tools and technologies such as Azure DevOps and GitHub, including project management software (Jira, Azure Boards, or similar), source code management (GitHub, Azure Repos, or similar), CI/CD systems (GitHub Actions, Azure Pipelines, Jenkins, or similar), and binary repository managers (Azure Artifacts or similar).
-
Strong understanding of DevOps principles, including continuous integration and continuous delivery (CI/CD), infrastructure as code (IaC; hands-on experience with Terraform and ARM templates), configuration management, automated testing, performance tuning, and cost management and optimization.
-
Strong knowledge of cloud computing, specifically Microsoft Azure services related to data and analytics, such as Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure Data Lake, Azure Stream Analytics, SQL Server, Azure Blob Storage, Azure Data Lake Storage, Azure SQL Database, etc.
-
Experience with orchestration using technologies such as Databricks Workflows and Apache Airflow.
-
Strong knowledge of data structures and algorithms and good software engineering practices.
-
Proven experience migrating from Azure Synapse to Azure Data Lake, or other technologies.
-
Strong analytical skills to identify and address technical issues, performance bottlenecks, and system failures.
-
Proficiency in debugging and troubleshooting issues in complex data and analytics environments and pipelines.
-
Good understanding of Data Quality and Governance, including implementation of data quality checks and monitoring processes to ensure that data is accurate, complete, and consistent.
-
Experience with BI solutions, including Power BI, is a plus.
-
Strong written and verbal communication skills to collaborate and articulate complex situations concisely with cross-functional teams, including business users, data architects, DevOps engineers, data analysts, data scientists, developers, and operations teams.
-
Ability to document processes, procedures, and deployment configurations.
-
Understanding of security practices, including network security groups, Azure Active Directory, encryption, and compliance standards.
-
Ability to implement security controls and best practices within data and analytics solutions, including solid knowledge of, and working experience with, cloud security vulnerabilities and ways to mitigate them.
-
Self-motivated with the ability to work well in a team, and experienced in mentoring and coaching different members of the team.
-
Willingness to stay up to date with the latest services, data engineering trends, and best practices in the field.
-
Comfortable with picking up new technologies independently and working in a rapidly changing environment with ambiguous requirements.
-
Genuine care for architecture, observability, testing, and building reliable infrastructure and data pipelines.