Senior Full-Stack Engineer, Data Platforms (GCP) M/F - IBM Client Innovation Center
Job description
As a Data Engineer specializing in Google's data platforms, you will design, build, and maintain data engineering solutions on the Google Cloud ecosystem. You will use a range of Google Cloud services to develop batch and real-time data pipelines, carry out data migrations, and design data layers.
Your primary responsibilities will include:
- Design Data Pipelines: Design and build data engineering solutions for batch and real-time data processing using Google Cloud services such as Dataproc, Dataflow, Pub/Sub, BigQuery, Bigtable, Cloud Spanner, Cloud SQL, and AlloyDB.
- Develop Data Migration: Develop and manage batch and real-time data pipelines for data warehouses and data lakes, ensuring efficient data migration and integration.
- Manage Data Platform: Schedule and manage the data platform using Cloud Scheduler and Cloud Composer (Airflow), ensuring reliable orchestration of data workflows and pipelines.
- Implement Data Solutions: Implement data engineering solutions using Cloud Storage, Bigtable, BigQuery, Dataproc with Spark and Hadoop, Dataflow with Apache Beam (for example, in Python), and other open-source technologies.
- Optimize Data Pipelines: Optimize and maintain data pipelines for efficiency, scalability, and reliability, ensuring high-quality data output.
Requirements
- Exposure to the Google Cloud Ecosystem: Familiarity with designing, building, and maintaining data engineering solutions on Google Cloud, including services such as Dataproc, Dataflow, Pub/Sub, BigQuery, Bigtable, Cloud Spanner, Cloud SQL, and AlloyDB.
- Experience with Data Pipelines: Knowledge of developing and managing batch and real-time data pipelines for data warehouses and data lakes, including data migration and integration.
- Exposure to Open-Source Technologies: Familiarity with using Cloud Storage, Bigtable, BigQuery, Dataproc with Spark and Hadoop, and Dataflow with Apache Beam (for example, in Python), together with open-source technologies such as Apache Airflow, dbt, and Spark with Python or Scala.
- Experience with Data Platform Management: Understanding of scheduling and managing the data platform using Cloud Scheduler and Cloud Composer (Airflow).
- Exposure to Data Engineering Solutions: Familiarity with implementing data engineering solutions using Google Cloud services and open-source technologies.
Preferred technical and professional experience
- Proficiency in Apache Airflow: Experience using Apache Airflow to schedule and manage data pipelines is beneficial. Familiarity with Cloud Composer (managed Airflow) is also desirable.
- Knowledge of dbt: Exposure to dbt and its application in data engineering solutions is advantageous.
- Familiarity with Spark/Scala: Experience with Spark in Scala is beneficial for developing and managing data pipelines.