Senior Data Engineer with GCP

Mphasis

New York, United States of America

1 month ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

New York, United States of America

Tech stack

Java

API

Airflow

Apache HTTP Server

Batch Processing

Google BigQuery

Cloud Storage

Code Review

Computer Programming

Continuous Integration

Data as a Service

Data Transmissions

Information Engineering

Data Governance

ETL

Data Virtualization

DevOps

Distributed Systems

Data Flow Control

Python

Network Connections

Performance Tuning

DataOps

SQL Databases

Data Streaming

Google Cloud Platform

Apigee

Real Time Data

Kafka

Data Management

Terraform

Looker Analytics

Apache Beam

Job description

Architect and own scalable, secure, cloud-native data platforms on Google Cloud Platform
Design, build, and optimize batch and real-time data pipelines using BigQuery, Dataflow, Pub/Sub, and Dataproc
Lead BigQuery performance tuning and cost optimization (partitioning, clustering, query efficiency)
Orchestrate workflows using Cloud Composer (Apache Airflow)
Enable AI/ML and GenAI integration via Vertex AI and BigQuery ML
Enforce data governance, security, reliability, and FinOps best practices
Mentor engineers, conduct design/code reviews, and set enterprise data engineering standards
Collaborate with product, analytics, and data science teams to deliver business-critical insights

Requirements

GCP Data Services: BigQuery, Dataflow (Apache Beam), Pub/Sub, Cloud Storage, Cloud Composer, Dataproc
Programming & SQL: Advanced SQL, Python (Java/Scala a plus)
Data Engineering: ETL/ELT, streaming & batch processing, data modeling, distributed systems
Modern Architectures: Lakehouse, Apache Iceberg, Data Mesh concepts
AI/ML Enablement: Vertex AI, BigQuery ML, GenAI-ready pipelines
DevOps & IaC: Terraform, CI/CD, DataOps practices
Leadership: Architecture ownership, mentoring, stakeholder communication, problem solving
Certification: Google Cloud Professional Data Engineer (strongly preferred / often mandatory)
Additional required skills: In addition to BigQuery and Cloud Storage buckets, the following are necessary: Dataflow, Cloud Composer, Cloud Scheduler, Pub/Sub and Kafka, Apigee gateway and APIs, Dataplex, and basic knowledge of network connectivity. Knowledge of Data Catalog, DLP, BQDTS, STS, and other data transfer methods is also expected.
Must-haves: A reporting background (Power BI) and Apache Iceberg are a MUST.
Nice to have: Data virtualization (Trino or equivalent), Looker, and Vertex AI are a plus.