GCP Data Engineer
Data Inc
San Jose, United States of America
2 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Job location: San Jose, United States of America
Tech stack
Java
Airflow
Google BigQuery
Cloud Database
Computer Programming
Continuous Integration
Information Engineering
ETL
Data Systems
Data Warehousing
DevOps
Hive
Python
Machine Learning
Performance Tuning
Cloud Services
Standard SQL
Software Deployment
SQL Databases
Data Streaming
Workflow Management Systems
Google Cloud Platform
Data Ingestion
Snowflake
Spark
Data Lake
PySpark
Information Technology
Apache Flink
Kafka
Spark Streaming
Data Management
Video Streaming
Stream Processing
Data Pipelines
Docker
Databricks
Job description
We are seeking a highly skilled Senior Data Engineer with strong expertise in Apache Spark, streaming technologies, and Google Cloud Platform (GCP) to design, build, and optimize scalable data pipelines supporting analytics, reporting, and machine learning workloads. The ideal candidate will have extensive experience developing both batch and real-time data processing solutions using Spark, Kafka, and cloud-native data services.
- Design and develop scalable batch and real-time data pipelines using Spark and Kafka or Pub/Sub
- Build and optimize BigQuery-based data platforms and lakehouse architectures
- Develop ETL/ELT frameworks for data ingestion, transformation, and delivery
- Optimize Spark jobs, SQL queries, and data workflows for performance and cost efficiency
- Implement data quality, monitoring, validation, and alerting mechanisms
- Collaborate with Data Scientists, Analysts, and Business Teams to deliver reliable data solutions
- Support production deployments and troubleshoot complex data engineering issues
Requirements
- 10+ years of overall IT experience
- 5+ years of recent hands-on GCP experience
- Strong experience building enterprise-scale data platforms and streaming architectures
Required Skills
- Strong programming skills in Python and SQL
- Hands-on expertise with Apache Spark (PySpark, Spark SQL, DataFrames, Spark Streaming)
- Experience with Kafka, Pub/Sub, Flink, or other streaming technologies
- Strong knowledge of BigQuery, GCS, Data Lakes, and Data Warehousing concepts
- Experience designing and developing ETL/ELT pipelines
- Data modeling and performance optimization experience
- Experience with Airflow for workflow orchestration
- Knowledge of Snowflake, Redshift, or other cloud data warehouses
- Experience implementing data quality, monitoring, and alerting solutions
Preferred Skills
- Scala or Java development experience
- Databricks experience
- Docker and Kubernetes
- CI/CD and DevOps practices
- Experience supporting ML/Data Science workloads