Data Engineer

Insight Global
Santa Clara, United States of America
4 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate
Compensation
$220K

Job location

Santa Clara, United States of America

Tech stack

Airflow
Amazon Web Services (AWS)
Azure
Big Data
Google BigQuery
Cloud Computing
Cloud Storage
Computer Programming
Information Engineering
Data Governance
Data Infrastructure
Python
Operational Databases
Role-Based Access Control
Google Cloud Platform
Spark
Containerization
Kubernetes
Dask
Machine Learning Operations
Terraform
Software Version Control
Data Pipelines
Docker

Job description

We're seeking a highly skilled Data Engineer to design, build, and maintain production-grade data pipelines that process and transform terabytes of data. In this role, you'll collaborate closely with data scientists and software engineers to ensure that our data infrastructure is scalable, reliable, and cost-effective.

Requirements

- 3-5 years of professional experience designing and operating production data pipelines at scale.

- Containerization & Orchestration: Expertise with Docker, Kubernetes, and Helm.

- Workflow Management: Hands-on experience building DAG-based pipelines in Apache Airflow.

- Programming: Strong proficiency in Python for data engineering tasks.

- Distributed Frameworks: Practical experience with Dask or Apache Spark for large-scale data processing.

- Cloud Fundamentals: Familiarity with deploying and managing services in a cloud environment.

- GCP Proficiency: Hands-on experience with Google Cloud services (e.g., Pub/Sub, BigQuery, Cloud Storage, GKE). Equivalent experience with other public cloud providers is fine.

- ML Pipelines: Exposure to deploying cross-cluster model-training workflows using Ray or similar frameworks.

- Infrastructure as Code: Familiarity with Terraform for deployment.

- Security & Compliance: Knowledge of data governance, encryption, and role-based access control.

- Experience with the Go programming language.

- Familiarity with acceleration frameworks such as RAPIDS or Spark.

- Knowledge of cloud platforms (AWS, GCP, Azure).

- Experience with data version control and MLOps practices.
