Data Engineer - INTL India

Insight Global
Bentonville, United States of America
11 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Bentonville, United States of America

Tech stack

Airflow
Data analysis
Azure
Big Data
Google BigQuery
Cloud Computing
Profiling
Computer Programming
Data Validation
ETL
Fault Tolerance
Hadoop
Hive
Python
Performance Tuning
Query Optimization
Cloud Services
Cloudera
Software Construction
SQL Databases
Data Streaming
Workflow Management Systems
Data Processing
Google Cloud Platform
Cloud Platform System
Data Ingestion
SQL Optimization
System Availability
Spark
Kafka
Software Version Control
Data Pipelines

Job description

We are seeking a skilled Data Engineer to design, build, and optimize large-scale batch data pipelines in a cloud environment. This role focuses on reliability, performance, and data quality, supporting analytics and downstream consumers through well-engineered big-data solutions. The ideal candidate has strong experience with Apache Spark, cloud data platforms (GCP preferred), and writing performant SQL at scale.

Responsibilities

Design, develop, and maintain batch data pipelines using Apache Spark, Hadoop, Hive, or similar frameworks in a cloud environment.

Build highly optimized, fault-tolerant, and SLA-driven data pipelines that operate reliably at scale.

Leverage Google Cloud Platform (GCP) services such as BigQuery, GCS, Dataproc, and Pub/Sub to support data ingestion, processing, and storage.

Write and optimize SQL queries (BigQuery SQL and/or Spark SQL) for data analysis, profiling, and performance tuning.

Collaborate closely with analytics, data science, and downstream consumers to ensure data availability, correctness, and usability.

Monitor and troubleshoot pipeline failures; implement alerting, retries, and data quality checks.

Improve pipeline performance through partitioning, clustering, resource tuning, and query optimization.

Follow software engineering best practices, including version control, testing, and documentation.

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances.

If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.

Requirements

Strong experience with big data technologies such as Apache Spark, Hadoop, and Hive.

Hands-on experience building batch data pipelines with a focus on performance, scalability, SLA adherence, and fault tolerance.

Strong programming skills in Python and/or Scala, with deep experience using Spark for data processing and analytics.

Experience working with GCP services including BigQuery, Google Cloud Storage (GCS), Dataproc, and Pub/Sub.

Solid experience writing and optimizing SQL, preferably BigQuery SQL and/or Spark SQL.

Strong understanding of data modeling, ETL/ELT patterns, and data quality best practices.

Experience with Kafka or similar messaging/streaming platforms.

Familiarity with workflow orchestration tools (e.g., Airflow or Cloud Composer).

Experience deploying and operating data pipelines in production cloud environments (GCP preferred, Azure acceptable).

Strong troubleshooting skills and ability to optimize pipelines under real-world constraints.
