Data Engineer

W. H. GREEN & SONS, INC.

Portland, United States of America

1 month ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Job location

Portland, United States of America

Tech stack

Java

Microsoft Windows

Akka

Amazon Web Services (AWS)

Azure

Big Data

Unix

Cloud Computing

Cloud Engineering

Cloudera Impala

Software Quality

ETL

Data Security

Data Visualization

Linux

DevOps

Amazon DynamoDB

Elasticsearch

Fault Tolerance

GitHub

Gradle

Hadoop

Hadoop Distributed File System

MapReduce

Monitoring of Systems

Hive

Spring

Python

Key Management

PostgreSQL

MongoDB

MySQL

Apache Oozie

Oracle Applications

Performance Tuning

Redis

Ansible

Scala

Shell Script

Sqoop

Systems Integration

Tableau

XML

Apache Zookeeper

Datadog

Data Storage Technologies

Apache YARN

System Availability

Flask

Grafana

Spark

Spring Boot

Apache Pig

Backend

CloudFormation

Apache Flume

Kubernetes

Information Technology

Cassandra

Sentry

Data Analytics

Real Time Data

Kafka

GraphQL

Spark Streaming

CloudWatch

REST

Terraform

Stream Processing

Splunk

Data Pipelines

Docker

Jenkins

Redshift

Programming Languages

Microservices

Apache Storm

Job description

Design and develop real-time data streaming pipelines with Apache Kafka, Flume, and Apache Storm for event-driven applications.
Ensure high availability and fault tolerance through scalable systems using YARN, Kubernetes, Docker, and Redis for checkpointing.
Develop and optimize Spark applications in Scala, processing large datasets and integrating XML data parsing (DOM/SAX).
Model and implement scalable Cassandra solutions, focusing on high-throughput, low-latency real-time data storage and queries.
Integrate Cassandra with Spark, Kafka, and Flume for real-time data ingestion and analytics.
Design and implement cloud-based deployments on AWS, leveraging services like EC2, S3, and Redshift.
Automate cloud infrastructure provisioning using AWS CloudFormation, Terraform, and Ansible for scalable and reproducible environments.
Implement monitoring and alerting systems using Splunk, Grafana, and Elasticsearch on AWS for system health tracking.
Develop and maintain backend applications using Java, Scala, Spring Boot, and Akka for scalable microservices.
Optimize backend services for performance, reliability, and scalability in a microservices architecture.
Ensure data security and compliance by implementing encryption, access control, and other best practices across data storage, pipelines, and cloud infrastructure.
Design and optimize Cassandra data models for time-series data, ensuring efficient queries and minimal latency.
Write high-performance data pipelines for real-time data processing, using technologies such as Kafka and Spark Streaming.
Work on system performance optimization, including fine-tuning Spark jobs, Cassandra clusters, and Kafka-based pipelines for optimal throughput.
Develop and expose RESTful APIs and GraphQL endpoints for backend services to interact with external systems or clients.
Collaborate with stakeholders to translate business requirements into technical solutions, ensuring high-quality project delivery.
Contribute to product roadmaps, prioritize features, and align development efforts with business goals.
Lead and mentor junior engineers, providing guidance on best practices for system design, code quality, and performance tuning.
Manage client expectations and ensure smooth project execution through regular status updates, reviews, and collaboration.
Drive project delivery by managing timelines, effort estimation, work breakdown, and ensuring alignment with business objectives.

Technologies/Environment involved:

Big Data / Hadoop: HDFS, MapReduce, Apache Pig, Hive, Sqoop, Flume, YARN (MR2), Impala, ZooKeeper, HUE (Hadoop User Experience), Sentry, Oozie, Spark, Key Management Server, Shell Scripting, Cloud Computing Architecture.
Data Visualization and BI Tools: ETL Process Tools, Dashboard, Data Analytics, Tableau.
Operating Systems: Linux, Unix, Windows.
Databases: MongoDB, DynamoDB, Postgres, MySQL, Oracle.
Cloud Technologies: WCNP, AWS, Azure.
DevOps Tools: GitHub, Gradle, Jenkins, Docker, Kubernetes.
Programming Languages & Frameworks: Java, Scala, Python, Spring, Akka, Flask.
Log monitoring tools: Datadog, Splunk, Grafana, CloudWatch.

Requirements

Do you have experience in ZooKeeper?
Do you have a Master's degree?
Data Engineer with a Bachelor's degree in Computer Science, Computer Information Systems, Information Technology, or a combination of education and experience equating to the U.S. equivalent of a Master's degree in one of the aforementioned subjects.
