Data Engineer

W. H. GREEN & SONS, INC.
Portland, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Portland, United States of America

Tech stack

Java
Microsoft Windows
Akka
Amazon Web Services (AWS)
Azure
Big Data
Unix
Cloud Computing
Cloud Engineering
Cloudera Impala
Software Quality
ETL
Data Security
Data Visualization
Linux
DevOps
Amazon DynamoDB
Elasticsearch
Fault Tolerance
GitHub
Gradle
Hadoop
Hadoop Distributed File System
MapReduce
Monitoring of Systems
Hive
Spring
Python
Key Management
PostgreSQL
MongoDB
MySQL
Apache Oozie
Oracle Applications
Performance Tuning
Redis
Ansible
Scala
Shell Script
Sqoop
Systems Integration
Tableau
XML
Apache ZooKeeper
Datadog
Data Storage Technologies
Apache YARN
System Availability
Flask
Grafana
Spark
Spring Boot
Apache Pig
Backend
CloudFormation
Apache Flume
Kubernetes
Information Technology
Cassandra
Sentry
Data Analytics
Real Time Data
Kafka
GraphQL
Spark Streaming
CloudWatch
REST
Terraform
Stream Processing
Splunk
Data Pipelines
Docker
Jenkins
Redshift
Programming Languages
Microservices
Apache Storm

Job description

  • Design and develop real-time data streaming pipelines with Apache Kafka, Flume, and Apache Storm for event-driven applications.
  • Ensure high availability and fault tolerance through scalable systems using YARN, Kubernetes, Docker, and Redis for checkpointing.
  • Develop and optimize Spark applications in Scala, processing large datasets and integrating XML data parsing (DOM/SAX).
  • Model and implement scalable Cassandra solutions, focusing on high-throughput, low-latency real-time data storage and queries.
  • Integrate Cassandra with Spark, Kafka, and Flume for real-time data ingestion and analytics.
  • Design and implement cloud-based deployments on AWS, leveraging services like EC2, S3, and Redshift.
  • Automate cloud infrastructure provisioning using AWS CloudFormation, Terraform, and Ansible for scalable and reproducible environments.
  • Implement monitoring and alerting systems using Splunk, Grafana, and Elasticsearch on AWS for system health tracking.
  • Develop and maintain backend applications using Java, Scala, Spring Boot, and Akka for scalable microservices.
  • Optimize backend services for performance, reliability, and scalability in a microservices architecture.
  • Ensure data security and compliance by implementing encryption, access control, and other best practices across data storage, pipelines, and cloud infrastructure.
  • Design and optimize Cassandra data models for time-series data, ensuring efficient queries and minimal latency.
  • Write high-performance data pipelines for real-time data processing, using technologies such as Kafka and Spark Streaming.
  • Work on system performance optimization, including fine-tuning Spark jobs, Cassandra clusters, and Kafka-based pipelines for optimal throughput.
  • Develop and expose RESTful APIs and GraphQL endpoints for backend services to interact with external systems or clients.
  • Collaborate with stakeholders to translate business requirements into technical solutions, ensuring high-quality project delivery.
  • Contribute to product roadmaps, prioritize features, and align development efforts with business goals.
  • Lead and mentor junior engineers, providing guidance on best practices for system design, code quality, and performance tuning.
  • Manage client expectations and ensure smooth project execution through regular status updates, reviews, and collaboration.
  • Drive project delivery by managing timelines, effort estimation, work breakdown, and ensuring alignment with business objectives.

Technologies/Environment involved:

  • Big Data / Hadoop: HDFS, MapReduce, Apache Pig, Hive, Sqoop, Flume, YARN (MR2), Impala, ZooKeeper, HUE (Hadoop User Experience), Sentry, Oozie, Spark, Key Management Server, Shell Scripting, Cloud Computing Architecture.
  • Data Visualization and BI Tools: ETL Process Tools, Dashboards, Data Analytics, Tableau.
  • Operating Systems: Linux, Unix, Windows.
  • Databases: MongoDB, DynamoDB, Postgres, MySQL, Oracle.
  • Cloud Technologies: WCNP, AWS, Azure.
  • DevOps Tools: GitHub, Gradle, Jenkins, Docker, Kubernetes.
  • Programming Languages & Frameworks: Java, Scala, Python, Spring, Akka, Flask.
  • Log monitoring tools: Datadog, Splunk, Grafana, CloudWatch.

Requirements

  • Do you have experience in ZooKeeper?
  • Do you have a Master's degree?
  • Data Engineer with a Bachelor's degree in Computer Science, Computer Information Systems, Information Technology, or a combination of education and experience equating to the U.S. equivalent of a Master's degree in one of the aforementioned subjects.
