Data Engineer
IBA InfoTech Inc.
Durham, United States of America
2 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Job location: Durham, United States of America
Tech stack
Java
Computer-Aided Design
Amazon Web Services (AWS)
Big Data
Cloud Computing
Code Review
Programming Tools
Distributed Systems
Elasticsearch
Fault Tolerance
Hadoop
Hive
Information Lifecycle Management
Python
Data Streaming
Spark
Information Technology
Druid
Apache Flink
Cassandra
Data Analytics
Presto
Stream Processing
Data Pipelines
Programming Languages
Job description
- Develop and deploy highly available, fault-tolerant software that improves the features, reliability, performance, and efficiency of the Cloud Analytics platform.
- Actively review code, mentor, and provide peer feedback.
- Collaborate with engineering teams to identify and resolve pain points as well as evangelize best practices.
- Partner with various teams to transform concepts into requirements and requirements into services and tools.
- Engineer efficient, adaptable, and scalable architecture for all stages of the data lifecycle (ingest, streaming, structured and unstructured storage, search, aggregation) in support of a variety of data applications.
- Build abstractions and reusable developer tooling that let other engineers quickly build self-service streaming and batch pipelines.
- Build, deploy, maintain, and automate large global deployments in AWS.
- Troubleshoot production issues and come up with solutions as required.
Requirements
- You have a strong engineering background with the ability to design software systems from the ground up.
- You have expertise in Java, Python or similar programming languages.
- You have experience in web-scale data and large-scale distributed systems, ideally on cloud infrastructure.
- You have a product mindset. You are energized by building things that will be heavily used.
- You have engineered scalable software using big data technologies (e.g. Hadoop, Spark, Hive, Presto, Flink, Samza, Storm, Elasticsearch, Druid, Cassandra).
- You have experience building data pipelines (real-time or batch) on large complex datasets.
- You have worked on and understand messaging/queueing/stream processing systems.
- You design not only to solve the problem at hand, but also with maintainability, testability, monitorability, and automation as top concerns.