Senior Data Engineer : Data Lake (Remote)

Constructor
3 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
€ 120K

Job location

Remote

Tech stack

API
Amazon Web Services (AWS)
Automation of Tests
Databases
Python
Prometheus
Data Streaming
Data Processing
Data Storage Technologies
Data Ingestion
Large Language Models
Spark
AWS Lambda
Cloudformation
FastAPI
Data Lake
Core Data
Sentry
Terraform
Data Pipelines
Pagerduty
Databricks

Job description

The Constructor Data Platform is a foundational component for all internal data and ML teams. It handles the ingestion of over 2 TB of compressed events daily and manages over 6 PB of data in our data lake.

The Data Platform:

  • Is a comprehensive set of tools and infrastructure used daily by every data scientist and ML engineer in our company.
  • Implements public-facing APIs for event ingestion (FastAPI) and real-time analytics (ClickHouse, Cube).
  • Manages data storage in appropriate formats (S3, ClickHouse, Delta).
  • Facilitates data processing using technologies such as Python, Spark/Databricks, ClickHouse, AWS Lambda, and Kinesis.
  • Includes robust monitoring solutions (Prometheus, OpenTelemetry, PagerDuty, Sentry).
  • Ensures automated testing of pipelines and data quality.
  • Provides cost observability and optimization capabilities.
  • Offers comprehensive tools for developers to develop, run, test, and schedule data pipelines, along with all necessary support and documentation.

We're hiring a Senior Data Engineer to work on our Data Lake team. Here is what we do day to day:
  • Maintain the data pipeline job framework.
  • Develop the Data Quality framework (an internal set of tools for validating internal and external data sources).
  • Maintain and develop the public-facing data ingestion service handling 17,000+ RPS.
  • Maintain and develop core data pipelines in both batch and streaming modes.
  • Act as the last line of support for our internal platform users.
  • Take part in the on-call rotation for data platform incidents (shared across the team).

Your primary focus will be on building and operating various data platform components (data quality, data pipelines, infrastructure, monitoring), with opportunities to contribute to API services and LLM-powered analytics tools. You'll work closely with data scientists, ML engineers, and analytics teams to understand their needs, gather feedback, and improve platform reliability and usability. Here are some of the projects you may be involved with:
  • Move the Data Platform's configuration to infrastructure-as-code (IaC) using Terraform.
  • Take part in the development of the Data Quality framework and drive its adoption in the company.
  • Improve BI self-service through LLM-powered tools.
  • Migrate batch workloads to streaming solutions to ensure data is delivered in a timely manner.

Requirements

  • Fluent English

  • 4+ years building production services and data pipelines (batch and/or streaming)
  • Strong experience with Python, or the readiness to ramp up quickly
  • Hands-on experience with at least one MPP system (Spark, Trino, Redshift, etc.)
  • Hands-on experience operating services in a cloud environment (AWS preferred)

Nice to have

  • Terraform/CloudFormation or other IaC tools
  • ClickHouse or similar analytical databases
  • Experience with data quality/observability tools

Benefits & conditions

  • Unlimited vacation time - we strongly encourage all employees to take at least 3 weeks per year
  • Fully remote team - choose where you live
  • Work from home stipend - we want you to have the resources you need to set up your home office
  • Apple laptops provided for new employees
  • Training and development budget - refreshed each year for every employee
  • Maternity & Paternity leave for qualified employees
  • Work with smart people who will help you grow and make a meaningful impact
  • Base salary: $80k-$120k USD, depending on knowledge, skills, experience, and interview results
  • Stock options - offered in addition to the base salary
  • Regular team offsites to connect and collaborate

About the company

Constructor is the next-generation platform for search and discovery in ecommerce, built to explicitly optimize for metrics like revenue, conversion rate, and profit. Our search engine was invented entirely in-house, using transformers and generative LLMs, and we use its core and personalization capabilities to power everything from search itself to recommendations to shopping agents. Engineering is by far our largest department, and we've built our proprietary engine to be the best on the market, having never lost an A/B test to a competing technology. We're passionate about maintaining this and work on the bleeding edge of AI to do so. Out of necessity, our engine is built for extreme scale and powers over 1 billion queries every day across X languages, with customers based in Y countries. It is used by some of the biggest ecommerce companies in the world, like Sephora, Under Armour, and Petco. We're a passionate team who love solving problems and want to make our customers' and coworkers' lives better. We value empathy, openness, curiosity, and continuous improvement, and we're excited by metrics that matter. We believe that empowering everyone in a company to do what they do best can lead to great things. Constructor is a U.S.-based company that has been in the market since 2019. It was founded by Eli Finkelshteyn and Dan McCormick, who still lead the company today.

Apply for this position