Data Infrastructure Engineer

Alljoined Inc.

San Francisco, United States of America

6 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Intermediate

Compensation

$180K

Job location

San Francisco, United States of America

Tech stack

Amazon Web Services (AWS)

Azure

C++

Databases

Data Infrastructure

ETL

Data Loss

FFmpeg

InfiniBand

Python

OpenCV

TensorFlow

Software Deployment

Data Streaming

Unstructured Data

Video Editing

Rust

Graphics Processing Unit (GPU)

PyTorch

Deep Learning

Backend

Bare Metal

Vertica

Data Pipelines

Job description

Alljoined is creating a future where humans are fully understood and augmented by technology. Our work solves the communication bottleneck between humans and computers by decoding thoughts from the brain, entirely non-invasively. We apply deep learning research to large-scale EEG datasets to decode multimedia input, eventually moving to internal thought. Our capabilities are state of the art, and we are fully vertically integrated. Our goal is to develop a general consumer interface that completely transforms how we live our lives.

We are actively growing our founding engineering team to build the underlying infrastructure that makes this ambitious future a reality.

As a Data Infrastructure Engineer, you will build the backend and hardware architecture that allows us to do fast, high-quality research. You will own our entire data lifecycle, from building pipelines that process massive multimodal datasets (video, audio, text, time-series) to provisioning and managing the cloud and bare-metal compute clusters we use to train on them. You will power our foundational model training by bridging the gap between physical neuro hardware and our central repositories, working alongside world-class researchers to ensure they have a high-throughput, low-latency pipeline straight to the GPUs.

Requirements

Have 3+ years of production software engineering experience with deep expertise in systems-level architecture and languages like Python, Rust, C++, or Go.
Have built and maintained high-performance ETL pipelines capable of processing, buffering, and storing terabytes of daily unstructured data.
Are comfortable architecting, provisioning, and maintaining bare-metal local compute clusters, storage servers, and high-speed networking for intensive ML workloads.
Have a background in handling continuous, highly concurrent data streams from heterogeneous hardware peripherals without data loss.
Are capable of working across hybrid environments to define storage topologies, manage databases (TimescaleDB, ClickHouse), and sync massive datasets between on-premise edge servers and the cloud (AWS/GCP/Azure).
Enjoy owning the entire technical lifecycle of infrastructure, from optimizing low-level I/O-bound operations to production deployment.

Strong candidates may have

A deep understanding of modern ML frameworks (PyTorch/TensorFlow) and of how to build datasets that saturate GPU utilization.
Experience managing networking for distributed GPU training (InfiniBand, RoCE) or optimizing zero-copy networking and shared memory.
Experience building infrastructure involving programmatic video processing (FFmpeg, GStreamer, OpenCV).
