Software Engineer, Data Infrastructure & Acquisition - Phoenix, AZ, USA

REMOTE HAND
Phoenix, United States of America
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
Up to $200K

Job location

Remote
Phoenix, United States of America

Tech stack

Artificial Intelligence
Bash
Big Data
Cloud Computing
Data Infrastructure
Linux System Administration
Machine Learning
Software Engineering
Web Crawlers
Scripting (Bash/Python/Go/Ruby)
Information Technology
Terraform
Data Pipelines
Docker

Job description

The Software Engineer, Data Infrastructure & Acquisition manages and improves the data collection processes that fuel AI model training. The role involves sourcing and ingesting large volumes of audio data, optimizing cloud infrastructure, and collaborating with scientists to improve data quality and cost efficiency. The position plays a key part in shaping the dataset roadmap for next-generation AI products.

Responsibilities

  • Identify and acquire new audio data sources for ingestion.

  • Operate and develop cloud infrastructure for data ingestion pipelines on GCP using Terraform.

  • Collaborate with scientists to optimize cost, throughput, and data quality.

  • Work with the AI team and leadership to define the dataset strategy.

Requirements

  • BS/MS/PhD in Computer Science or related field.

  • Over 5 years of software development experience.

  • Proficient in Bash/Python scripting in Linux environments.

  • Experience with Docker, Infrastructure-as-Code, and major cloud platforms (GCP preferred).

  • Knowledge of web crawlers and large-scale data processing is a plus.

  • Ability to manage multiple priorities and adapt as needed.

  • Strong verbal and written communication skills.

Benefits & conditions

  • The United States base salary range for this full-time position is $140,000-$200,000 plus bonus and equity, depending on experience.

About the company

The organization operates in the artificial intelligence and audio technology sector, focusing on developing advanced models supported by large-scale, high-quality datasets. It addresses challenges related to data collection and management at petabyte scale by integrating infrastructure, engineering, and research to efficiently support model training for consumer and enterprise applications.
