Platform Engineer, Data job in Austin
Job description
We are seeking a Data Platform Engineer who combines expert-level data infrastructure skills with a strong knowledge of AI & Machine Learning principles. In this role, you will go beyond simple data validation scripts: you will apply your understanding of model training dynamics to design and implement existing and novel approaches to optimize our datasets.
You will build and maintain large-scale image and video pipelines, with a focus on data curation strategies such as coreset selection, embedding-based filtering, and automated complexity scoring. You'll partner closely with our ML engineers to orchestrate ingestion, synthetic data generation, and versioned releases, ensuring that every dataset is not only high-integrity and available but strictly optimized to maximize model performance.
What You'll Do:
- Design and develop a scalable data infrastructure, focusing on organization and curation to support continuing increases in data volume and complexity.
- Design and implement existing and novel approaches to optimize datasets for model training (e.g., hard example mining, class balancing, de-duplication, embedding-based filtering).
- Support the data infrastructure required for optimal ingestion, transformation, and storage of datasets.
- Develop and use synthetic data generation workflows to create realistic synthetic training data for computer vision models.
- Design and own end-to-end image and video pipelines for computer vision model training: multi-source ingestion, QA and visualization, standardization, and organization.
- Coordinate collection of real-world data; coordinate label creation and QA with labelers.
- Develop and use data quality tooling: metrics for balance, drift, and annotation error; active-learning sampling to target gaps; feedback loops from production back to curation.
- Implement and own dataset versioning, release management, and lineage and metadata cataloging.
Requirements
- 3+ years of experience in data engineering or equivalent fields.
- Solid understanding of data structures and systems design for orchestrating data-related workflows in a rapidly growing environment.
- Proficient in using AWS for data management and processing.
- Proficient in Python for scripting and data processing; proficient with SQL and Linux.
- Educational Background: Bachelor's or Master's degree in Computer Science or a related field.
- Proven ability to communicate well across engineering teams and to write and maintain effective documentation.
You'll Stand Out:
- 5+ years of industry experience.
- Experience in image/video data engineering for computer vision projects.
- Experience with PyTorch DeepCore.
- Experience with Unreal Engine.
Benefits & conditions
- Competitive salary
- ACS Equity Package
- Health, Dental, Vision Insurance
- Paid Time Off
Allen Control Systems is an Equal Opportunity Employer, providing equal employment opportunities to all employees and applicants for employment. Allen Control Systems prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.