Software Engineer, ML Infrastructure, Level 4
Job description
- Design and optimize infrastructure systems for machine learning workloads at scale and drive reliability and efficiency improvements across Snapchat's ML Infrastructure
- Build and enhance feature generation and serving pipelines that power online inference and offline training data generation
- Develop high-performance inference systems to ensure fast and efficient AI model serving
- Build infrastructure to perform scalable ML model training, evaluation, and inference in the cloud
- Build comprehensive data management systems for scalable data collection, labeling, processing, and evaluation
- Work closely with ML engineers to deploy cutting-edge models into production
- Use AI tools and high-velocity engineering workflows to design and ship scalable services while upholding rigorous standards for code correctness, security, and production-ready quality
Requirements
- Strong programming skills in Python, Java, Scala, or C++
- Strong problem-solving skills with a focus on system performance, scalability, and efficiency
- Good understanding of distributed systems and the infrastructure components of large-scale ML
- Experience with big data processing frameworks such as Spark, Flink, or Ray
- Ability to collaborate and work well with others
- Proven track record of operating highly available systems at significant scale
- Ability to proactively learn new concepts and apply them at work
- Adaptability in learning and applying evolving AI systems and tools to remain at the forefront of engineering trends and modern development practices
- Bachelor's degree in a technical field such as computer science, or equivalent experience
- 2+ years of post-Bachelor's software development experience, or a Master's degree in a technical field plus 1+ year of post-graduate software development experience, or a PhD in a relevant technical field
- Experience building large-scale production machine learning systems, distributed systems, or big data processing
- Master's degree or PhD in a technical field such as computer science, or equivalent industry experience
- Experience working with ML training platforms or optimizing AI model inference
- Familiarity with ML frameworks such as TensorFlow, PyTorch, Caffe2, Spark ML, scikit-learn, or related frameworks
Benefits & conditions
In the United States, work locations are assigned a pay zone which determines the salary range for the position. The successful candidate's starting pay will be determined based on job-related skills, experience, qualifications, work location, and market conditions. The starting pay may be negotiable within the salary range for the position. These pay zones may be modified in the future.
Zone A (CA, WA, NYC) (https://careers.snap.com/us-payzones):
The base salary range for this position is $157,000-$235,000 annually.
Zone B (https://careers.snap.com/us-payzones):
The base salary range for this position is $149,000-$223,000 annually.
Zone C (https://careers.snap.com/us-payzones):
The base salary range for this position is $133,000-$200,000 annually.
This position is eligible for equity in the form of RSUs.