Principal Firmware Engineer, Annapurna Labs ML Acceleration Systems Software
Job description
In this role, you will lead a team of software and firmware developers to design and develop server software at AWS scale. You'll collaborate with hardware developers and software engineers to design validation strategies that ensure reliability across our entire product line. Your days will include mentoring your team through complex technical challenges, establishing operational procedures that scale across products, and working cross-functionally to integrate design-for-excellence principles into our development process. You'll also participate in technical discussions that shape how we approach system design & validation, ensuring we're catching issues before they reach customers.
This is a fast-paced, intellectually challenging position, and you'll work with thought leaders in multiple technology areas. You'll have high standards for yourself and everyone you work with, and you'll be constantly looking for ways to improve your product's performance, quality and cost. Using data and key metrics, you will also drive and measure process improvements that enhance our operational effectiveness.
A day in the life
Your day-to-day responsibilities will include interfacing with internal and external customers to understand project requirements and to facilitate system development on top of your server design. You will be responsible for learning the operational challenges of our existing fleet, with the goal of improving the current customer experience and developing improved systems for future designs. You will work directly with vendors and ODM/JDM design teams to develop and manufacture your product at scale.
About the team
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we're building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, design reviews. We care about your career growth and strive to assign projects that help our team members develop their engineering expertise so they feel empowered to take on more complex tasks in the future.
We're a collaborative group of software engineers and hardware developers united by a shared mission: making Amazon Trainium products more reliable and easier to troubleshoot. Our team values partnership across disciplines; your success depends on building strong relationships with hardware specialists, validation engineers, and other technical leaders. We're focused on establishing best-in-class operational procedures and diagnostic capabilities that set the standard for the industry. By joining us, you'll help shape the future of how we approach system reliability and contribute to products that power some of the most demanding machine learning applications in the world.
Requirements
- 7+ years of experience working directly with engineering teams
- Experience managing programs across cross-functional teams, building processes, and coordinating release schedules
- Experience building and evaluating system-level technical design
- Bachelor's degree in Computer Science, Computer Engineering, or related fields
- Experience managing teams, or experience as a mentor, tech lead or leading an engineering team
- Experience in software development or in troubleshooting and debugging technical systems, including strong analytical skills, attention to detail, and effective communication abilities
- Experience with hardware/software integration and real-time systems
- 10+ years of systems software or firmware engineering
- Proficiency with programming languages commonly used in systems software (such as C, C++, Rust, or Python)
Preferred Qualifications
- 5+ years of project management experience across disciplines including scope, schedule, budget, and quality, along with risk and critical-path management
- Experience managing projects across cross-functional teams, building sustainable processes, and coordinating release schedules
- Experience defining KPIs/SLAs used to drive multi-million-dollar businesses and reporting to senior leadership
- Master's degree in Computer Science, Computer Engineering, or related fields
- Experience troubleshooting and debugging technical systems
- 5+ years of embedded firmware development experience
- Knowledge of data center infrastructure design, operations, or delivery
- Experience navigating a knowledge base and following Standard Operating Procedures (SOPs)
- Experience with AI or machine learning applications in systems engineering
Benefits & conditions
The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and the option of Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at https://amazon.jobs/en/benefits.
USA, TX, Austin - 144,100.00 - 194,900.00 USD annually