Lead Software Engineer - AI Platform Engineer
Job description
- Execute creative software solutions, including design, development, and technical troubleshooting, with the ability to think beyond conventional approaches to build solutions or resolve technical problems.
- Develop secure, high-quality production code, and review and debug code written by others.
- Identify opportunities to eliminate or automate the remediation of recurring issues to enhance the overall operational stability of software applications and systems.
- Lead evaluation sessions with external vendors, startups, and internal teams to drive outcomes-oriented assessments of architectural designs, technical credentials, and their applicability within existing systems and information architecture.
- Lead communities of practice across Software Engineering to promote awareness and adoption of new and leading-edge technologies.
- Contribute to a team culture of diversity, equity, inclusion, and respect.
- Develop and deploy cloud infrastructure platforms that are secure, scalable, and optimized for AI and machine learning workloads.
- Collaborate with AI teams to understand computational needs and translate these into infrastructure requirements.
- Monitor, manage, and optimize cloud resources to maximize performance and minimize costs.
- Design and implement continuous integration and delivery pipelines for machine learning workloads.
Requirements
- Formal training or certification in software engineering concepts with 5+ years of applied experience.
- Hands-on practical experience in delivering system design, application development, testing, and ensuring operational stability.
- Proficiency in at least one programming language, such as Python, Go, Java, or C#.
- Proficiency in automation and continuous delivery methods.
- Proficiency in all aspects of the Software Development Life Cycle.
- Demonstrated proficiency in software applications and technical processes within a technical discipline (e.g., cloud, artificial intelligence, machine learning, or mobile).
- Foundational understanding of machine learning concepts, including transformer architecture, ML training, and inference.
- Experience in solutions design and engineering, containerization (Docker, Kubernetes), and cloud service providers (AWS, Azure, GCP).
- Experience with Infrastructure as Code.
- Deep understanding of cloud component architecture: Microservices, Containers, IaaS, Storage, Security, and routing/switching technologies.
Preferred qualifications, capabilities, and skills
- Foundational understanding of NVIDIA GPU infrastructure software (e.g., DCGM, BCM, Triton Inference Server).
- Hands-on experience with machine learning frameworks and tooling such as PyTorch and TensorBoard.
- Proficiency with observability tools like Prometheus and Grafana.
- Experience in MLOps and related tooling, including MLflow.
- Background in high-performance computing and ML frameworks (e.g., vLLM, Ray, Slurm).
- Strong knowledge of network architecture, database programming (SQL/NoSQL), and data modeling.
- Familiarity with cloud data services, big data processing tools, and Linux environments (scripting and administration).
Benefits & conditions
We offer a competitive total rewards package including base salary determined based on the role, experience, skill set and location. Those in eligible roles may receive commission-based pay and/or discretionary incentive compensation, paid in the form of cash and/or forfeitable equity, awarded in recognition of individual achievements and contributions. We also offer a range of benefits and programs to meet employee needs, based on eligibility. These benefits include comprehensive health care coverage, on-site health and wellness centers, a retirement savings plan, backup childcare, tuition reimbursement, mental health support, financial coaching and more. Additional details about total compensation and benefits will be provided during the hiring process.
Jersey City, NJ: $152,000.00 - $215,000.00 / year
Palo Alto, CA: $152,000.00 - $215,000.00 / year