ML Systems Engineer
Job description
We are seeking an ML Systems Engineer to optimize the performance and efficiency of large language model inference powering our agentic AI platform. This is a technical role focused on low-level systems optimization. You will implement performance optimizations, build evaluation harnesses, and architect multi-node clusters for training and inference that push the limits of LLM throughput and latency. Your work will directly impact the responsiveness and cost-efficiency of AI agents used by leading semiconductor companies to design chips.
- Design, deploy, and optimize LLM inference systems across multi-node clusters, maximizing throughput and minimizing latency for production workloads.
- Implement and benchmark concrete inference optimizations.
- Profile and analyze inference bottlenecks at the systems level, from GPU kernel execution to memory bandwidth constraints.
- Build robust evaluation harnesses and benchmarking frameworks that measure accuracy, throughput, latency, and resource utilization across various parallelism strategies.
- Collaborate with research scientists to integrate new model architectures and optimizations into production inference infrastructure.
- Investigate and apply emerging techniques from research papers and open-source projects to continuously improve inference performance.
Requirements
- B.S., M.S., or PhD in Computer Science, Electrical Engineering, or related field (or equivalent experience).
- Experience with large-scale ML systems, GPU computing, or high-performance inference optimization.
- Strong proficiency in Python and C++/CUDA; hands-on experience with SGLang, vLLM, PyTorch, or similar inference frameworks.
- Deep understanding of GPU architecture, memory hierarchies, and parallel computing paradigms.
- Experience deploying and optimizing LLMs in production: model serving, batching strategies, distributed inference, or quantization.
- Strong systems-level debugging and profiling skills; comfort working at multiple layers of the stack from CUDA kernels to application logic.
- Familiarity with distributed computing frameworks (Ray, multi-node training/inference) is a plus.
- Self-directed problem solver who is interested in working on ambitious optimization challenges.
Benefits & conditions
- Food provided
- 401(k)
- Health insurance
- Vision insurance
- Dental insurance
- Unlimited paid time off
- Free parking
- $150K to $350K per year, plus equity. We are open to discussing above-scale compensation with exceptional candidates on a case-by-case basis.
- Unlimited PTO and full benefits (medical, vision, dental, 401k).
- Two engineering-centric offices with free parking, private gym, and free lunch, drinks and snacks.