AI Runtime Engineer

Oho Group Ltd
Daly City, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Daly City, United States of America

Tech stack

Artificial Intelligence
Nvidia CUDA
Computer Engineering
Concurrent Computing
Software Debugging
Distributed Systems
General-Purpose Computing on Graphics Processing Units
Machine Learning
TensorFlow
Software Engineering
System Programming
AI Infrastructure
Datadog
Multithreading
High Performance Computing
PyTorch
Parallel Computation
Information Technology
Low Latency
Hardware Acceleration
C++14

Job description

We're looking for a Runtime Engineer to help build and optimise the execution layer that powers next-generation AI workloads. Working at the intersection of systems software, compiler technology, and hardware acceleration, you'll play a key role in ensuring that compiled models execute with maximum performance, scalability, and reliability across a range of computing architectures.

This is an exciting opportunity to work on low-level runtime systems, execution engines, and hardware-aware optimisation, collaborating closely with compiler, hardware, and product teams to shape the future of AI infrastructure.

  • You will help design, build, and evolve a high-performance execution engine capable of supporting multiple hardware platforms and accelerator architectures.
  • You will optimise workload execution through advanced scheduling, partitioning, and parallelisation strategies that maximise hardware utilisation.
  • You will work directly with compiled workloads and binaries, profiling execution behaviour and identifying opportunities for performance improvements.
  • You will develop internal tooling, telemetry systems, and diagnostic frameworks that help uncover execution bottlenecks and system inefficiencies.
  • You'll be responsible for analysing runtime performance across physical hardware, ensuring models achieve optimal throughput, latency, and resource utilisation.
  • You will contribute to the development and evaluation of experimental runtime features, prototypes, and execution strategies that influence future platform capabilities.
  • You'll collaborate closely with compiler, hardware, and product teams to translate machine learning requirements into scalable runtime solutions.

Requirements

  • You'll need strong experience developing runtime systems, execution engines, systems software, or hardware-facing infrastructure.
  • You should be highly proficient in modern C++ and comfortable working within large-scale performance-critical codebases.
  • You must have a strong understanding of concurrent programming, multi-threaded architectures, asynchronous execution, and workload scheduling.
  • You'll need a solid understanding of computer architecture, including memory hierarchies, cache behaviour, processor execution models, and low-level performance considerations.
  • Experience working close to operating system primitives, drivers, kernel-level functionality, or low-level systems programming is highly desirable.
  • You should be comfortable profiling, debugging, and optimising software running directly on physical hardware platforms.

Preferred Qualifications

  • Experience working with GPU computing technologies such as CUDA, ROCm, or other accelerator programming frameworks.
  • Exposure to machine learning frameworks and compiler technologies including Triton, PyTorch, JAX, MLIR, or similar ecosystems.
  • Understanding of distributed computing systems, HPC environments, or large-scale parallel processing architectures.
  • Experience building performance analysis, telemetry, or observability tooling for complex software systems.
  • Strong interest in compiler technology, hardware acceleration, and AI infrastructure.

You should be educated to BS, MS, or PhD level in Computer Science, Computer Engineering, Electrical Engineering, or a related technical discipline, or possess equivalent industry experience.
