Machine Learning Engineer
Job description
The Challenge
Our generative AI models produce sign language video in real time, delivered over GPU infrastructure across cloud and on-prem environments to a global audience. The engineering problem is genuinely hard: keep latency low, maximise GPU utilisation, and build infrastructure that scales to hundreds of simultaneous streams. You'll own this challenge, working across the full ML inference stack from model optimisation to deployment infrastructure.
What you'll work on
ML inference optimisation
- Profile and optimise the deep learning models used for sign language video generation
- Reduce inference latency using quantisation, pruning, mixed precision, and kernel optimisation
- Improve GPU utilisation and throughput across inference pipelines
- Work closely with ML researchers to ensure models are production-ready
ML infrastructure & deployment
- Build and maintain scalable model serving systems on GPU clusters
- Design autoscaling infrastructure to meet real-time SLAs
- Contribute to model deployment pipelines, versioning, and rollback strategies
Performance engineering
- Develop benchmarking frameworks for tracking inference performance
- Identify bottlenecks across the ML pipeline and eliminate latency hotspots
- Implement performance monitoring and alerting for production systems
- Evaluate new hardware accelerators and inference runtimes
Scaling globally
- Work with the research team to expand the range of supported sign languages and digital signers
- Architect systems that allow rapid onboarding of new languages
- Build low-latency infrastructure that scales to hundreds of concurrent streams
Requirements
What we're looking for
Essential
- 3+ years of experience in ML systems engineering, ML infrastructure, or backend systems
- Strong Python skills (Rust is a bonus)
- Experience working with production ML models
- Strong debugging, profiling, and performance analysis skills
- A genuine interest in building latency-critical, high-throughput systems
Desirable
- Experience with TensorRT, ONNX, Triton, TorchServe, or similar inference tools
- Familiarity with GPU architecture and performance optimisation
- Experience with video, graphics, or real-time streaming systems (HLS, RTMP, SRT)
- Experience with Kubernetes, Docker, and ML workloads at scale
- Familiarity with AWS, including SageMaker