Senior Machine Learning Engineer (GPU Optimization)
Fintal, LLC
New York, United States of America
yesterday
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Job location: New York, United States of America
Tech stack
C++
Nvidia CUDA
Software Debugging
Distributed Computing Environment
Memory Management
Field-Programmable Gate Array (FPGA)
Machine Learning
Performance Tuning
TensorFlow
Scientific Computing
Software Engineering
System Programming
AI Infrastructure
High Performance Computing
Parallel Computation
GPU Programming
Low Latency
Hardware Acceleration
Machine Learning Operations
C++14
Job description
- Design, develop, and optimize ML models and inference pipelines for latency-sensitive trading applications.
- Build high-performance GPU-accelerated systems using CUDA and modern C++.
- Profile and optimize compute, memory, and networking bottlenecks across large-scale distributed environments.
- Collaborate closely with researchers, quant traders, and infrastructure engineers to deploy production-grade ML solutions.
- Drive performance improvements across GPU architectures, kernels, and training/inference workflows.
Requirements
- Strong commercial experience in Machine Learning Engineering, Software Engineering, or High-Performance Computing environments.
- Expert-level C++ development skills and deep understanding of low-level systems programming.
- Extensive experience with CUDA, GPU programming, and performance optimization.
- Proven track record optimizing GPU utilization, memory management, kernel performance, and distributed compute workloads.
- Experience profiling and debugging performance-critical applications.
- Background in quantitative finance, trading, scientific computing, AI infrastructure, or other high-performance environments is highly desirable.
- Strong understanding of parallel computing, computer architecture, and modern ML frameworks.
Nice to Have
- Experience within HFT, quantitative trading, or low-latency systems.
- Familiarity with distributed training frameworks and large-scale ML infrastructure.
- Knowledge of FPGA, networking, or hardware acceleration technologies.
About the company
Our client is a leading quantitative trading firm seeking a Senior Machine Learning Engineer to build and optimize next-generation ML infrastructure powering ultra-low-latency trading systems. This role sits at the intersection of machine learning, high-performance computing, and systems engineering, with a strong focus on GPU acceleration and performance optimization.