C++ Engineer

Latency
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Tech stack

C++
Profiling
Data Structures
Linux
Field-Programmable Gate Array (FPGA)
Remote Direct Memory Access
Concurrency

Job description

We are partnering with one of the world's most technically ambitious proprietary trading firms - a group rebuilding their entire trading platform from the metal up to operate at the physical limits of modern hardware. This isn't an incremental improvement. It's a total re-architecture of the fastest system on the planet, where every microsecond is contested ground and every cache miss is a bug.

Environment: C++20/23 · Linux · Kernel-bypass Networking · FPGA · RDMA · Nanosecond Execution

Their engineers operate where nanoseconds decide P&L: every change is measured, profiled, and deployed in live markets where performance is the edge.

They're now seeking an elite C++ Engineer capable of designing and optimising the core of a real-time execution platform - a system that ingests millions of market events per second and reacts deterministically, faster than anyone else on Earth.

  1. Architect zero-GC, lock-free pipelines built around ring buffers and cache-aligned data structures.
  2. Develop custom kernel-bypass network stacks using DPDK, RDMA, and Solarflare Onload, tuned to single-digit microsecond latency.
  3. Engineer branch-prediction-aware order handlers and SIMD-vectorized pricing logic in AVX-512.
  4. Deliver next-tick telemetry with nanosecond-precision timestamps and cross-core synchronization.
  5. Collaborate with FPGA specialists to merge hardware precision with software agility.
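As a rough illustration of the lock-free pipelines built around ring buffers and cache-aligned data structures mentioned above, here is a minimal single-producer/single-consumer (SPSC) queue in C++20. It is a sketch, not the firm's design: it assumes a 64-byte cache line, exactly one producer thread and one consumer thread, and a default-constructible, copyable element type. The name `SpscRing` is hypothetical.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Hypothetical SPSC ring buffer (illustrative sketch only).
// Capacity must be a power of two so indices can wrap with a bitmask.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

    // Assumption: 64-byte cache lines. Hard-coded rather than taken from
    // std::hardware_destructive_interference_size, whose availability and
    // value vary across toolchains.
    static constexpr std::size_t kCacheLine = 64;

    // Producer and consumer indices sit on separate cache lines to avoid false sharing.
    alignas(kCacheLine) std::atomic<std::size_t> head_{0};  // next slot to write (producer-owned)
    alignas(kCacheLine) std::atomic<std::size_t> tail_{0};  // next slot to read (consumer-owned)
    alignas(kCacheLine) std::array<T, Capacity> slots_{};

public:
    // Call only from the producer thread. Returns false if the ring is full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);  // see consumer's frees
        if (head - tail == Capacity) return false;                      // full
        slots_[head & (Capacity - 1)] = value;
        head_.store(head + 1, std::memory_order_release);  // publish the written slot
        return true;
    }

    // Call only from the consumer thread. Returns std::nullopt if the ring is empty.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);  // see producer's writes
        if (tail == head) return std::nullopt;                           // empty
        T value = slots_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // hand the slot back to the producer
        return value;
    }
};
```

Because each index is written by only one thread, acquire/release ordering is enough and no compare-and-swap is needed. The indices grow monotonically and rely on unsigned wraparound, which stays consistent with the bitmask because the capacity is a power of two.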

The Toolkit

  1. Modern C++20/23, template metaprogramming, constexpr, inline assembly when necessary.
  2. Profiling and optimization using perf, VTune, bcc, and FlameGraphs.
  3. Deep knowledge of NUMA-aware design, memory fences, and lock-free concurrency.
  4. Expertise in custom allocator design, branchless algorithms, and profile-guided optimization.
  5. A habit of benchmarking rather than assuming - data, not theory.
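A classic example of the branchless algorithms listed above is a lower-bound search over a sorted array whose loop replaces the usual if/else with a conditional select. The sketch below assumes a sorted `std::vector<int>`; the function name `branchless_lower_bound` is hypothetical. Whether the compiler actually emits a conditional move (for example `cmov` on x86) depends on the compiler and flags, so the benchmarking habit above applies here too.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical branchless lower_bound (illustrative sketch).
// Returns the index of the first element >= key, or data.size() if none.
std::size_t branchless_lower_bound(const std::vector<int>& data, int key) {
    std::size_t n = data.size();
    if (n == 0) return 0;
    std::size_t base = 0;
    // Invariant: the answer lies in [base, base + n].
    while (n > 1) {
        const std::size_t half = n / 2;
        // Select instead of branch: advance base only if the probe is still < key.
        base = (data[base + half] < key) ? base + half : base;
        n -= half;
    }
    // One candidate left: step past it if it is still < key.
    return base + (data[base] < key ? 1 : 0);
}
```

The loop runs a fixed number of iterations for a given array size, so its control flow does not depend on the key, which is what keeps it friendly to the branch predictor.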

Requirements

  1. Proven experience building ultra-low-latency systems in trading, gaming, or networking.
  2. Deep understanding of CPU architecture, from cache hierarchies to speculative execution.
  3. The mindset of someone who thinks in nanoseconds and measures in CPU cycles.
  4. A record of winning battles with compilers, kernels, and performance bottlenecks.