AI & Embedded ML Engineer (Real-Time Edge Optimization)

autonomous-teaming

München, Germany

2 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Job location

Remote

München, Germany

Tech stack

Artificial Intelligence

Artificial Neural Networks

C++

Profiling

Nvidia CUDA

Software Debugging

Linux

Field-Programmable Gate Array (FPGA)

Global Positioning Systems (GPS)

Python

Memory Leaks

Robotic Automation Software

Graphics Processing Unit (GPU)

PyTorch

GIT

Perf (Linux)

Job description

At Autonomous Teaming, we build autonomous robotic systems that operate in extreme, GPS-denied environments. Our models run entirely on edge hardware (Jetson, FPGA, custom boards), with no cloud, no fallback, no excuses. We're looking for an engineer who loves hard problems: real-time inference, low-latency pipelines, CUDA kernels, TensorRT graphs, and deploying ML models directly on hardware. If you enjoy debugging things that only break on the robot, this role is for you.

Missions

Own the full pipeline from model to real-time inference on embedded devices:

Optimize deep neural networks for Jetson, FPGA or ARM boards
Apply quantization, pruning, distillation to hit strict FPS, power and memory budgets
Convert & compile models using TensorRT, ONNX, CUDA, C++
Build ROS nodes integrating optimized perception into the full robotic system
Debug runtime failures, memory leaks, thermal throttling, kernel-level issues
Benchmark and validate performance directly on hardware
Ship models that run reliably in real-world, harsh environments

You work on constrained hardware, where every millisecond and every watt matters
You solve problems that cloud ML engineers never face
You own your optimizations end-to-end: from model to field deployment
You work in a small, high-performing team where ownership is real

If you want a job with clean layers and abstract diagrams, this is not it.

Requirements

Strong experience in CUDA & C++
Hands-on work with TensorRT, ONNX, TVM or similar compilers
Practical experience with quantization, pruning, INT8 and FP16
Experience deploying models on Jetson, embedded GPUs, ARM or FPGA
Comfortable with PyTorch, Python, Linux, Git
Engineer mindset: measurement, optimization, validation

Nice-to-have

ROS (building nodes, integrating perception stacks)
Custom accelerators, DSPs or hardware-specific toolchains
Profilers: Nsight, perf, tegrastats, TensorRT profiler
Experience in robotics, autonomous systems, aerospace, automotive or defense

About the company

We are a defence-tech start-up specializing in machine vision solutions. If you have a passion for cutting-edge innovation and the drive to use your skills to create next-generation solutions, this is an opportunity for you!

What we do: We are developing solutions that enable computers and sensors to collaborate as teams, working together to address emerging security challenges. Our primary mission is to defend against AI-powered asymmetric threats at scale, such as drone swarms and other UXVs.

Who we are: Based in Munich, Berlin and Bordeaux/Toulouse, we are rapidly expanding across Europe, with plans to open more office hubs soon. We embrace a hybrid work culture, valuing the collaboration that happens in the office while also empowering our team members to work remotely with responsibility and autonomy.