Deep Learning Engineer - Model Compression (Fixed-term contract)

Multiverse Computing S.L.
Role details

Contract type
Temporary contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate

Job location

Tech stack

Artificial Intelligence
Amazon Web Services (AWS)
Computer Vision
Azure
Software Quality
Code Review
Software Debugging
Python
Machine Learning
Open Source Technology
Recommender Systems
Speech Recognition
Graphics Processing Unit (GPU)
PyTorch
Large Language Models
Prompt Engineering
Deep Learning
Information Technology
Optimization Algorithms
HuggingFace
Software Version Control
Data Pipelines

Job description

We are seeking a skilled and experienced Deep Learning Engineer (Senior or Mid-level) with a strong background in deep learning to join our team. In this role you will leverage cutting-edge quantum and AI technologies to lead the design, implementation, and improvement of our computer vision and language models, and work closely with cross-functional teams to integrate these models into our products. You will work on challenging projects, contribute to cutting-edge research, and help shape the future of LLM and AI technologies.

As a Deep Learning Engineer for Model Compression, you will:

  • Design, train, and optimize deep learning models from scratch (including LLMs and computer vision models), working end-to-end across data preparation, architecture design, training loops, distributed compute, and evaluation.

  • Apply and further develop state-of-the-art model compression techniques, including pruning (structured/unstructured), distillation, low-rank decomposition, quantization (PTQ/QAT), and architecture-level slimming.

  • Build reproducible pipelines for large-model compression, integrating training, re-training, search/ablation loops, and evaluation into automated workflows.

  • Design and implement strategies for creating, sourcing, and augmenting datasets tailored to LLM pre-training and post-training, as well as to computer vision models.

  • Fine-tune and adapt language models using methods such as SFT, prompt engineering, and reinforcement or preference optimization, tailoring them to domain-specific tasks and real-world constraints.

  • Conduct rigorous empirical studies to understand trade-offs between accuracy, latency, memory footprint, throughput, cost, and hardware constraints across GPU, CPU, and edge devices.

  • Benchmark compressed models end-to-end, including task performance, robustness, generalization, and degradation analysis across real-world workloads and business use cases.

  • Perform deep error analysis and structured ablations to identify failure modes introduced by compression, guiding improvements in architecture, training strategy, or data curation.

  • Design experiments that combine compression, retrieval, and downstream fine-tuning, exploring the interaction between model size, retrieval strategies, and task-level performance in RAG and agentic AI systems.

  • Optimize models for cloud and edge deployment, adapting compression strategies to hardware constraints, performance targets, and cost budgets.

  • Integrate compressed models seamlessly into production pipelines and customer-facing systems.

  • Maintain high engineering standards, ensuring clear documentation, versioned experiments, reproducible results, and clean modular codebases for training and compression workflows.

  • Participate in code reviews, offering thoughtful, constructive feedback to maintain code quality, readability, and consistency.

Requirements

  • Master's or Ph.D. in Computer Science, Machine Learning, Electrical Engineering, Physics, or a related technical field.

  • 3+ years of hands-on experience training deep learning models from scratch, including designing architectures, building data pipelines, implementing training loops, and running large-scale distributed training jobs.

  • Proven experience in at least one major deep learning domain where training from scratch is standard practice, such as computer vision (CNNs, ViTs), speech recognition, recommender systems (DNNs, GNNs), or large language models (LLMs).

  • Strong expertise with model compression techniques, including pruning (structured/unstructured), distillation, low-rank factorization, and architecture-level optimization.

  • Demonstrated ability to analyze and improve model performance through ablation studies, error analysis, and architecture or data-driven iterative improvements.

  • In-depth knowledge of foundational model architectures (computer vision and LLMs) and their lifecycle: training, fine-tuning, alignment, and evaluation.

  • Solid understanding of training dynamics, optimization algorithms, initialization schemes, normalization layers, and regularization methods.

  • Hands-on experience with Python, PyTorch, and modern ML stacks (HuggingFace Transformers, Lightning, DeepSpeed, Accelerate, NeMo, or equivalent).

  • Experience building robust, modular, scalable ML training pipelines, including experiment tracking, reproducibility, and version control best practices.

  • Practical experience optimizing models for real-world deployment, including latency, memory footprint, throughput, hardware constraints, and inference-cost considerations.

  • Excellent problem-solving, debugging, performance analysis, test design, and documentation skills.

  • Excellent communication skills in English, with the ability to document and explain design decisions, experiment results, and trade-offs to both technical and non-technical stakeholders.

Preferred Qualifications

  • Ph.D. with research focus on efficient deep learning, model compression, sparse methods, quantization, distillation, or neural architecture search.

  • Demonstrated track record of open-source contributions to deep learning frameworks, compression libraries, or model efficiency tooling (e.g., PyTorch, HuggingFace, TensorRT, ONNX Runtime, Sparsity/Pruning libraries).

  • Strong background in distributed training, GPU acceleration, mixed-precision training, and optimization for multi-node or multi-GPU settings (AWS, Azure, HPCs).

  • Hands-on experience with hardware-aware model design, including optimizing models for GPUs, CPUs, mobile/edge accelerators, or specialized inference chips.

  • Experience implementing or extending neural architecture search (NAS) or structural pruning methods to discover efficient sub-architectures.

  • Publication record in top ML or systems conferences (e.g., NeurIPS, ICML, ICLR, MLSys) related to compression, efficient ML, or large-scale training.

Benefits & conditions

  • Competitive annual salary

  • Two unique bonuses: a signing bonus when you join and a retention bonus at contract completion.

  • Relocation package (if applicable).

  • Fixed-term contract ending in June 2026.

  • Hybrid role and flexible working hours.

  • Be part of a fast-scaling Series B company at the forefront of deep tech.

  • Equal pay guaranteed.

  • International exposure in a multicultural, cutting-edge environment.

About the company

Multiverse Computing is a well-funded and fast-growing deep-tech company founded in 2019. We are one of the few companies working with quantum computing and the biggest quantum software company in the EU.
We provide hyper-efficient software to companies that want to gain an edge with quantum computing and artificial intelligence. Our product, Singularity, is a software platform containing quantum and quantum-inspired algorithms developed and patented through the proof-of-concept trials we have performed for industrial and service clients. We work in finance, energy, manufacturing, cybersecurity, and many other industries.

Classical digital methods usually fail to tackle these complex problems efficiently. Quantum computing, however, provides a powerful toolbox for them, including outstanding optimization methods, software for quantum machine learning, and quantum-enhanced Monte Carlo algorithms.

Multiverse Computing applies these cutting-edge methods to deliver software customized to each client's needs, giving companies a chance to derive value from the second quantum revolution.
