Principal Machine Learning Engineer - Production Systems

SoftInWay UK Ltd.

Bradley Stoke, United Kingdom

5 months ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Job location

Bradley Stoke, United Kingdom

Tech stack

.NET

API

Automation of Tests

C Sharp (Programming Language)

Nvidia CUDA

Data Validation

Protocol Buffers

Interoperability

Python

Machine Learning

TensorFlow

Management of Software Versions

Data Logging

PyTorch

Delivery Pipeline

Keras

Containerization

Kubernetes

Docker

Job description

Overview

SoftInWay UK Ltd. is seeking a highly experienced ML Systems Architect to design and implement a scalable, production-grade architecture for our machine learning solver. This role bridges research prototypes and commercial deployment, ensuring reliability, maintainability, and performance in a mixed technology stack.

Responsibilities

Architect the ML Solver Platform:
Define modular architecture for data preprocessing, model execution, and post-processing.
Establish clear API contracts between Python/TensorFlow and C# services.

Productionize ML Workflows:
Convert research code into robust, testable, and observable services.
Implement CI/CD pipelines, automated testing, and reproducibility standards.

Integration & Interoperability:
Design REST/gRPC endpoints for cross-language communication.
Ensure compatibility with C#/.NET services.

Performance & Scalability:
Optimize GPU/CPU utilization, batching strategies, and memory management.
Plan for multi-model and multi-tenant scenarios.

MLOps & Lifecycle Management:
Implement model versioning, artifact registries, and deployment workflows.
Set up monitoring, logging, and alerting for solver performance.

Security & Compliance:
Apply best practices for secrets management, dependency scanning, and secure artifact storage.

Requirements

Required Skills & Experience

ML Frameworks: Expert in TensorFlow (TF2/Keras), experience with ONNX Runtime for inference.
Programming: Advanced Python for ML; strong understanding of packaging, type checking, and performance profiling.
Architecture: Proven experience designing scalable ML systems for production.
APIs: Proficiency in gRPC/Protobuf and REST for cross-language integration.
MLOps: CI/CD pipelines, containerization (Docker/Kubernetes), model registries, reproducibility.
Performance Optimization: GPU acceleration (CUDA/cuDNN), mixed precision, XLA, profiling.
Observability: Metrics, tracing, structured logging, dashboards.
Security: SBOM, image signing, role-based access, vulnerability scanning.

Preferred Qualifications

Experience with ONNX Runtime Training, PyTorch, or hybrid ML architectures.
Familiarity with distributed training strategies and multi-GPU setups.
Knowledge of feature stores and data validation frameworks.
Exposure to regulated environments and compliance frameworks.

Benefits & conditions

Tools & Technologies

ML: TensorFlow, ONNX Runtime, tf2onnx.
APIs: FastAPI, gRPC.
DevOps: GitLab CI/GitHub Actions, Docker, Kubernetes.
Monitoring: Prometheus, Grafana, OpenTelemetry.
Security: HashiCorp Vault, Sigstore.

Why Join Us?

Work on cutting-edge ML solutions integrated into commercial engineering software.
Define architecture that scales across global deployments.
Collaborate with a team of experts in ML, software engineering, and UI development.
Competitive salary and benefits.

To apply: Send your resume and a brief cover letter to #####
