Principal Machine Learning Engineer - Production Systems
Job description
Overview
SoftInWay UK Ltd. is seeking a highly experienced ML Systems Architect to design and implement a scalable, production-grade architecture for our machine learning solver. This role bridges research prototypes and commercial deployment, ensuring reliability, maintainability, and performance across a mixed Python/TensorFlow and C#/.NET technology stack.

Responsibilities

Architect the ML Solver Platform:
- Define a modular architecture for data preprocessing, model execution, and post-processing.
- Establish clear API contracts between Python/TensorFlow and C# services.

Productionize ML Workflows:
- Convert research code into robust, testable, and observable services.
- Implement CI/CD pipelines, automated testing, and reproducibility standards.

Integration & Interoperability:
- Design REST/gRPC endpoints for cross-language communication.
- Ensure compatibility with C#/.NET services.

Performance & Scalability:
- Optimize GPU/CPU utilization, batching strategies, and memory management.
- Plan for multi-model and multi-tenant scenarios.

MLOps & Lifecycle Management:
- Implement model versioning, artifact registries, and deployment workflows.
- Set up monitoring, logging, and alerting for solver performance.

Security & Compliance:
- Apply best practices for secrets management, dependency scanning, and secure artifact storage.
Requirements
Required Skills & Experience
- ML Frameworks: Expert in TensorFlow (TF2/Keras); experience with ONNX Runtime for inference.
- Programming: Advanced Python for ML; strong understanding of packaging, type checking, and performance profiling.
- Architecture: Proven experience designing scalable ML systems for production.
- APIs: Proficiency in gRPC/Protobuf and REST for cross-language integration.
- MLOps: CI/CD pipelines, containerization and orchestration (Docker/Kubernetes), model registries, reproducibility.
- Performance Optimization: GPU acceleration (CUDA/cuDNN), mixed precision, XLA, profiling.
- Observability: Metrics, tracing, structured logging, dashboards.
- Security: SBOM, image signing, role-based access control, vulnerability scanning.

Preferred Qualifications
- Experience with ONNX Runtime Training, PyTorch, or hybrid ML architectures.
- Familiarity with distributed training strategies and multi-GPU setups.
- Knowledge of feature stores and data validation frameworks.
- Exposure to regulated environments and compliance frameworks.
Benefits & conditions
Tools & Technologies
- ML: TensorFlow, ONNX Runtime, tf2onnx.
- APIs: FastAPI, gRPC.
- DevOps: GitLab CI/GitHub Actions, Docker, Kubernetes.
- Monitoring: Prometheus, Grafana, OpenTelemetry.
- Security: HashiCorp Vault, Sigstore.

Why Join Us?
- Work on cutting-edge ML solutions integrated into commercial engineering software.
- Define architecture that scales across global deployments.
- Collaborate with a team of experts in ML, software engineering, and UI development.
- Competitive salary and benefits.

To apply: send your resume and a brief cover letter to #####