Lead Machine Learning Engineer
Job description
We are looking for a Lead Machine Learning Engineer (SageMaker, MLOps, Explainability) to design, build, and productionise machine learning models that power our new matching platform. You will work across the full ML lifecycle: feature engineering, model development, training pipelines, deployment automation, inference optimisation, monitoring, and explainability. In this role, you will make strong hands-on technical contributions, take ownership of key components of the ML platform, and collaborate closely with data scientists, platform engineering, and product teams. You will help improve our MLOps practices, enhance observability, and ensure that our ML systems meet standards for security, performance, and compliance. This role is suited to an experienced engineer who can deliver production-grade ML systems, influence design decisions, and maintain high technical standards, while still working primarily as an individual contributor.
Feature Engineering
- Build and maintain scalable feature pipelines within data lakehouse architectures.
- Develop fallback feature flows (e.g., export paths).
- Implement robust data quality checks and contribute to the adoption of feature store patterns.
Model Development & Scoring
- Develop ranking, scoring, and entity-similarity models fit for the matching platform.
- Implement calibrated confidence scores, thresholds, and model scoring logic.
- Use modern ML frameworks such as PyTorch, TensorFlow, or XGBoost.
- Collaborate with data scientists on model design and performance improvements.
Explainability & Reason Codes
- Apply SHAP or similar techniques to generate interpretable model explanations.
- Produce reason codes suitable for business, operational, or regulatory consumption.
- Ensure explainability outputs are versioned, tested, and integrated into inference workflows.
ML Deployment & MLOps
- Build and maintain training, processing, and inference pipelines using AWS SageMaker.
- Integrate models with model registries and implement automated deployment patterns.
- Support rollback and redeploy mechanisms for weight updates or model iterations.
- Contribute to CI/CD improvements for ML-specific workflows.
Inference Runtime & Cross-Account Serving
- Deploy and optimise low-latency, real-time inference endpoints.
- Implement secure AWS IAM patterns supporting cross-account inference access.
- Build telemetry for request logging, performance tracking, and latency monitoring.
- Solve runtime issues and optimise throughput and cost.
Monitoring, Drift Detection & Telemetry
- Implement feature drift and concept drift monitoring.
- Build dashboards, alerts, and critical performance metrics to detect model degradation.
- Develop telemetry and logging frameworks that respect data minimisation principles.
Security, Compliance & ML Governance
- Apply procedures for data handling, encryption, PII minimisation, and auditability.
- Produce Model Cards, documentation, and lineage artefacts needed for governance.
- Ensure that ML pipelines meet internal standards for reproducibility and traceability.
Testing, Validation & Performance
- Conduct validation of models using golden datasets, baseline tests, and regression testing.
- Optimise models for latency-sensitive inference paths.
- Support A/B tests, shadow deployments, and progressive rollout strategies.
Requirements
- Strong experience delivering production ML systems end-to-end.
- Proficiency with AWS SageMaker (training jobs, processing, endpoints, Model Registry).
- Excellent Python skills and experience with ML frameworks such as PyTorch, TensorFlow, or XGBoost.
- Hands-on experience with model explainability tools such as SHAP.
- Understanding of low-latency, real-time inference patterns and optimisation techniques.
- Experience implementing drift detection, monitoring, and telemetry.
- Working knowledge of ML governance, data privacy, and secure ML practices.
- Strong understanding of MLOps, CI/CD, and automation for ML workflows.
Nice to Have
- Experience working with feature stores or lakehouse data architectures.
- Previous experience with ranking, matching, or similarity models.
- Familiarity with cross-account AWS IAM patterns and multi-account design.
- Bachelor's degree in a STEM subject (e.g., mathematics, physics, engineering, or computer science) or an adjacent field.