Machine Learning Engineer
Net2Source
Reading, United States of America
1 month ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Job location: Reading, United States of America
Tech stack
A/B testing
Artificial Intelligence
Amazon Web Services (AWS)
Databases
Continuous Integration
Data Governance
DevOps
Amazon DynamoDB
Identity and Access Management
Machine Learning
Large Language Models
Grafana
Multi-Agent Systems
Caching
Generative AI
Git
Containerization
Kubernetes
Information Technology
Machine Learning Operations
Functional Programming
Dataiku
CloudWatch
API Gateway
Docker
Job description
- Design multi-agent architectures: define agent roles (planner, researcher, retriever, executor, reviewer), toolboxes, handoffs, memory strategy (short/long-term), and supervisor policies for safe collaboration.
- Build high-quality RAG pipelines: implement ingestion, chunking, embeddings, indexing, and retrieval with evaluation (precision/recall, groundedness, hallucination checks), guardrails, and citations.
- Productionize on AWS: leverage services like Bedrock (Agents/Knowledge Bases/Flows), Lambda, API Gateway, S3, DynamoDB, OpenSearch/Vector DB, Step Functions, and CloudWatch for tracing and alerts.
- MLOps/LLMOps: automate CI/CD (GitOps), containerization (Docker/Kubernetes), infrastructure as code, secrets/IAM, blue/green deployments and rollbacks, and data/feature pipelines.
- Observability & evaluation: instrument telemetry (traces, token/cost, latency), build dashboards (Grafana/CloudWatch), add human-in-the-loop review, A/B testing, and continuous offline/online evals.
- Operate reliably at scale: implement caching, rate-limit management, queueing, idempotency, and backoff; proactively detect drift and degradation.
- Collaborate & communicate: partner with infra/DevOps/data/architecture teams; document designs, SLIs/SLOs, and runbooks; present status and insights to technical and non-technical stakeholders.
Requirements
- Bachelor's degree in Computer Science, Data Science, Engineering, or a related field, or equivalent experience.
- Proven experience building agentic systems (single or multi-agent) and RAG pipelines in production.
- Strong cloud background for AI/ML workloads; familiarity with Bedrock or equivalent LLM platforms.
- Solid CI/CD and containerization skills (Git, Docker, Kubernetes) and infrastructure-as-code fundamentals.
- Knowledge of data governance and model accountability throughout the MLOps/LLMOps lifecycle.
- Excellent communication, collaboration, and problem-solving skills; ability to work independently and within cross-functional teams.
- Passion for Generative AI and the impact of agent-based solutions across industries.
Preferred / Good to Have
- Experience with AWS Bedrock Agents/Knowledge Bases/Flows, OpenSearch (or other vector databases), Step Functions, Lambda, API Gateway, DynamoDB, S3.
- Exposure to the Dataiku platform (Govern, approvals, artifacts, MLOps deployment flows); SageMaker for custom model hosting.
- Familiarity with agent frameworks (e.g., LangGraph, crewAI, Semantic Kernel, AutoGen) and evaluation frameworks (guardrails, groundedness, hallucination checks).
- Dataiku certifications (nice to have): ML Practitioner, Advanced Designer, MLOps Practitioner.