Lead Engineer - Agentic AI Platform (AWS, Bedrock, Multi-Tenant Control Plane)

CloudiQS
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
£106K

Job location

Remote

Tech stack

Artificial Intelligence
Amazon Web Services (AWS)
Automation of Tests
Software as a Service
Amazon DynamoDB
Identity and Access Management
Python
Node.js
Next.js
Systems Integration
TypeScript
Management of Software Versions
Data Logging
Data Ingestion
React
Large Language Models
Backend
Machine Learning Operations
Front End Software Development
API Design
CloudWatch
Legacy Systems
Microservices

Job description

CloudiQS is excited to be building a next-generation agentic AI control plane for enterprises and consulting partners. This platform will allow organisations to:

  • Build, configure, and version AI agents
  • Integrate data sources, tools, and AWS services
  • Deploy agents into enterprise AWS accounts
  • Monitor cost, performance, compliance, and behaviour
  • Manage the agent lifecycle end-to-end (design → configure → test → deploy → monitor)

You will be the technical owner and Lead Engineer responsible for architecting and building the platform from the ground up.

This is NOT a basic chatbot builder; it is a full enterprise-grade AgentOps platform, aligned with Amazon Bedrock AgentCore, multi-provider LLM routing, secure multi-tenant deployments, and deep observability into agent cost and behaviour.

  • Design and implement the multi-tenant agent control plane on AWS
  • Build backend microservices for:
      • Agent creation wizard
      • Agent Studio (configuration, prompts, tools, data, memory, security, grounding)
      • Model selection & cost optimisation
      • Data ingestion pipelines (S3, DynamoDB, RDS, connectors)
      • Deployment engine (ECS/Lambda/EKS)
      • Monitoring, logging, and audit events
  • Implement an LLM gateway (e.g. LiteLLM) for multi-provider routing (Bedrock, OpenAI, Anthropic)

Frontend / UX

  • Build a clean, modular, React-based interface with:
      • Workspace + agent library
      • Agent creation wizard (multi-step, AI-assisted)
      • Agent Studio tabs (Model, Knowledge, Data Sources, Tools, Memory, Security, Advanced)
      • Test & validation playground
      • Deployment dashboard
      • Monitoring dashboard (tokens, cost, latency, failures)

AWS Integration

You must be deeply experienced with:

  • Amazon Bedrock (models, guardrails, knowledge bases, agents)
  • IAM & KMS for granular agent permissions
  • Lambda, Step Functions, ECS for runtime orchestration
  • DynamoDB, S3, RDS for agent data + metadata
  • CloudWatch & CloudTrail for observability and auditing
  • Cognito for workspace identity management

Bonus:

  • Experience implementing multi-provider model gateways
  • Experience integrating LiteLLM or similar (smart routing, cost controls)

Agent Lifecycle (Core Capability)

You will design and implement:

  • Agent versioning
  • Rollback workflow
  • Deployment approval workflow
  • Automated testing: latency, correctness, grounding, and guardrail hits
  • Compliance and audit reporting
  • Cost optimisation logic

Security & Compliance

Build enterprise-grade security features:

  • PII detection
  • Guardrails configuration
  • IAM policy generation
  • Encryption at rest and in transit
  • Audit logs and activity monitoring
  • Regional data residency

As the Lead Engineer, you will help design and deliver:
  • An internal control plane for building and shipping agents
  • A marketplace-ready platform that customers can deploy inside their own AWS account
  • A partner-ready system for AWS and consulting partners

This is an opportunity to build one of the most advanced agent lifecycle and deployment platforms on AWS today.

Why Work With Us?

  • You own the architecture
  • You work directly with founders
  • No legacy systems: a greenfield AI platform
  • Massive market demand (enterprise AI agents + governance)
  • Opportunity to build a flagship product in the Agentic AI ecosystem

Requirements

  • Do you have experience in TypeScript?
  • Do you have a Master's degree?

Required Skills: Technical (Must-Have)

  • Expert in AWS architecture (5+ years)
  • Deep knowledge of Amazon Bedrock
  • Experience with LLM systems, agents, RAG, and memory
  • Strong backend engineering with Python or Node.js
  • Frontend capability (React, Next.js, TypeScript)
  • Experience building enterprise SaaS platforms
  • Strong understanding of:
      • Multi-tenant architectures
      • Authorisation models
      • Event-driven systems
      • API design
      • Observability & monitoring

Nice to Have

  • Experience with LangChain, CrewAI, AgentCore, or similar agent frameworks
  • Experience in cost optimisation for AI workloads
  • Knowledge of MLOps or AIOps pipelines
  • Experience working with consulting teams or enterprise customers
