Founding Engineer, Agent Systems

HelmGuard Technologies, Inc.
Charing Cross, United Kingdom
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Junior

Job location

Charing Cross, United Kingdom

Tech stack

API
Artificial Intelligence
Regression Testing
TypeScript
Management of Software Versions
Large Language Models
Multi-Agent Systems

Job description

We're building agent-native risk infrastructure: risk management and trust building, delivered by AI agents, for a world increasingly run by them. As more decisions and transactions run through agents, the volume of risk to manage and trust to establish is growing very fast. Today both functions are fragmented, split across internal teams, point products, and outside consultants, and rebuilt from scratch whenever someone needs an answer. HelmGuard brings them onto one platform and runs them continuously: our agents sit on top of proprietary data and reassess as conditions change, rather than at fixed checkpoints. So our customers spend their time deciding and acting on what matters, not assembling the evidence to get there.

Hundreds of billions of dollars are spent across risk management and trust building annually. These funds are going to be reallocated to agent-native solutions in the next five years, and we will capture that spend.

We've grown to seven-figure revenue within months of product launch, on the back of multi-year contracts with leading enterprises in financial services, regulated technology, and healthcare. Our founders come from Palantir and academic institutions: Oxford, Stanford, and ETH. We're backed by leading UK and US institutional investors and exceptional angels from Meta, Isomorphic Labs, Palantir, SpaceXAI, and more.

We're hiring across founding-team roles for people who want outsize impact, the influence over direction and culture that comes only from joining this early, and pre-Series A equity upside.

Your Impact

We already have the best agent scaffolding and orchestration in the trust and risk space. You'll make it the best of anyone shipping enterprise agents, in any vertical. You'll embody the AI-native services thesis our customers are betting on: agents that don't assist with workflows but become the system of action for them.

The Role

You own the agent platform: the orchestration, evals, and reliability work that turns model calls into product features customers trust. The bar is not that the demo works but rather that a domain expert reading our agent's output considers it at the level of a peer. You own the technical delivery to make that possible.

This isn't a research role at its core: we consume frontier APIs and make them production-grade. We push them hard, though, hard enough that we recently found and reported a bug in the Anthropic API that took their engineers weeks to reproduce. At that level, the line between using these models and studying them gets thin, so if research-flavoured work pulls at you, there's room to follow it.

What You Will Do

  • Agent scaffolding: tool use, context management, sandboxing, prompt-injection defence
  • Evals for fuzzy, high-stakes outputs: assessments, policy interpretation, control mapping
  • Reliability infrastructure: retries, fallbacks, circuit breakers, prompt versioning
  • The internal standard for what "good enough to ship" means for AI features here

Perks

Daily team lunch and specialty coffee, a roof terrace overlooking King's Cross, on-site showers for those who enjoy active commuting, and serious per-engineer AI tooling and API budgets.

Requirements

Do you have experience in TypeScript?
Do you have a Master's degree?

  • Experience with backend engineering in TypeScript or comparable, with 1-2+ years shipping production LLM features
  • Experience with agent frameworks, tool calling, and multi-step orchestration
  • Production evals chops: dataset curation, LLM-as-judge failure modes, regression testing under model swaps
  • Strong systems thinking: async, queues, idempotency
  • Comfort being the named owner of AI quality, including saying no when needed

Nice to have

  • Anthropic, OpenAI, or open-weight APIs in production at scale
  • Prompt-injection or agent-security work
  • Background in compliance, audit, or any domain where correctness is fuzzy and stakes are high

About the company

Tech Stack. TypeScript, Node.js, React, Tailwind, OpenAPI, Express, Azure (Container Apps, Service Bus, Front Door, Entra ID), Postgres, Terraform, GitHub Actions, Docker. Anthropic-first AI with in-house evals and scaffolding. Claude Code throughout.
