AI Engineer
Job description
At Ruby Labs we are looking for a Senior AI Engineer to own and drive the quality, reliability, and evolution of our AI systems in production.
This is a high-ownership role. You will be responsible for end-to-end delivery of major AI features, production stability of AI systems, and data-driven experimentation using tools like Langfuse, Mixpanel, and OpenRouter. You'll work in a modern stack built on Next.js, TypeScript, Node.js, and Redis, collaborating closely with product, growth, data, and billing teams. Increasingly, this includes building agentic, tool-using AI systems: defining clean tool contracts (including MCP-based tools) and orchestrating how AI interacts with internal services and business systems.
Our engineering organization uses a squad-based structure. You will operate within an AI engineering squad, contributing as a senior technical voice and driving engineering quality within your area of the product.
- Take complete ownership and deliver major AI engineering features within agreed timelines.
- Own AI output quality, structure, and predictability across all user-facing AI interactions.
- Design, implement, and maintain output-type-based AI systems, including segmentation, routing, and enforcement.
- Ensure consistent output structure and formatting across different LLMs for the same request type.
- Integrate and orchestrate multiple LLM providers via OpenRouter, managing model selection, fallback strategies, and cost optimisations.
- Design and orchestrate tool-using and agentic AI workflows, defining clean tool contracts (including MCP-based tools), function-calling interfaces, and reliable AI-to-system integrations.
- Build and maintain complex, multi-step LLM workflows for advanced reasoning, context reuse, and retrieval, including workflows built on orchestration frameworks such as LangChain or LlamaIndex.
- Design and manage production prompt systems with dynamic prompting, context injection, and conditional logic.
- Own the deployment and release of LLM experiments, prompt management, and Langfuse-based evaluation pipelines.
- Run A/B tests across models, analyse results, and present data-driven impact assessments of AI features and experiments.
- Monitor AI system metrics, quality signals, latency, and release health using Langfuse and other observability tools.
- Deep-debug complex LLM chains using Langfuse traces to identify bottlenecks and optimise for cost, latency, and context-window usage; build output-scoring systems to root-cause hallucinations and logic errors.
- Write clean, scalable, and maintainable TypeScript code across the Next.js and Node.js stack.
- Build reliable backend logic for AI systems with strong error handling, request validation, fallback flows, and predictable behaviour in production, including dependable tool execution and AI-to-service integrations.
- Ensure high code quality through testing, code reviews, and clear engineering standards.
- Monitor, troubleshoot, and improve production performance, reliability, and system health.
- Drive maintainability and technical quality through solid architecture, refactoring, and disciplined release practices.
Requirements
- 6+ years of backend/full-stack software engineering experience, including production-grade TypeScript/Node.js. Experience with Next.js and/or Python is a plus.
- 2+ years of experience building AI/LLM systems in production. Less experience may be considered for exceptional candidates.
- Deep hands-on experience working with LLM APIs (OpenAI, Anthropic, or similar) in production environments.
- Experience with agentic AI, multi-agent orchestration, tool-based workflows (function calling/tool execution), and/or RAG pipelines, including indexing, retrieval, and re-ranking.
- Experience with LLM observability tools such as Langfuse, LangSmith, or similar platforms.
- Experience with AI gateways and model routing solutions, such as OpenRouter or equivalent technologies.
- Solid understanding of Redis and relational databases, such as PostgreSQL.
- Exceptional ownership mindset and personal responsibility for engineering quality and delivery.
- Experience with AI-centered development tools such as Cursor, Claude Code, Windsurf, or similar platforms.
- Familiarity with evaluation frameworks, including LLM-as-a-judge, RAGAS, or similar approaches.
- Experience working in high-pressure startup environments with rapid product iteration cycles.
- Experience with MCP (Model Context Protocol), including building MCP servers/clients or designing tool contracts for AI agents.
- Experience with edge and serverless runtimes, such as Cloudflare Workers, and supporting services including KV, Durable Objects, Queues, R2, and D1.
- Experience with payments, billing and checkout flows, or orchestration platforms.
- Practical experience fine-tuning models for domain-specific tasks or achieving strict JSON/schema compliance.
- Working proficiency in Python for data science, evaluation scripts, or AI tooling.
Benefits & conditions
Discover the perks of being part of our vibrant team! We offer:
- Remote Work Environment: Embrace the freedom to work from anywhere, anytime, promoting a healthy work-life balance.
- Unlimited PTO: Enjoy unlimited paid time off to recharge and prioritize your well-being, without counting days.
- Paid National Holidays: Celebrate and relax on national holidays with paid time off to unwind and recharge.
- Company-provided MacBook: Experience seamless productivity with top-notch Apple MacBooks provided to all employees who need them.
- Flexible Independent Contractor Agreement: Unlock the benefits of flexibility, autonomy, and entrepreneurial opportunities. Benefit from tax advantages, networking opportunities, reduced employment obligations, and the freedom to work from anywhere. Read more about it here: https://docs.google.com/document/d/1tzxGX4Uu7Ts_HCLFXESKLnKaaBfVCPf1f9AYZPrkjJM/preview?tab=t.0
Be part of our fast-growing team and seize this excellent opportunity for personal and professional growth!
After you submit your application, we conduct a thorough review, which typically takes 3 to 5 days but may occasionally take longer due to the volume of applications received. If we see a potential fit, we proceed with the following steps:
- Recruiter Screening (40 minutes)
- Technical Interview (60 minutes)
- Second Interview (30 minutes)
- Final Interview (20 minutes)