Senior Software Engineer
Job description
We are a small, startup-minded team that ships fast and owns what we build end-to-end. We are looking for an SDE II who is hungry to contribute to a real production system, not a sandbox. You will work across the application and infrastructure layers, implement features that users interact with every day, and be expected to own what you build from design through deployment.

You will not be handed perfectly scoped tickets. You will be expected to ask good questions, figure things out, and move. The best person for this role communicates clearly, collaborates without ego, and brings genuine empathy for the users whose work they are making better. You are a self-starter with a high bar and a high sense of urgency. You play well with others and make the people around you better.

What You Will Do
Build Agentic AI Systems
- Implement and iterate on our agentic workflows: tool-calling, multi-step reasoning, planning, memory, and agent-to-agent (A2A) communication patterns at the application layer
- Build and maintain MCP (Model Context Protocol) client-side integrations: how agents discover, invoke, and compose tools
- Implement tool definitions, input/output schemas, error handling, retry logic, and result formatting for GRACE's growing tool library
- Contribute to multi-agent orchestration patterns that are reliable and debuggable in production, not just in demos
Build LLM-Powered Features
- Implement LLM orchestration logic: prompt construction, context management, model selection, and response parsing across OpenAI GPT, Anthropic Claude, and Google Gemini
- Build and maintain RAG pipeline components: query formulation, result ranking, citation grounding, and hallucination mitigation
- Implement and iterate on prompt engineering patterns and system prompts that drive quality and consistency across model families
- Contribute to context window budget management: truncation, summarization, and pagination logic that makes the right call at runtime
- Build LLM evaluation components: grounding assessment, regression tests, safety checks, and quality metrics
- Write prompts and pipelines with token economics in mind; cost-per-query is a real constraint, not an afterthought
Own the Backend
- Build secure, well-tested backend features end-to-end: from application logic through to the API contract the frontend consumes
- Implement integrations with internal and external data sources and APIs, including Dimensions, Google Search, Slack, SharePoint, and LLM provider APIs
- Contribute to monitoring, logging, and distributed tracing so that failures are diagnosable and regressions are caught before users report them
- Implement fallback, retry, and graceful degradation patterns for AI service dependencies
- Write production-quality code: readable, tested, reviewed, and documented
Contribute to Infrastructure
- Work within Microsoft Azure infrastructure: Azure Functions, Azure API Management, Azure Container Apps, and Azure OpenAI Service
- Contribute to CI/CD pipelines, deployment automation, and release processes
- Work with containerization tools and infrastructure as code; understand the environment your code runs in
- Contribute to application-level SLOs: tool call success rates, response quality, and latency from the user's perspective
Collaborate and Grow
- Participate actively in design reviews, sprint planning, and retrospectives; ask good questions and push back when something does not add up
- Communicate technical decisions clearly to both engineers and non-engineers; no one should have to guess what you built or why
- Work closely with the PM, researcher, designer, and senior engineers to translate ambiguous requirements into clear, actionable implementations
- Bring genuine curiosity and empathy to every feature; understand who is using what you build and why it matters to them
- Ensure strong privacy, security, and compliance in all systems, integrations, and data handling

You will work on a production system that real users depend on every day to do meaningful work. You will not be one of hundreds of engineers on a feature nobody uses. You will see the impact of what you build quickly, get direct feedback, and have real ownership over your work.

Senior Software Development Engineer III (SDE III) Agentic AI & LLM Applications
What You Will Own
Agentic AI Systems & Orchestration
- Design and build our core agentic workflows: multi-step reasoning, planning, memory, and tool-use across single and multi-agent systems
- Implement and evolve A2A communication patterns at the application layer, enabling GRACE agents to collaborate and hand off tasks
- Build and maintain the tool-calling layer: tool definitions, input/output schemas, error handling, retry logic, and result formatting
- Own the MCP client-side integration: how our agents discover, invoke, and compose tools exposed via MCP servers
- Design multi-agent workflows that are reliable, observable, and debuggable in production, not just in demos
LLM Application Development
- Own LLM orchestration at the application layer: prompt construction, context management, model selection logic, and response parsing
- Build and maintain RAG features: query formulation, result ranking, citation grounding, and hallucination mitigation
- Implement and iterate on prompt engineering patterns and system prompts that drive quality and consistency across OpenAI GPT, Anthropic Claude, and Google Gemini
- Manage context window budgets: know when to truncate, summarize, or paginate, and build the logic that makes those decisions correctly
- Build evaluation pipelines for LLM quality: grounding assessment, regression testing, safety checks, and A/B experimentation on prompt and model changes
- Stay sharp on token economics: write prompts and pipelines that are cost-efficient without sacrificing output quality
Features & Product
- Translate ambiguous product requirements into clear technical designs and ship them fast
- Build new capabilities end-to-end: from backend application logic through to the API contract the frontend consumes
- Rapidly prototype new agentic features, run experiments, collect data, and iterate based on real user behavior
- Collaborate closely with product, UX, applied science, and operations; listen well, ask good questions, and build the right thing rather than the obvious thing
- Own the quality of what you ship: write tests, handle edge cases, and make sure your features degrade gracefully when upstream dependencies fail
Observability & Reliability
- Instrument agentic workflows with tracing, logging, and metrics so failures are diagnosable and regressions are caught before users report them
- Define and monitor application-level SLOs: tool call success rates, response quality, and latency from the user's perspective
- Build fallback and guardrail logic for AI services: what happens when a model returns something unsafe, off-topic, or structurally wrong
- Work closely with the infra engineer to understand system-level constraints and design application behavior that respects them
Engineering Excellence & Team
- Write production-quality code: readable, tested, reviewed, and documented
- Communicate technical decisions clearly to both engineers and non-engineers; no one should have to guess what you decided or why
- Participate actively in design reviews; push back when something is over-engineered or under-specified
- Mentor and unblock other engineers; bias toward ownership and fast iteration
- Ensure strong privacy, security, and compliance in all application logic and data handling
Requirements
- Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field, or equivalent practical experience
- 3+ years of professional software engineering experience building and operating production systems
- Proven experience in high-velocity environments where you contributed to shipping real products end-to-end
- Strong proficiency in Python and at least one other backend language; familiarity with modern backend frameworks and async patterns
- Solid understanding of algorithms, data structures, distributed systems, and software design patterns
- Experience building and operating systems on major cloud platforms (AWS, GCP, or Azure)
- Experience with containerization (Docker) and working within CI/CD pipelines
- Clear, direct communicator who gives and receives feedback well, works with empathy, and makes the people around them better

- Hands-on experience building features on top of LLMs in production: tool-calling, RAG, multi-step reasoning, and context management
- Familiarity with A2A (Agent-to-Agent) communication patterns and multi-agent orchestration frameworks
- Familiarity with MCP at the client/consumer layer: how agents discover and invoke tools via MCP
- Working knowledge of prompt engineering and LLM behavior across model families; you understand why Claude and GPT respond differently to the same prompt
- Experience with LLM evaluation, grounding assessment, or regression testing for AI-powered systems
- Awareness of token economics at the application layer: cost-per-query, context budget management, and prompt efficiency
- Experience on Microsoft Azure: Azure Functions, API Management, Container Apps, or Azure OpenAI Service
- Familiarity with secrets management, least-privilege access, and security-conscious engineering practices
- Experience in startup or early-stage environments: comfort with ambiguity, rapid iteration, and wearing multiple hats
- Experience in healthcare, life sciences, or other regulated domains is a plus but not required

- Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field, or equivalent practical experience
- 7+ years of professional software engineering experience building and operating production systems
- Proven experience in high-velocity environments where you owned and shipped complex products end-to-end
- Strong proficiency in Python and at least one other backend language; familiarity with modern backend frameworks and async patterns
- Solid understanding of algorithms, data structures, APIs, and software design patterns
- Experience building and operating systems on major cloud platforms (AWS, GCP, or Azure)
- Experience with containerization and working within CI/CD pipelines
- Clear, direct communicator who gives and receives feedback well, works with empathy, and makes the people around them better

- Hands-on experience building production systems on top of LLMs: tool-calling, RAG, multi-step reasoning, and context management
- Experience with multi-agent (A2A) architectures and orchestration frameworks in production, not just in prototypes
- Familiarity with MCP at the client/consumer layer: how agents discover and invoke tools via MCP
- Strong intuition for prompt engineering and LLM behavior across model families; you know why Claude and GPT respond differently to the same prompt and you design for it
- Experience building LLM evaluation and regression testing pipelines
- Demonstrated understanding of token economics: cost-per-query awareness, context budget management, and prompt efficiency
- Track record in startup or early-stage environments: 0-to-1 product building, comfort with ambiguity, high sense of urgency
- Experience in big tech building customer-facing AI platforms or developer tools at scale
- Background in security-conscious engineering: input validation, output sanitization, audit logging, and responsible AI guardrails
- Experience in healthcare, life sciences, or other regulated domains is a plus but not required