AI Solution Architect
Job description
Lead AI discovery and use-case prioritization, scoring opportunities on data sensitivity, cost at scale, latency, feasibility, and governance exposure. Design end-to-end architectures spanning RAG, agentic workflows, data pipelines, model serving, and guardrails. Make model-selection recommendations across closed (GPT, Claude, Gemini) and open-weight (Llama, Mistral, Qwen) options, applying a structured hard-attribute / soft-attribute framework. Choose the deployment target deliberately (Azure, AWS, an on-prem NVIDIA AI factory, or a hybrid) and document the rationale. Define interface specifications between AgreeYa's application layer and partner-owned infrastructure (for example, model-serving endpoint contracts and performance baselines) to protect against handoff and dependency risks. Own the application-level governance design: NIST AI RMF alignment, risk tiering, human-in-the-loop placement, and audit and explainability requirements. Set delivery standards and review the work of AI Engineers and MLOps Engineers for architectural soundness.
Requirements
Demonstrated experience architecting production AI/ML solutions, not only pilots and proofs of concept. Deep RAG fluency: chunking strategy, embedding model selection, vector search, and retrieval evaluation. Working knowledge of agentic patterns, orchestration (LangChain / LangGraph), and tool integration (including MCP). Strong grasp of model selection, fine-tuning vs. RAG trade-offs, and inference cost/latency economics. Able to lead technical client conversations and defend design decisions to a skeptical technical audience.
Must be able to architect and reason fluently across all three of the following platforms, and recommend among them:
Azure AI: Azure AI Foundry, Azure OpenAI Service, Azure AI Search, Azure ML.
AWS AI: Amazon Bedrock, SageMaker, OpenSearch, Lambda-based serving.
On-prem NVIDIA AI factory: NVIDIA AI Enterprise (NVAIE), NIM microservices, Triton Inference Server, NeMo and NeMo Guardrails, Run:ai, TensorRT-LLM, and quantized/air-gapped deployment (GGUF, vLLM).