Senior Cloud Architect, Delivery (GenAI)
Job description
- Lead the design and implementation of production-grade ML and Generative AI solutions on AWS (with awareness of multi-cloud environments).
- Act as a hands-on expert and trusted advisor for customers running AI/ML workloads at scale, from initial discovery through deployment and optimization.
- Translate complex business problems into cloud architectures that are secure, reliable, cost-efficient, and observable.
- Help evolve how DoiT uses AI/ML internally and with customers by turning one-off solutions into reusable patterns and "gravel roads" that influence the product roadmap.
- Focus on install-base health, product adoption, proactive engagements, and account-team work.
Core - Deep Cloud Expertise
Be the trusted cloud engineer customers lean on for high-impact technical optimization work across cost, reliability, security, and performance.
- Design and help implement solutions that:
  - improve cost efficiency (rightsizing, reservations/commitments, storage optimization, etc.)
  - increase reliability and resilience (HA/DR architectures, SLO/SLA-aware designs)
  - strengthen security posture (IAM, network segmentation, data protection, least-privilege)
  - reduce operational toil (automation, self-service, guardrails, policy enforcement)
- Plan and deliver structured engagements such as Cloud Optimization Sessions, cost/efficiency/performance workshops, security posture or reliability reviews, and architecture deep dives / "well-architected" style assessments.
- Respond to Expert Inquiry / support requests that require deep cloud engineering expertise, ensuring high-quality, well-explained resolutions.
- Bring domain depth in:
  - ML / GenAI - deploying and operating ML/GenAI workloads (training and inference), GPU utilization, scaling, and cost control; MLOps and integrating workloads with monitoring, logging, and FinOps; safe and efficient use of managed AI services.
Builder - Product Feedback & Contribution
Turn one-off field work into reusable assets that improve both customer outcomes and the product itself.
- Convert one-off customer solutions into Gravel Roads - reusable patterns such as playbooks, Terraform modules, CloudFlow templates, cloud diagrams, Composer Recipes -> DCI Insights, and internal/external documentation.
- Provide structured feedback to the DoiT Product and Engineering teams on:
  - product gaps and friction points discovered in real-world usage
  - new opportunities for automation and workload lenses within DCI
  - telemetry and tracking that would make future FDE work more efficient
- Contribute directly to DCI where appropriate - from feature requests and feedback, to contributing code, to owning specific DCI features end-to-end.
- Build agent skills, scripts, and internal tooling that codify your expertise and scale it across the team.
- Contribute to internal enablement: share learnings via documentation, demos, office hours, or training sessions for other FDEs and Customer Success team members.
Account Team - Embedded Execution
Operate as an embedded technical partner inside the account team.
- Work in the account team model alongside Customer Success Managers (CSMs) and Account Managers (AMs) to deliver impactful outcomes.
- Own the technical depth lane: technical deployment & integration, automation & platform adoption, signal-based proactive engagement, and most importantly, repeatable Cloud Optimization solutions.
- Partner with customers' engineers, architects, and FinOps teams to translate vague pain points into concrete technical optimization plans - and help them ship changes that stick and create continuous value.
- Co-deliver complex or multi-domain engagements with peer FDEs (for example, infra + data + ML/GenAI), reviewing and refining designs and engagement plans together.
- Communicate complex technical topics clearly to both engineers and non-technical stakeholders (FinOps, finance, leadership), and maintain clear documentation of architectures, decisions, and implemented changes so customers and fellow FDEs can sustain and build on your work.
- Contribute to a culture of continuous improvement within the global FDE community through design reviews, internal forums, enablement sessions, and experimentation.
Product Expert - DoiT Cloud Intelligence (DCI)
Become an expert in DCI and use it hands-on to drive concrete customer outcomes.
- Master DoiT Cloud Intelligence products and services - including Cloud Analytics, DCI Insights, Cloud Composer, CloudFlow, DataHub, PerfectScale, and other Enterprise Platforms.
- Use DCI hands-on to:
  - Build and operationalize Cloud Analytics and Allocations to create dashboards and reports for customer engineering, finance, and leadership.
  - Use DCI Insights to identify and prioritize cost, risk, and reliability opportunities, and shepherd them through to closure.
  - Implement Cloud Composer queries and build recipes that produce hand-crafted insights across all customers' engineering use cases.
  - Build CloudFlow automations (e.g., anomaly routing, scheduled actions, guardrails, policy enforcement).
  - Use built-in integrations, DataHub, and other workload-intelligence features to optimize key business and workload data inside DCI.
- Help customers embed DCI into existing observability, CI/CD, and governance processes so it becomes trusted and indispensable in day-to-day cloud operations.
Requirements
- 4+ years of experience architecting, deploying, and managing cloud-based AI/ML solutions, including production workloads.
- Proven track record designing and operating large, distributed systems on AWS, selecting appropriate services and patterns to meet business and technical goals.
AWS & GenAI / ML Expertise
- Advanced proficiency with AWS services relevant to AI/ML and GenAI.
- Hands-on experience with Amazon Bedrock for deploying and scaling foundation models and Generative AI workloads.
- Experience fine-tuning and deploying Large Language Models (LLMs) and multimodal AI using Amazon SageMaker (including JumpStart).
- Strong prompt engineering skills and familiarity with rigorous model evaluation (quality, safety, performance).
- Understanding of agentic capabilities and patterns for AI agents that autonomously perform tasks and integrate with existing systems.
- Experience with Amazon Q Business and Amazon Q Developer (or similar tools) to accelerate insight generation and development workflows.
ML Pipelines, Data & MLOps
- In-depth knowledge of Amazon SageMaker components such as Pipelines, Model Monitor, Data Wrangler, and SageMaker Clarify for bias detection and interpretability.
- Proficiency integrating TensorFlow, PyTorch, and other ML frameworks with SageMaker for model development, fine-tuning, and deployment.
- Experience with distributed training (multi-GPU or multi-node) and performance optimization for inference.
- Strong data-engineering skills on AWS: Amazon S3, AWS Glue, Lake Formation, Redshift for AI/ML data pipelines.
- Experience building end-to-end AI/ML workflows using services like AWS Lambda, Step Functions, API Gateway, and containerized deployments on Amazon EKS / AWS Fargate.
DevOps, MLOps, Governance & Security
- Hands-on experience with CI/CD for AI/ML using AWS CodePipeline, CodeBuild, SageMaker Pipelines, or similar.
- Proficiency in monitoring and operating AI systems using Amazon CloudWatch and SageMaker Model Monitor.
- Strong understanding of AI governance, security, and compliance on AWS, including IAM, KMS, and data privacy patterns.
- Familiarity with AI ethics and bias detection/mitigation (e.g., using SageMaker Clarify or similar tools).
Multi-Cloud Awareness & Collaboration
- Working knowledge of Google Cloud AI tools (e.g., Vertex AI, Cloud AutoML, BigQuery ML) sufficient to reason about multi-cloud architectures and integration points.
- Proven ability to mentor peers, run enablement sessions, and collaborate across Sales, CS, and Product.
Soft Skills
- Excellent communication skills across technical and business audiences; able to simplify complex ideas and influence decisions.
- Natural ownership mentality: you escalate early, resolve fast, and own the outcome.
- Demonstrated ability to work effectively in a remote-first, global environment.
- BA/BS degree in Computer Science, Mathematics, or a related technical field, or equivalent practical experience.
- Additional data or AI certifications (e.g., AWS/GCP data certifications, reputable AI/ML programs such as Stanford, Coursera, Udacity, MIT, eCornell).
Expanded AI/ML & Dev Experience
- Experience with modern RLHF, advanced fine-tuning techniques, and hybrid AI architectures.
- Familiarity with Hugging Face or similar open-source ecosystems integrated with AWS.
- Prior experience as an ML Engineer, Data Scientist, or AI-focused Architect in a consulting or SaaS environment.
Tooling & Process
- Experience with JIRA or similar tools for tracking work across delivery and product-feedback cycles.
- Exposure to Agile practices and frameworks commonly used for SaaS and cloud delivery.
Are you a Do'er?
Be your truest self. Work on your terms. Make a difference.
We are home to a global team of incredible talent who work remotely, with the flexibility to set a schedule that balances work and home life. We embrace and support leveling up your skills professionally and personally.