FinOps Private Cloud Infrastructure Architect
Job description
The FinOps Private Cloud Infrastructure Architect leads the end-to-end architecture, metering, and operational governance for private cloud infrastructure supporting LLM and agentic AI workloads, including GPU and accelerated compute platforms. This role is accountable for ensuring accurate internal cloud usage metering, cost transparency, observability, and retention governance across a hybrid data center environment. The architect owns the Platform API Inventory and Collection Interval Validation Matrix across the AI ecosystem, ensuring all required telemetry is inventoried, validated, and collected at correct intervals to meet FinOps, security, reliability, auditability, and regulatory requirements. The role also requires hands-on FinOps experience within a large financial organization and owns the per-platform telemetry retention audit, a critical enabler for resilience, recovery, and warm-up operational readiness following incidents, maintenance, patching, or disaster recovery events.

Private Cloud & AI Infrastructure Architecture
- Lead the architecture and governance of private cloud infrastructure supporting LLM and agentic AI platforms
- Architect and govern GPU and accelerated compute platforms, including cluster design, scheduling, capacity planning, and lifecycle management
- Design and operate infrastructure within a hybrid data center model, spanning private cloud, on-prem virtualization, container platforms, storage, and network

- Lead the implementation of internal cloud usage metering for private cloud platforms
- Own FinOps governance for infrastructure platforms, including:
  - Showback / chargeback models
  - Cost allocation and unit economics
  - Capacity and usage transparency
- Partner with Finance and Engineering to align infrastructure cost models with business consumption

- Own the Platform API Inventory and Collection Interval Validation Matrix
- Ensure all platform, infrastructure, observability, and cost telemetry APIs are:
  - Properly inventoried
  - Actively validated
  - Collected at correct intervals
- Govern telemetry coverage across:
  - Metrics
  - Logs
  - Traces
  - Billing and cost data
  - Capacity signals
  - Model-serving and AI platform telemetry
- Ensure telemetry programs meet security, audit, risk, and reliability standards

- Own per-platform telemetry retention audits, including data availability and completeness
- Ensure retention policies support:
  - Incident investigation
  - Compliance and audit requirements
  - Capacity and cost analysis
  - Warm-up recovery design, enabling rapid restoration of operational readiness after outages, upgrades, or DR events
- Partner with resilience and recovery teams to validate operational dependencies and recovery paths

- Partner with Engineering, Platform, Finance, Risk, Security, and Operations teams
- Serve as the authoritative architectural voice for private cloud FinOps and AI infrastructure telemetry
- Communicate architectural decisions, risks, and trade-offs clearly to senior stakeholders
Requirements
- 10+ years of experience in infrastructure architecture, platform engineering, or private cloud engineering within large-scale enterprise environments
- Demonstrated experience designing and operating hybrid data center infrastructure
- Hands-on experience with GPU platforms and accelerated compute operations
- Proven ownership of observability and telemetry programs, including:
  - API inventory and validation
  - Metrics, logs, and traces strategy
  - Collection interval tuning
  - Data quality and reliability controls
- Direct FinOps experience in a large organization, including infrastructure cost governance
- Strong understanding of:
  - Resilience and recovery engineering
  - Data retention strategies
  - Operational readiness and warm-up dependencies
- Excellent stakeholder management and ability to influence across engineering, finance, and risk organizations

- FinOps Certified Practitioner or FinOps Certified Professional
Benefits & conditions
Benefit packages for this role start on the first day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401(k) retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.