Software Engineer (Cloud Infrastructure)
Altruist Corp
Los Angeles, United States of America
yesterday
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Compensation: $225K
Job location: Remote; Los Angeles, United States of America
Tech stack
Artificial Intelligence
Amazon Web Services (AWS)
Data analysis
Computing Platforms
Bash
Cloud Computing
Code Review
Computer Security
Continuous Integration
Data Infrastructure
Cursor AI
Software Design Patterns
Linux
DevOps
Programming Tools
Disaster Recovery
DNS
Amazon DynamoDB
GitHub
Identity and Access Management
Virtual Private Networks (VPN)
Python
Key Management
PostgreSQL
Machine Learning
Network Architecture
Octopus Deploy
Peering
Redis
Prometheus
Service-Oriented Architecture
Data Streaming
Systems Integration
Software Vulnerability Management
AI Infrastructure
Datadog
Policy as Code
Scripting (Bash/Python/Go/Ruby)
Load Balancing
GitHub Copilot
Grafana
Infrastructure as Code (IaC)
Data Layers
Kubernetes
Deployment Automation
Kafka
Machine Learning Operations
CloudWatch
API Gateway
Terraform
Data Pipelines
Dynatrace
DevSecOps
Jenkins
Job description
- We're hiring a Senior Cloud Infrastructure Engineer to join the Cloud Infrastructure & Platform (CIN) team at Altruist
- This is a high-impact, senior individual contributor role responsible for architecting, building, and operating the AWS-based infrastructure that powers our broker-dealer and clearing platform
- You will own critical infrastructure domains end-to-end and drive technical decisions that affect the reliability, security, and scalability of systems handling real financial transactions
- As the industry evolves rapidly with generative AI and agentic workflows, we need engineers who combine deep AWS and Kubernetes expertise with the vision and initiative to define how AI/ML tools and patterns can transform infrastructure operations, developer productivity, and platform resilience
- This role carries significant technical influence - you will shape infrastructure strategy, lead complex cross-functional initiatives, and raise the bar for engineering practices across the organization
- This is not a ticket-driven infrastructure role. At the Senior-to-Staff level, we expect you to:
- Own domains, not just tasks. You will be the technical authority for critical infrastructure areas (e.g., EKS platform, observability strategy, DR architecture, or AI infrastructure) and drive their roadmap
- Define technical direction. Author architectural decision records (ADRs), propose infrastructure standards, and influence engineering-wide technology choices through design reviews and RFCs
- Lead without a title. Drive cross-team initiatives, align stakeholders, unblock other engineers, and represent infrastructure perspectives in leadership discussions and planning
- Multiply the team. Elevate the entire CIN team through mentorship, code reviews, knowledge sharing, and building reusable frameworks and golden paths that scale beyond your individual output
- Bridge infrastructure and business. Translate complex technical trade-offs into clear recommendations for engineering leadership, product teams, and compliance stakeholders
Cloud Infrastructure Architecture & Platform Engineering
- Architect, deploy, and operate production AWS infrastructure supporting high-availability financial services workloads (EKS, MSK, RDS/Aurora PostgreSQL, OpenSearch, ElastiCache, S3, CloudFront, and more)
- Own and evolve the Infrastructure as Code (IaC) strategy using Terraform - define module standards, enforce code review practices, and drive adoption of reusable patterns across teams
- Lead Kubernetes (EKS) platform strategy, including cluster upgrades, node group architecture, Helm chart governance, service mesh evolution, and workload autoscaling policies
- Design and drive CI/CD platform improvements (GitHub Actions, ArgoCD, or similar) to enable safe, fast, and self-service deployments for application engineering teams
- Architect and validate disaster recovery (DR) strategies, including cross-region failover designs, backup automation, and leading DR simulation exercises
- Lead infrastructure design reviews and architectural discussions; ensure solutions meet scalability, security, and compliance requirements before implementation
Reliability, Observability & Operational Excellence
- Define and drive the observability strategy across the platform (Datadog, Prometheus, Grafana, CloudWatch, OpenSearch) - including SLO/SLI frameworks, alerting standards, and distributed tracing
- Serve as a senior on-call escalation point; lead root-cause analysis on critical production incidents and drive systemic improvements through blameless post-mortems
- Own monthly resource saturation reviews and capacity planning processes; proactively identify scaling needs and present findings to engineering leadership
- Drive cloud cost optimization strategy: FinOps practices, Reserved Instances/Savings Plans analysis, vendor spend governance, and accountability frameworks across teams
Security, Compliance & Networking
- Define and enforce security architecture standards across AWS environments: IAM policy governance, VPC design patterns, encryption strategies, secrets management (Vault, AWS Secrets Manager), and vulnerability remediation workflows
- Partner with Security, Compliance, and Audit teams to ensure infrastructure meets FINRA, SEC, SOC 2, and other regulatory requirements - and proactively identify gaps before they become findings
- Own networking architecture decisions including VPC topology, Transit Gateway strategy, VPN configurations, load balancer patterns (ALB/NLB), CDN optimization (Fastly/CloudFront), and DNS management
AI/ML Infrastructure & Developer Productivity
- Define the strategy for evaluating, integrating, and governing AI-powered developer tools (e.g., Cursor AI, GitHub Copilot, CodeRabbit) across the engineering organization - including usage analytics, cost optimization, security review, and policy frameworks
- Architect infrastructure for AI/ML workloads: GPU-enabled compute, SageMaker endpoints, Bedrock integration, vector databases, and data pipeline orchestration
- Lead the adoption of AI-driven automation for infrastructure operations - intelligent alerting, anomaly detection, auto-remediation, and AIOps patterns - moving from exploration to production integration
- Build and champion internal platforms and golden paths that leverage generative AI to improve developer experience, reduce operational toil, and accelerate delivery velocity
Technical Leadership & Collaboration
- Act as a trusted technical advisor to the Director of Engineering and engineering leadership on infrastructure strategy, trade-offs, and investment priorities
- Lead cross-functional initiatives spanning application engineering, data, security, and DevSecOps teams - driving alignment on complex multi-team infrastructure projects
- Represent the CIN team in architecture review boards, incident response leadership, and engineering-wide planning sessions
- Author and maintain comprehensive technical documentation, runbooks, ADRs, and operational procedures that serve as organizational knowledge assets
- Mentor senior and mid-level engineers; conduct thorough design and code reviews; actively contribute to hiring and onboarding processes for the CIN team
Requirements
- Demonstrated ability to lead disaster recovery planning, execute DR simulations, and design HA architecture patterns for mission-critical systems
- Proven experience designing and operating observability platforms (Datadog, Prometheus/Grafana, CloudWatch, OpenSearch/ELK) at organizational scale
- Excellent technical communication skills - ability to write clear ADRs, present to leadership, and translate infrastructure complexity for non-technical stakeholders
- Contributions to open-source projects, conference talks, or published technical writing
- Track record of owning and driving infrastructure initiatives end-to-end - from design and architecture through implementation, rollout, and operational excellence
- AWS certifications at the Professional or Specialty level: Solutions Architect Professional, DevOps Engineer Professional, Security Specialty, or Machine Learning Specialty
- Expert-level proficiency with Terraform, including module design, state management strategies, and establishing IaC standards for engineering teams
- 7+ years of infrastructure or platform engineering experience, including 3+ years operating at a senior or staff level
- Expert-level understanding of cloud networking (VPC architecture, Transit Gateway, peering, DNS, load balancing) and security (IAM, KMS, WAF, GuardDuty, Secrets Manager)
- Strong experience with database infrastructure and data layer architecture (Aurora PostgreSQL, RDS, ElastiCache/Redis, OpenSearch, DynamoDB)
- Proficiency with policy-as-code frameworks (OPA/Rego, Sentinel, Kyverno) for infrastructure governance and compliance automation
- Don't meet every single requirement? Studies have shown that women and people of color are less likely to apply to jobs unless they meet every single qualification
- Experience authoring RFCs, ADRs, or technical strategy documents that influenced engineering-wide decisions
- Strong Linux systems engineering skills and advanced scripting proficiency (Python, Bash, or Go)
- Experience with event streaming platforms at scale (Amazon MSK / Apache Kafka) including cluster operations, partition strategy, and consumer group management
- Experience in financial services, fintech, broker-dealer, or other heavily regulated industries with FINRA/SEC compliance requirements
- Extensive production experience operating Kubernetes (EKS strongly preferred) at scale - cluster lifecycle management, multi-tenancy patterns, Helm governance, and GitOps workflows
- Demonstrated experience with AI/ML infrastructure: provisioning GPU compute, SageMaker/Bedrock integration, vector databases, MLOps pipelines, or AIOps automation in production
- Proven track record of mentoring engineers and elevating team capabilities through knowledge sharing, design reviews, and tooling improvements
- Deep expertise in CI/CD platforms (GitHub Actions, ArgoCD, Jenkins) with experience designing deployment strategies for multi-service architectures
- Experience defining and executing organizational rollout strategies for AI developer tools, including governance frameworks, usage analytics, and cost management
- 5+ years of hands-on experience in cloud infrastructure engineering, with deep, production-proven expertise in AWS
- Hands-on experience with API gateway management and platform design (Kong, AWS API Gateway)
- FinOps certification or demonstrated experience leading cloud cost optimization programs at scale
- At Altruist, we are dedicated to building a diverse, inclusive, and authentic workplace, so if you're excited about this role but your past experience doesn't align perfectly with every qualification in the job description, we encourage you to apply anyway
- You may be just the right candidate for this or other roles
Benefits & conditions
- Grow your career: Altruist is a thoughtful, fast-growing company that values high achievers
- Our success is your success: Be rewarded for doing great work - competitive compensation and equity for all team members
- Take care of yourself: Health, dental, and vision coverage, available on your first day
- You're welcome here: Altruist is proud to be an equal opportunity employer. We're committed to inclusivity
- Work-life integration: We work hard and respect everyone's whole selves