Software Engineer (Cloud Infrastructure)

Altruist Corp

Los Angeles, United States of America

yesterday

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

$225K

Job location

Remote

Los Angeles, United States of America

Tech stack

Artificial Intelligence

Amazon Web Services (AWS)

Data analysis

Computing Platforms

Bash

Cloud Computing

Code Review

Computer Security

Continuous Integration

Data Infrastructure

Cursor AI

Software Design Patterns

Linux

DevOps

Programming Tools

Disaster Recovery

DNS

Amazon DynamoDB

GitHub

Identity and Access Management

Virtual Private Networks (VPN)

Python

Key Management

PostgreSQL

Machine Learning

Network Architecture

Octopus Deploy

Peering

Redis

Prometheus

Service-Oriented Architecture

Data Streaming

Systems Integration

Software Vulnerability Management

AI Infrastructure

Datadog

Policy as Code

Scripting (Bash/Python/Go/Ruby)

Load Balancing

GitHub Copilot

Grafana

Infrastructure as Code (IaC)

Data Layers

Kubernetes

Deployment Automation

Kafka

Machine Learning Operations

CloudWatch

API Gateway

Terraform

Data Pipelines

Dynatrace

DevSecOps

Jenkins

Job description

We're hiring a Senior Cloud Infrastructure Engineer to join the Cloud Infrastructure & Platform (CIN) team at Altruist.
This is a high-impact, senior individual contributor role responsible for architecting, building, and operating the AWS-based infrastructure that powers our broker-dealer and clearing platform.
You will own critical infrastructure domains end-to-end and drive technical decisions that affect the reliability, security, and scalability of systems handling real financial transactions.
As the industry evolves rapidly with generative AI and agentic workflows, we need engineers who combine deep AWS and Kubernetes expertise with the vision and initiative to define how AI/ML tools and patterns can transform infrastructure operations, developer productivity, and platform resilience.
This role carries significant technical influence: you will shape infrastructure strategy, lead complex cross-functional initiatives, and raise the bar for engineering practices across the organization.
This is not a ticket-driven infrastructure role. At the Senior-to-Staff level, we expect you to:
Own domains, not just tasks. You will be the technical authority for critical infrastructure areas (e.g., EKS platform, observability strategy, DR architecture, or AI infrastructure) and drive their roadmap.
Define technical direction. Author architectural decision records (ADRs), propose infrastructure standards, and influence engineering-wide technology choices through design reviews and RFCs.
Lead without a title. Drive cross-team initiatives, align stakeholders, unblock other engineers, and represent infrastructure perspectives in leadership discussions and planning.
Multiply the team. Elevate the entire CIN team through mentorship, code reviews, knowledge sharing, and building reusable frameworks and golden paths that scale beyond your individual output.
Bridge infrastructure and business. Translate complex technical trade-offs into clear recommendations for engineering leadership, product teams, and compliance stakeholders.
Cloud Infrastructure Architecture & Platform Engineering
Architect, deploy, and operate production AWS infrastructure supporting high-availability financial services workloads (EKS, MSK, RDS/Aurora PostgreSQL, OpenSearch, ElastiCache, S3, CloudFront, and more)
Own and evolve the Infrastructure as Code (IaC) strategy using Terraform - define module standards, enforce code review practices, and drive adoption of reusable patterns across teams
Lead Kubernetes (EKS) platform strategy, including cluster upgrades, node group architecture, Helm chart governance, service mesh evolution, and workload autoscaling policies
Design and drive CI/CD platform improvements (GitHub Actions, ArgoCD, or similar) to enable safe, fast, and self-service deployments for application engineering teams
Architect and validate disaster recovery (DR) strategies, including cross-region failover designs, backup automation, and leading DR simulation exercises
Lead infrastructure design reviews and architectural discussions; ensure solutions meet scalability, security, and compliance requirements before implementation
Reliability, Observability & Operational Excellence
Define and drive the observability strategy across the platform (Datadog, Prometheus, Grafana, CloudWatch, OpenSearch) - including SLO/SLI frameworks, alerting standards, and distributed tracing
Serve as a senior on-call escalation point; lead root-cause analysis on critical production incidents and drive systemic improvements through blameless post-mortems
Own monthly resource saturation reviews and capacity planning processes; proactively identify scaling needs and present findings to engineering leadership
Drive cloud cost optimization strategy: FinOps practices, Reserved Instances/Savings Plans analysis, vendor spend governance, and accountability frameworks across teams
Security, Compliance & Networking
Define and enforce security architecture standards across AWS environments: IAM policy governance, VPC design patterns, encryption strategies, secrets management (Vault, AWS Secrets Manager), and vulnerability remediation workflows
Partner with Security, Compliance, and Audit teams to ensure infrastructure meets FINRA, SEC, SOC 2, and other regulatory requirements - and proactively identify gaps before they become findings
Own networking architecture decisions including VPC topology, Transit Gateway strategy, VPN configurations, load balancer patterns (ALB/NLB), CDN optimization (Fastly/CloudFront), and DNS management
AI/ML Infrastructure & Developer Productivity
Define the strategy for evaluating, integrating, and governing AI-powered developer tools (e.g., Cursor AI, GitHub Copilot, CodeRabbit) across the engineering organization - including usage analytics, cost optimization, security review, and policy frameworks
Architect infrastructure for AI/ML workloads: GPU-enabled compute, SageMaker endpoints, Bedrock integration, vector databases, and data pipeline orchestration
Lead the adoption of AI-driven automation for infrastructure operations - intelligent alerting, anomaly detection, auto-remediation, and AIOps patterns - moving from exploration to production integration
Build and champion internal platforms and golden paths that leverage generative AI to improve developer experience, reduce operational toil, and accelerate delivery velocity
Technical Leadership & Collaboration
Act as a trusted technical advisor to the Director of Engineering and engineering leadership on infrastructure strategy, trade-offs, and investment priorities
Lead cross-functional initiatives spanning application engineering, data, security, and DevSecOps teams - driving alignment on complex multi-team infrastructure projects
Represent the CIN team in architecture review boards, incident response leadership, and engineering-wide planning sessions
Author and maintain comprehensive technical documentation, runbooks, ADRs, and operational procedures that serve as organizational knowledge assets
Mentor senior and mid-level engineers; conduct thorough design and code reviews; actively contribute to hiring and onboarding processes for the CIN team

Requirements

Demonstrated ability to lead disaster recovery planning, execute DR simulations, and design HA architecture patterns for mission-critical systems
Proven experience designing and operating observability platforms (Datadog, Prometheus/Grafana, CloudWatch, OpenSearch/ELK) at organizational scale
Excellent technical communication skills - ability to write clear ADRs, present to leadership, and translate infrastructure complexity for non-technical stakeholders
Contributions to open-source projects, conference talks, or published technical writing
Track record of owning and driving infrastructure initiatives end-to-end - from design and architecture through implementation, rollout, and operational excellence
AWS certifications at the Professional or Specialty level: Solutions Architect Professional, DevOps Engineer Professional, Security Specialty, or Machine Learning Specialty
Expert-level proficiency with Terraform, including module design, state management strategies, and establishing IaC standards for engineering teams
7+ years of infrastructure or platform engineering experience, including 3+ years operating at a senior or staff level
Expert-level understanding of cloud networking (VPC architecture, Transit Gateway, peering, DNS, load balancing) and security (IAM, KMS, WAF, GuardDuty, Secrets Manager)
Strong experience with database infrastructure and data layer architecture (Aurora PostgreSQL, RDS, ElastiCache/Redis, OpenSearch, DynamoDB)
Proficiency with policy-as-code frameworks (OPA/Rego, Sentinel, Kyverno) for infrastructure governance and compliance automation
Experience authoring RFCs, ADRs, or technical strategy documents that influenced engineering-wide decisions
Strong Linux systems engineering skills and advanced scripting proficiency (Python, Bash, or Go)
Experience with event streaming platforms at scale (Amazon MSK / Apache Kafka) including cluster operations, partition strategy, and consumer group management
Experience in financial services, fintech, broker-dealer, or other heavily regulated industries with FINRA/SEC compliance requirements
Extensive production experience operating Kubernetes (EKS strongly preferred) at scale - cluster lifecycle management, multi-tenancy patterns, Helm governance, and GitOps workflows
Demonstrated experience with AI/ML infrastructure: provisioning GPU compute, SageMaker/Bedrock integration, vector databases, MLOps pipelines, or AIOps automation in production
Proven track record of mentoring engineers and elevating team capabilities through knowledge sharing, design reviews, and tooling improvements
Deep expertise in CI/CD platforms (GitHub Actions, ArgoCD, Jenkins) with experience designing deployment strategies for multi-service architectures
Experience defining and executing organizational rollout strategies for AI developer tools, including governance frameworks, usage analytics, and cost management
5+ years of hands-on experience in cloud infrastructure engineering, with deep, production-proven expertise in AWS
Hands-on experience with API gateway management and platform design (Kong, AWS API Gateway)
FinOps certification or demonstrated experience leading cloud cost optimization programs at scale

Don't meet every single requirement? Studies have shown that women and people of color are less likely to apply to jobs unless they meet every single qualification.
At Altruist we are dedicated to building a diverse, inclusive, and authentic workplace, so if you're excited about this role but your past experience doesn't align perfectly with every qualification in the job description, we encourage you to apply anyway.
You may be just the right candidate for this or other roles.

Benefits & conditions

Grow your career: Altruist is a thoughtful, fast-growing company that values high achievers.
Our success is your success: Be rewarded for doing great work with competitive compensation and equity for all team members.
Take care of yourself: Health, dental, and vision coverage, available on your first day.
You're welcome here: Altruist is proud to be an equal opportunity employer. We're committed to inclusivity.
Work-life integration: We work hard and respect everyone's whole selves.
