Senior Infrastructure/DevOps Engineer, Fintech
Job description
- Infrastructure as Code (IaC): Quickly implement and adapt infrastructure using Terraform, Pulumi, or other major IaC tools.
- Containers: Docker is critical. Deeply understand how to design, build, and optimize secure, multi-stage Dockerfiles.
- CI/CD: Design, build, and manage robust CI/CD pipelines to automate testing, building, and deployment across environments.
- Core cloud services (AWS or GCP): Provision and manage foundational services. Deep expertise in one major provider is required, transferable to the other.
- Container compute: Expertise in at least one major container platform: EKS, GKE, ECS, Fargate, or Cloud Run. (Kubernetes is highly valued, particularly EKS or GKE.)
- Networking: Know when to use load balancers, VPNs for secure connectivity, and private VPCs for isolation. Apply subnetting, routing, VPC peering, and NAT gateways to build secure systems.
- Storage: S3 (AWS) or Cloud Storage (GCP).
- Databases: RDS (AWS) or Cloud SQL (GCP).
- Serverless: Deploy event-driven components using AWS Lambda, GCP Cloud Functions, or equivalents.
- CDNs and message queues.
- Security: Protect PII; apply encryption, secrets management, network firewalls, and web application firewalls (AWS WAF, GCP Cloud Armor) following security best practices.
- Automation and scripting: Write high-quality automation and tooling in Go, Python, Node.js, or Bash for client-specific operational challenges.
- Monitoring and operations: Ensure robust monitoring and high system uptime.
Nice to Have:
The following experience would be a bonus, but please highlight any other experience or skills you have. We like working with people from varied backgrounds.
- Production AI/agent experience: Hands-on experience running LLM or agent systems in production, including how they fail differently from deterministic services: nondeterministic outputs that break conventional testing and alerting, runaway token and inference costs, and partial failures in multi-step chains.
- AI observability and cost control: Tracing multi-step agent runs, treating token cost, latency, and output quality as first-class metrics, and keeping inference spend in check with budgets, rate limiting, and caching (Langfuse, LangSmith, Arize, or similar).
- The infrastructure AI systems run on: Model gateways and provider routing with failover (LiteLLM, Bedrock, Vertex), durable execution for long-running multi-step workflows (Temporal, Step Functions, Inngest), eval and regression pipelines for prompt or model changes, and the retrieval, vector-store, and context plumbing these systems depend on (including MCP). Vector databases and GPU/TPU compute where relevant.
- Domain experience in fintech or crypto/web3 environments.
- Crypto/web3 infrastructure: running nodes (Ethereum, Solana, or others), indexing solutions (The Graph, custom indexers), or RPC infrastructure.
- Payment processing, ledger architecture, or financial transaction systems, and meeting compliance requirements in regulated environments.
- High-volume, mission-critical systems: real-time data flows, websocket feeds, payment rails, or distributed architectures handling millions of transactions.
- Certifications: AWS or GCP cloud certifications are a plus, not mandatory.
- Advanced monitoring (Prometheus, Datadog) or logging experience.
AI in our hiring process
We use AI to help us review and shortlist applications based on job-related criteria. A human hiring manager always makes the call on who moves forward. As a company that builds with AI every day, we're all for candidates using it too - just be upfront about how it helped.
Why Work With Us
- Accelerate Your Growth: Work across multiple industries and cutting-edge projects with a team that ships fast.
- Maximum Impact: Shape the future of the tech industry by helping ambitious projects reach their potential.
- True Ownership: Everyone ships code and has real autonomy with minimal meetings.
- Continuous Learning: Be part of a team of senior engineers and designers who love learning and knowledge sharing.
Why This Might Not Be For You
- You Resist Modern Tools: We leverage AI and cutting-edge technology to build rapidly and focus our energy on solving the most complex, high-impact challenges for our clients.
- You Crave High Structure: We thrive in an autonomous, fast-moving environment, which means you need to be comfortable with some ambiguity.
- You Prefer Conventional Careers: Our consulting approach involves varied projects and direct client interaction - it's not for everyone.
- You Like Staying in One Lane: Our projects require adapting to different tech stacks and modern tooling quickly.
Requirements
Do you have experience implementing security compliance frameworks?
- Seniority: Minimum of 5 years of dedicated experience in DevOps, Infrastructure, or SRE roles.
- Expert tooling: Expert-level proficiency with Docker, Kubernetes (k8s), and Terraform/Pulumi.
- Cloud proficiency: Deep, proven expertise in either AWS or GCP infrastructure, with the ability to quickly grasp and transition to other cloud providers.
- Development skills: Strong ability to write clean, maintainable code for automation in Go, Python, or Node.js.
- Security focus: Demonstrable experience implementing and maintaining modern cloud security controls and meeting key compliance standards (SOC 2, PIPEDA, HIPAA, and/or GDPR).
- Independent, proactive, and cross-functional: Proven ability to quickly onboard, diagnose problems, and propose and implement solutions with minimal oversight. Experienced in a consultant or freelancer capacity, with the ability to understand and communicate effectively with both technical and non-technical stakeholders.
Benefits & conditions
- 401(k)
- Health insurance
- Vision insurance
- Dental insurance
- Unlimited paid time off
- Work/Life Balance: We believe in our team's ability to have it all: a great career, and time to unplug and live... you know... life.
- Employee Care: We provide full benefits (healthcare, dental, vision) for our employees, plus a 401(k) for our US employees.
- Unlimited PTO: Everyone needs a break. Take at least 15 days off a year, and more if you need it; just be cool about it and keep the team in mind.
- Regular Team Retreat: Join us for a week of team bonding at amazing destinations. Recent trips include the Dominican Republic, Cancun, and Hawaii - plus-ones welcome.