Sr Software Engineer, Infrastructure

Databricks
San Francisco, United States of America
3 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

San Francisco, United States of America

Tech stack

Artificial Intelligence
Amazon Web Services (AWS)
Automation of Tests
Azure
Cloud Computing
Code Review
Continuous Integration
Distributed Systems
GitHub
Python
Prometheus
Software Engineering
Systems Architecture
Systems Integration
Datadog
Data Logging
Pulumi
Scripting (Bash/Python/Go/Ruby)
Delivery Pipeline
Containerization
Kubernetes
Information Technology
Kafka
Terraform
Data Pipelines
Docker

Job description

As a Senior Software Engineer (Infrastructure), you will be a core technical contributor on the IT Infrastructure team, owning and driving the evolution of our core infrastructure and observability platforms. This role requires a strong software engineering mindset, deep technical breadth across SRE and infrastructure, and the ability to deliver high-quality, scalable solutions to problems in systems that are currently immature. You will be responsible for building resilient, scalable, and automated infrastructure that empowers our development teams. As a senior member of the team, you will bridge the gap between software engineering and systems architecture, ensuring our AWS environment is cost-optimized, secure, and highly available.

The Impact You Will Have

  • Architect and Automate: Design and deploy production-grade infrastructure on AWS using Terraform or Pulumi.
  • Orchestration: Manage and scale containerized workloads using AKS (Azure Kubernetes Service) or EKS, focusing on cluster security and resource efficiency.
  • CI/CD Excellence: Architect robust deployment pipelines using GitHub Actions, managing both GitHub-hosted and self-hosted runners for specialized build requirements.
  • Drive "Observable by Default" Frameworks: Create the underlying infrastructure to ensure new internal applications are secure and have logging and metrics enabled by default.
  • Tooling, Scripting & AI: Build internal CLI tools, AI plugins, and automation scripts to streamline developer workflows and enhance operational efficiency.
  • Partner Cross-Functionally: Collaborate with stakeholders across Security, Engineering, Infrastructure, and Support to deliver impactful projects with real business outcomes.
  • Mentor and Document: Participate in code reviews, document solutions and failure triage playbooks, and mentor junior engineers on the platforms you own.

Requirements

  • Software Engineering Expertise: 5+ years of production-level experience with strong proficiency in Python (non-negotiable).
  • IaC: Expert-level proficiency in Terraform (modules, state management) or Pulumi (preferred).
  • Cloud & Infrastructure Breadth: Hands-on experience with AWS (or Azure/GCP), Kubernetes, Docker and containerization concepts.
  • Automation & Integration Mindset: Experience building and troubleshooting integrations between infrastructure, data pipelines, and observability platforms.
  • CI/CD: Advanced knowledge of GitHub Actions and GitHub runners.
  • Strong Observability Mindset: Understanding of observability pillars (logging, metrics, tracing) and hands-on experience with tools like Datadog, Prometheus, or ELK.
  • Distributed Systems: Proficiency in running distributed systems built on technologies like Kafka or other message queues.
  • Independent Execution: Ability to operate with minimal guidance, take ownership of ambiguous projects, and follow a vision set by tech leads to execute independently.
