DevOps Engineer / SRE
Job description
You will join a small team responsible for the stability, performance, and security of our server infrastructure: bare metal, VMs, databases, queues, networking, and infrastructure security. Our philosophy is simple: the team owns its systems end-to-end, and every engineer should be able to diagnose and fix issues in their area of responsibility.
This role is for someone who enjoys working with real infrastructure (OS, hardware/virtualization, networks), not just cloud abstractions.
What You'll Do
- Work primarily with on-premise infrastructure (bare metal and VMs): setup, maintenance, troubleshooting
- Drive clarity in ambiguous situations by defining requirements, assumptions, and next steps
- Own automation projects end-to-end (design → rollout → maintenance)
- Improve how we operate: harden and tune systems, and raise the team's operational hygiene
- Keep the platform stable, fast, and secure: servers, web servers, databases, queues
- Investigate production incidents across OS / networking / infrastructure layers, apply temporary mitigations, coordinate with developers and participate in post-mortems
- Participate in on-call rotations
- Use AI in all aspects of day-to-day work: researching, troubleshooting, developing
We understand it's impossible to be an expert in everything, but it's important to have solid hands-on experience in two or more of the areas below:
- ClickHouse, MongoDB: what each database is used for, monitoring, troubleshooting performance and slow queries, sharding
- Kafka: operating clusters at scale (topic moves, broker replacements, tuning)
- Redis: high-load tuning, replication, sharding, performance monitoring
- Elasticsearch: configuration, scaling, sharding/cluster management
- HAProxy / Nginx: load balancing, SSL/TLS, caching, reverse proxying, performance monitoring
- OS tuning: kernel/network stack/filesystem parameters for high-load systems
- Full disk encryption on LVM: we use Clevis + Tang in production
- Infrastructure Security: Teleport, HashiCorp Vault
Bonus points
Great if you've worked with any of the following:
- VictoriaMetrics and how it differs from the Prometheus stack
- Complex CI/CD pipelines (we use scripted Jenkins pipelines)
- Bare-metal Kubernetes: provisioning, networking (MetalLB or alternatives), isolation from the internet, scaling across providers (e.g. OVH, Hetzner), and integration with existing infrastructure
- Flux and GitOps
- Terraform
Why work with us
- A strong, collaborative product team that owns what it builds
- Clear product vision and access to real customer feedback from global nonprofit leaders
- Flat structure: no politics, just great work with great people
- Transparent company culture: we share how we're growing, where revenue comes from, and what's next
- Long-term focus: we offer equity options and value sustained, meaningful contribution
Benefits
- Private medical insurance for the employee and their family
- 23 paid vacation days per year
- 11 paid public holidays per year
- 5 company-paid sick leave days
- English learning courses
- Relevant professional education
- Gym or swimming pool
- Home office setup assistance: the company helps you purchase furniture and equipment (office chair, desk, monitor) and other items to create a comfortable workspace
- Co-working
- Remote working
Please note: all official correspondence from Fundraise Up comes exclusively from the @fundraiseup.com domain. Be cautious and verify the authenticity of any email claiming to be from our company.
Requirements
- 4+ years as a DevOps Engineer / SRE (or very close responsibilities)
- Real, hands-on experience with servers (VMs, bare metal) at the OS level and below: configuring, troubleshooting, digging into "why it's broken"
- Confident Linux skills (we use Ubuntu). We expect you to be comfortable with the core tools from the "Linux Crisis Tools" list
- Solid understanding of networking basics; ability to configure and troubleshoot iptables
- Ansible + Git
- Experience with Bash or Python scripting for automation/observability
- Production/on-call experience: diagnosing incidents, restoring service, participating in post-mortems
- Ownership and attention to detail. Downtime is expensive: five years ago, 10 minutes of downtime cost us $100k, and today it costs even more