Site Reliability Engineer (DataCosmos)
Open Cosmos Ltd.
Harwell, United Kingdom
27 days ago
Role details
Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Job location
Harwell, United Kingdom
Tech stack
Amazon Web Services (AWS)
Azure
Data Infrastructure
Linux
Distributed Systems
Monitoring of Systems
Reliability Engineering
Data Streaming
Data Logging
Kubernetes
Data Pipelines
Job description
- Owning the reliability, performance, and scalability of our data platform and processing pipelines
- Monitoring systems end-to-end, ensuring full visibility across infrastructure and data flows
- Responding to incidents, troubleshooting issues, and driving long-term fixes
- Improving deployments and contributing to CI/CD pipelines for safe, repeatable releases
- Working closely with engineering teams to design resilient, scalable systems
- Automating processes and reducing operational overhead
- Supporting customer-impacting issues alongside Customer Success teams
Requirements
- Strong demonstrable ability to work with Linux systems and cloud platforms (AWS, GCP or Azure)
- Solid Kubernetes knowledge and ability to run production systems
- A clear understanding of observability (monitoring, logging, tracing)
- Ability to design or operate high-availability, distributed systems
- A mindset focused on automation, scalability, and continuous improvement
- Confidence working in fast-moving environments where reliability really matters
About the company
At Open Cosmos we are solving the world's biggest challenges from space, giving businesses, governments and researchers access to more readily available information than ever before. Ready for the challenge? Then read on…
Working in our Data Division
At Open Cosmos, our Data division transforms satellite data into meaningful insights that drive real-world impact. The team delivers all data products generated by Open Cosmos and its partners, curates and develops DataCosmos (our geospatial data platform) and builds integrations that make satellite imagery easy to access and act on.
We're now looking for a Site Reliability Engineer to help us ensure our data platform is reliable, scalable, and performing at its best as we grow.
When applying, please submit your CV in English.
Why Open Cosmos?
- Work at the cutting edge of space technology with customers around the globe.
- A mission-driven company making space accessible to help solve real-world challenges.
- A diverse, ambitious, and supportive team.