Senior Site Reliability Engineer
Bakkt Llc
9 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Compensation: £65K
Job location
Tech stack
Java
Multitier Architecture
Microsoft Windows
Tomcat
Google App Engine
JIRA
Databases
Data Visualization
Linux
DevOps
Programming Tools
Event Logging
Monitoring of Systems
Python
Microsoft SQL Server
MongoDB
Reliability Engineering
Logstash
Software Engineering
SQL Databases
Web Applications
Datadog
Data Logging
Scripting (Bash/Python/Go/Ruby)
Cloud Platform System
Spring Boot
Git
Angular
Kubernetes
Information Technology
Software Coding
Software Version Control
Job description
As a Site Reliability Engineer, you will be responsible for closely monitoring our production environments, swiftly addressing issues, and applying creative solutions to ensure the seamless operation of our platform. You will utilize your natural curiosity and strong problem-solving skills to investigate and resolve technical issues across our applications, services, databases, and infrastructure.
- Implement and manage robust monitoring systems to continuously track the functional and non-functional health and performance of our production systems.
- Proactively identify anomalies and potential issues before they impact our clients.
- Client Support:
- Partner with software engineering, project management, and customer success teams to respond to client requests and support inquiries.
- Work closely with our clients to provide support during integration, and ensure a positive experience.
- Incident Management:
- Lead escalation remediations by working across multiple teams, such as software engineering, DevOps, and project management, for web applications and services running in a 24/7, always-on cloud platform environment.
- Participate in an on-call rotation to address and resolve critical incidents outside of regular business hours.
- Operations:
- Execute and develop operational procedures necessary for service requests and incident response.
- Maintain critical platform support knowledge, such as customer contact lists, vendor escalation procedures, scheduled job inventories, and operational playbooks.
- Support planning and execution of production changes and software releases.
- Automation:
- Develop scripts and tools to automate repetitive tasks, streamline workflows, and improve the efficiency of the production support process.
- Assist in the automation of customer operational tasks and ensure alignment with business requirements for customer-facing processes such as customer order reconciliation.
- Ensure timely execution of scheduled and repeatable processes such as periodic system validations, daily triage, system monitoring and event log management.
- Continuous Improvement:
- Actively participate in process improvement initiatives, suggesting enhancements to observability, logging strategies, incident response procedures, and support workflows.
Requirements
- A bachelor's degree in Computer Science, Information Technology or equivalent
- 5+ years of application support and production support experience supporting cloud-based platforms using an SRE support model.
- Proven track record in a production support/SRE role, demonstrating your ability to monitor and troubleshoot complex systems in highly available production environments.
- Experience with common development tools and practices, including Java-based Spring Boot environments and source control tools such as Git, in a team environment.
- Demonstrated ability to understand application logs and support various monitoring and visualization tools (e.g. AlertSite, Logstash, Datadog).
- Excellent communication skills, both written and verbal, for effective interaction with technical and non-technical stakeholders.
- Self-starter who can work independently and effectively across functional team environments.
- Proven ability to learn new IT technologies and disciplines.
Preferred
- Ability to read and interpret Java, Angular, SQL and other software coding languages
- Experience with GCP, Google Kubernetes Engine, Google Compute Engine
- Experience with n-tier web and services application architectures in Java-based Spring Boot and Tomcat environments.
- Working knowledge of SQL Server
- Experience with JIRA or other Service Desk tools
- Experience with multiple OS platforms (Linux, Windows)
- Experience with MongoDB and a scripting language such as Python