Site Reliability Engineer
Job description
We are seeking a dynamic and highly skilled Site Reliability Engineer (SRE) to join our innovative tech team. In this role, you will be at the forefront of maintaining and enhancing the reliability, scalability, and performance of our enterprise software systems and cloud infrastructure. Your expertise will drive the automation of deployment processes, optimize system performance, and ensure high availability across our distributed systems. If you thrive in a fast-paced environment where technical troubleshooting, system testing, and continuous improvement are key, this is your opportunity to make a significant impact!
- Develop, implement, and maintain automation tools using scripting languages such as Python, Bash, PowerShell, Groovy, Perl, and Ruby to streamline deployment processes and system configurations.
- Manage cloud infrastructure across platforms like AWS, Google Cloud Platform, Microsoft Azure, OpenStack, and VMware virtualization environments to ensure scalable and secure IT operations.
- Monitor system health using tools like New Relic, Splunk, Elasticsearch, and other log analysis solutions to proactively identify issues and optimize performance.
- Configure and manage containerization and orchestration platforms, including Docker and Kubernetes, to support microservices architectures and high-availability requirements.
- Collaborate with development teams on CI/CD pipelines utilizing Jenkins, GitHub/GitLab, Maven, Gradle, TFS, and other DevOps automation tools to facilitate rapid software deployment.
- Maintain enterprise software systems such as WebSphere, WebLogic, JBoss, and Tomcat, along with databases such as Microsoft SQL Server, Oracle Database, MySQL, and Amazon DynamoDB, for reliable data management.
- Implement disaster recovery plans and incident response procedures to ensure business continuity during outages or security incidents.
- Conduct system testing and provide troubleshooting support for complex issues involving Linux/UNIX administration and network administration, including DNS, TCP/IP, WAN, and LAN configurations.
Requirements
- Proven experience with cloud computing platforms (AWS, Google Cloud Platform, Azure) and cloud security best practices.
- Strong knowledge of IT infrastructure management including virtualization (VMware), containerization (Docker), orchestration (Kubernetes), and configuration management tools such as Ansible, Puppet or Chef.
- Expertise in scripting languages like Python, Bash (Unix shell), PowerShell; familiarity with Groovy or Perl is a plus.
- Hands-on experience with enterprise software such as WebSphere Application Server and WebLogic Server; familiarity with SaaS environments is desirable.
- Solid understanding of distributed systems architecture including microservices design patterns and RESTful API integrations.
- Ability to troubleshoot complex technical issues involving Linux/Unix systems administration, database management (MySQL, SQL Server, Oracle PL/SQL), network protocols (TCP/IP, DNS), and incident response procedures.
- Experience with continuous integration/delivery pipelines using Jenkins or GitLab CI/CD; familiarity with version control systems like Git or SVN is essential.
- Knowledge of monitoring and log-analysis tools such as New Relic or Splunk; experience implementing high-availability solutions for enterprise applications.
Benefits & conditions
$45 - $50 an hour - Full-time, Contract