Cloud Operations Engineer
Job description
The Cloud Operations Engineer will be part of a remote team that works on commercial, state, and federal projects. This candidate will work closely with existing DevOps and CloudOps teams and take part in daily Scrum sessions. Our CloudOps team is responsible for site reliability, monitoring, automation, and resolving system alerts and customer issues. The team takes the lead in education and in implementing solutions that meet or exceed customers' needs. To us, this means that teams own their automations and monitoring. We focus our efforts on building scalable and reliable infrastructure that keeps our platform running smoothly. Some travel may be required.
Lendistry: Who We Are
We're proud to be the nation's largest minority-led, tech-savvy lender for small businesses and commercial real estate. As a certified Community Development Financial Institution (CDFI) and Community Development Entity (CDE), our mission is all about creating economic opportunities and fueling growth for small business owners and their communities. Join us as we pave the way with innovative financing and financial education!
What You'll Be Doing
- Monitor and resolve application and customer issues in production.
- Support the automation of recurring issues and issues that need manual intervention.
- Identify and implement process improvements to reduce the time to resolve support tickets.
- Create dashboards and solutions to proactively identify issues.
- Reduce human error and increase quality and security through automation.
- Collaborate with excellent verbal and written communication skills.
- Troubleshoot alerts and escalated issues.
- Engage in and improve services across their lifecycle, from deployment through operation and refinement.
- Maintain production environments by measuring and monitoring availability, latency, and overall system health.
- Scale systems sustainably through automation.
- Evolve systems by pushing for changes that improve reliability.
- Practice sustainable incident response and disaster recovery exercises.
- Communicate in real-time using Slack and MS Teams.
- Follow infrastructure as code best practices.
- Participate in an on-call rotation to troubleshoot production-impacting issues.
- Create and improve documentation and runbooks.
- Participate in blameless postmortems.
- Perform other duties assigned to support the efficient, effective operation of the department, and that help to make Lendistry the best place to work!
Requirements
- Bachelor's degree in Computer Science or related technical field, or equivalent experience.
- 2+ years professional experience in Cloud Operations and application monitoring.
- AWS and Terraform Certifications are a plus.
- High sense of urgency and drive to resolve issues quickly.
- Expertise in analyzing and troubleshooting containerized workloads and applications.
- Script-first mentality for automation.
- Ability to debug, optimize code, and automate routine tasks.
- Solid Python, shell, Java, and JavaScript knowledge.
- Systematic and creative problem-solving approach, with effective communication.
- Proven track record of supporting multi-AZ, multi-region, N-tier application architectures on public cloud infrastructure.
- Understanding of Unix/Linux operating systems.
- Understanding of application golden signals (latency, traffic, errors, and saturation).
- Understanding of dashboarding using techniques like USE and RED.
- Managing cloud-based infrastructure on AWS (preferred), Azure or GCP.
- Advanced knowledge of infrastructure-as-code tools and best practices.
- Code repository best practices: Git, GitHub, "Git Flow" or other workflows.
- IaaS administration (SDKs and CLI; AWS preferred).
- Building, optimizing, hardening, and troubleshooting of new services, tasks, and technology from POC to production.
- Application performance monitoring (APM).
- Experience using PostgreSQL and/or MySQL.
- Experience with Continuous Integration tools like GitHub Actions.
- Knowledge of web and application server management (Nginx, Tomcat, Node.js).
- Experience with Terraform, Ansible, or CloudFormation.
- Experience with AWS technologies such as EC2, ECS, S3, RDS, and CloudWatch.
- Ability to run Docker containers on AWS ECS.
This is a stationary position that requires frequent sitting (approximately 95%), repetitive wrist motions, grasping, speaking, listening, close vision, and the ability to adjust focus. It may also require occasional standing, lifting, carrying of 20 lbs or less, walking, kneeling, bending/stooping, twisting, pulling/pushing, and reaching above the shoulder. Employees in this position must be physically able to efficiently perform the essential functions of the position.
Benefits & conditions
The US base salary range for this full-time position is $96,700 - $122,100 annually.
Our salary ranges are determined by role, level, and location.
The range displayed on each job posting reflects the minimum and maximum base salary for new hires for the position across all US locations. Within the range, individual pay is determined by multiple factors like job-related skills, experience, and state of residence. Your recruiter can share more about the specific salary range during the interview process.
Please note that the compensation details listed in US role postings reflect the base salary only, and do not include any variable compensation elements.