L2 NOC Engineer
Role details
Job description
- Manage and maintain the Client's monitoring systems for on-premises and cloud entities such as network infrastructure (routers, switches, firewalls), servers (physical and virtual), storage systems, applications and databases, cloud resources, security systems, and backup systems.
- Monitor Dynatrace, Max Gauge, Grafana, ELK Stack, and Log Management System
- Incident detection, logging, classification, and prioritization
- Incident response and resolution according to defined SLA
- Facilitate outage bridges and take part in Major Incident Management.
- Regular reporting on monitoring activities and incident metrics
- Escalate incidents to the client point of contact (POC) as needed.
- Conduct root cause analysis for major incidents.
- Recommend preventative measures
- Configure and maintain log collection agents
- Develop and refine log parsing rules and alert thresholds
- Create and maintain error code detection rules
- Maintain on-call manager rotation
- Manage incident bridge infrastructure.
- Document all incidents according to procedures
- Correlate logs across multiple systems and applications.
- Maintain the NOC wiki and technical documentation of the processes and procedures used throughout normal operations.
- Develop knowledge and skills in network and system administration, particularly of the Client's architecture and platforms.
- Participate in a 24x7 call-out rotation, including weekend support.
Continuous Service Improvement:
- Regular review of incident patterns and trends
- Identification of recurring issues and root causes
- Recommendations for preventative measures
- Quarterly service improvement meetings
- Ongoing optimization of Dynatrace, Max Gauge, Grafana, ELK Stack, and Log Management System configurations
- Refinement of threshold values and error code detection rules
- Log analysis pattern improvements
- Incident bridge process refinement
- Escalation procedure effectiveness review
Requirements
- Experience monitoring infrastructure using various monitoring tools. Proven experience facilitating outage bridges or working in Major Incident Management.
- A minimum of 5 to 7 years of experience in an L2 Monitoring/Major Incident Management or similar role.
- Good network diagnostic skills.
- Basic Linux CLI and basic sysadmin skills.
- Preferred: working knowledge of tools such as Dynatrace, Max Gauge, Grafana, ELK Stack, and Log Management Systems.
- Experience running outage bridges through to closure, or working in Major Incident Management.
- Willing to work rotational shifts, including night shifts.
- Ability to assess and prioritise faults and respond or escalate accordingly.
- Experience implementing service improvement techniques and procedures.
- Good communicator with a natural aptitude for seeing issues through to resolution.
Benefits & conditions
Pay Range*: $85,000 - $99,500 Per Year
*Pay range offered to a successful candidate will be based on several factors, including the candidate's education, work experience, work location, specific job duties, certifications, etc.
Benefits: Innova Solutions offers benefits (based on eligibility) that include the following: Medical & pharmacy coverage, Dental/vision insurance, 401(k), Health savings account (HSA) and Flexible spending account (FSA), Life Insurance, Pet Insurance, Short-term and Long-term Disability, Accident & Critical illness coverage, Pre-paid legal & ID theft protection, Sick time and other types of paid leave (as required by law), and an Employee Assistance Program (EAP).