Architect, Design Engineer and Automation SME

Sunrise Systems, Inc
yesterday

Role details

Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior

Job location

Remote

Tech stack

Microsoft Windows
IBM AIX
Systems Engineering
Bash
Databases
Data Integration
Linux
Distributed Systems
Django
Graphical User Interface
Monitoring of Systems
Networking Hardware
Python
Korn Shell
Linux Servers
Microsoft SQL Server
Windows Server
MongoDB
MySQL
Oracle Applications
Parsing
PowerShell
Red Hat Enterprise Linux - RHEL
Ansible
Shell Script
Backup and Restore
Network Switches
Flask
Infrastructure Automation Frameworks
Splunk
Dynatrace
VMware

Job description

We are looking for an Architect, Design Engineer and Automation SME for the role described below. This is a fully remote position.

Automation Engineers and Architects (Monitoring & Alerting Focused)

Key Activities:

Unified Monitoring Framework: Build an integrated solution leveraging existing tools (Splunk, Dynatrace, security platforms) and local logs from Windows/Linux servers, infrastructure components (storage arrays, SAN switches, network devices), databases (Oracle, SQL Server, MySQL, MongoDB), backup systems (Rubrik, Data Domain, InfiniBox), compute nodes (Dell servers), VMware environments, IBM Power/AIX, and IBM LinuxONE.

Automated Alerting & Proactive Response: Develop intelligent alerting mechanisms and automated remediation workflows to reduce manual intervention and accelerate incident resolution.

Data Integration & Gap Closure: Aggregate and normalize data from multiple sources, including platform tools and local logs, to fill visibility gaps and provide actionable insights.

Dashboard Development: Create a common GUI-based dashboard for real-time monitoring, alerting, and reporting across all infrastructure layers.

Skills & Tools: Utilize Ansible, Python, PowerShell, shell scripting, and GUI development to deliver scalable automation solutions.

Business Impact:

  • Improved Reliability: Proactive detection and automated remediation reduce outages and service degradation.
  • Operational Efficiency: Significant reduction in manual monitoring and troubleshooting efforts.
  • Enhanced Security & Compliance: Centralized visibility into logs and alerts ensures faster response to security events.
  • Scalability: A common framework supports growth and complexity without proportional increases in headcount.

Requirements

  • Proficiency in Python, Ansible, PowerShell, and shell scripting (Bash/Korn).
  • Ability to develop automation workflows for monitoring, alerting, and remediation.

Monitoring & Logging Tools:

  • Hands-on experience with Splunk, Dynatrace, and other enterprise monitoring platforms.
  • Familiarity with log aggregation and parsing from multiple sources (OS, applications, infrastructure components).

Infrastructure Knowledge:

  • Strong understanding of Linux (RHEL) and Windows Server environments.
  • Exposure to VMware, IBM Power/AIX, and IBM LinuxONE systems.
  • Knowledge of storage arrays, SAN switches, network switches, and IP traffic monitoring.
  • Experience with backup platforms (Rubrik, Data Domain, InfiniBox).
  • Familiarity with database systems (Oracle, SQL Server, MySQL, MongoDB).

GUI Development:

  • Ability to build dashboard interfaces for real-time monitoring and alerting (for example, with Python frameworks such as Flask or Django, or similar).

Additional Skills:

Data Integration:

  • Ability to aggregate and normalize data from multiple sources for unified alerting.

Security & Compliance Awareness:

  • Understanding of security logs and compliance requirements for infrastructure monitoring.

Problem-Solving & Creativity:

  • Ability to identify gaps in current monitoring and design innovative solutions.

Experience:

  • 5+ years in infrastructure automation or systems engineering roles.
  • Proven track record in building automation frameworks and monitoring solutions.
  • Experience working in large-scale, distributed environments with global teams.
  • Prior involvement in proactive alerting and automated remediation projects is highly desirable.
