Infrastructure Engineer

The Judge Group
Crown Point, United States of America
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Remote
Crown Point, United States of America

Tech stack

Proxmox
Microsoft Access
API
Artificial Intelligence
Audit Trail
User Authentication
VoIP
Ubuntu (Operating System)
CentOS
Command-Line Interface
Software as a Service
Configuration Management
Databases
Data Integrity
Data Security
Data Stores
Dynamic Host Configuration Protocol
Debian Linux
Linux
Disaster Recovery
DNS
VMware ESX Servers
Fault Tolerance
Icinga
IP Addressing
Virtual Private Networks (VPN)
Key Management
Knowledge-Based Systems
Network Security
Linux System Administration
MariaDB
MongoDB
MySQL
Nagios
Routing
Network administration
Query Optimization
Red Hat Enterprise Linux - RHEL
Search Technologies
Shell Script
Virtual Local Area Networks
Virtualization Technology
vSphere
Software Vulnerability Management
AI Infrastructure
Data Logging
System Availability
Large Language Models
Software Troubleshooting
Firewalls (Computer Science)
AI Platforms
Deployment Automation
Performance Monitor
Patch Management
Machine Learning Operations
Virtual Agents

Job description

This position focuses on owning and strengthening the infrastructure behind a mission-critical healthcare SaaS platform. You'll be responsible for reliability, security, and performance across systems that process sensitive healthcare data and support always-on clinical workflows.

You'll operate in a regulated environment where downtime isn't an option, data integrity is sacred, and disaster recovery must actually work, not just look good in a diagram. You'll also support emerging AI infrastructure, including systems used for private LLM access, model serving, AI agents, and enterprise knowledge retrieval, all within healthcare-grade security and compliance boundaries.

This role is hands-on, Linux-heavy, and built for engineers who like precise operations, strong controls, and clean execution.

Onsite requirements are very light. You might be onsite for three days in a row and then remote for a month straight, depending on project workload; the role is primarily remote.

Candidates must live in Chicagoland or NW Indiana. Visa sponsorship is not available now or in the future.

Why This Work Is Worth Doing

  • Healthcare Impact at SaaS Scale: your work directly enables care coordination, clinical workflows, and real-time communication across healthcare organizations.
  • High Availability by Design: redundancy, fault tolerance, and DR aren't wishlist items; they're baked into how the platform runs.
  • True Ownership: you own production systems end-to-end, including provisioning, monitoring, incident response, and continuous improvement.
  • Security Comes First: compliance, auditability, and data protection are core operating principles, not box-checking exercises.

What You'll Own

Virtualization & Infrastructure

  • Provision, manage, and optimize virtualized environments using platforms like VMware vSphere/ESXi, Hyper-V, Proxmox, or equivalents.
  • Support high-availability workloads, full VM lifecycle management, snapshotting, cloning, and performance troubleshooting.
  • Provision infrastructure for modern workloads, including containerized services, AI inference systems, and high-performance compute where required.
  • Participate in capacity planning and scaling efforts to keep pace with SaaS growth.

Linux Systems Administration

  • Administer production Linux systems (Debian, Ubuntu, CentOS/RHEL, or similar).
  • Operate primarily via CLI with a focus on stability, security hardening, and disciplined patch management.
  • Troubleshoot OS?level issues impacting performance, reliability, and availability.

Command Line & Automation

  • Write solid shell scripts (bash/zsh) to automate operational tasks and eliminate manual risk.
  • Investigate system behavior using tools like systemctl, journalctl, top/htop, and tcpdump.
  • Automate provisioning, deployment, and lifecycle management for AI services and model endpoints using repeatable, auditable workflows.
  • Improve consistency through automation and clear operational documentation.

Backup, Disaster Recovery & Business Continuity

  • Implement and maintain backup strategies using tooling such as Synology Active Backup, ESXi CBT-based backups, snapshots, replication, and offsite storage.
  • Regularly validate backups and perform test restores, because untested backups don't count.
  • Support defined RPO/RTO targets and actively participate in DR testing and reviews.

Monitoring, Alerting & Incident Response

  • Configure and maintain monitoring and alerting systems such as Nagios, Icinga, or comparable platforms.
  • Build meaningful checks, alerts, and dashboards that surface real problems, not noise.
  • Participate in incident response, root cause analysis, and post-incident improvement cycles.

Networking & Security Fundamentals

  • Manage and troubleshoot IP addressing, routing, VLANs, DNS, DHCP, firewalls, and VPNs.
  • Support secure network segmentation and access controls appropriate for healthcare SaaS environments.
  • Design and maintain secure connectivity for AI services, including private model APIs, data stores, agent tools, and knowledge systems.
  • Support vulnerability remediation, security reviews, and audit readiness efforts.

VoIP & Clinical Communication Systems

  • Configure and maintain VoIP platforms using Asterisk and/or FreePBX.
  • Troubleshoot SIP, call routing, and reliability issues affecting clinical and operational users.

Database Systems Support

  • Install, configure, back up, and maintain MySQL, MariaDB, and/or MongoDB systems.
  • Manage users and permissions, monitor performance, and assist with query optimization.
  • Ensure database recoverability and integrity consistent with healthcare data requirements.

Requirements

  • Proven experience in systems or network administration within SaaS, healthcare, or other regulated environments.
  • Strong Linux administration skills and deep comfort working in production via the command line.
  • Solid understanding of virtualization, networking, monitoring, and backup/DR best practices.
  • Experience supporting systems that demand high availability, auditability, and data protection.
  • Clear documentation habits and the ability to work effectively across engineering, security, and operations.
  • Working knowledge of AI infrastructure concepts, including model serving, LLM-based services, vector databases, embeddings, and retrieval workflows.
  • Familiarity with AI agent architectures and service integration patterns such as MCP (Model Context Protocol) or similar model-to-tool connectivity approaches.
  • Experience operating API-driven services with strong controls around authentication, secrets management, logging, and sensitive data access.

Bonus Points

  • Hands-on experience in HIPAA-regulated or compliance-driven environments.
  • Familiarity with healthcare platforms like EHR, ePCR, telehealth, or care coordination systems.
  • Experience with automation, configuration management, or infrastructure-as-code practices.
  • Exposure to GPU servers, containerized AI workloads, or self-hosted model-serving platforms.
  • Familiarity with private LLM deployments, vector search, RAG architectures, or enterprise AI platforms.
  • Understanding of AI governance, prompt and data security, and operational controls in regulated environments.
