AI DevOps Engineer

SmartAdvocate LLC
Melville, United States of America
16 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Junior
Compensation
$160K

Job location

Melville, United States of America

Tech stack

Microsoft Windows
API
Artificial Intelligence
Amazon Web Services (AWS)
Build Automation
Azure
Backup Devices
Continuous Integration
DevOps
Disaster Recovery
DNS
Monitoring of Systems
IIS
Virtual Private Networks (VPN)
Python
Linux System Administration
Microsoft SQL Server
Windows Server
Networking Basics
Performance Tuning
PowerShell
Release Management
Prometheus
Azure DevOps Pipelines
Software Deployment
Data Streaming
TCP/IP
Backup and Restore
Zabbix
Scripting (Bash/Python/Go/Ruby)
Transport Layer Security
Load Balancing
System Availability
Large Language Models
Grafana
Multi-Agent Systems
Firewalls (Computer Science)
Git
Containerization
Git Flow
HuggingFace
Patch Management
Virtual Agents
Oracle Cloud Infrastructure
Software Version Control
Docker
Jenkins

Job description

We are looking for a hands-on AI DevOps Engineer to own and build out the operational backbone of our legal case management platform. You will be the go-to person for everything infrastructure, from development environments to production deployments across on-premises and cloud-hosted client sites. The role spans traditional systems and modern AI workloads, including LLMs, RAG pipelines, vector databases, and agent-based systems.

This is a high-impact, high-autonomy role. You will be the primary Ops resource, working alongside developers who currently handle infrastructure part-time. Your mission is to bring structure, reliability, and observability to our operations - establishing proper CI/CD pipelines, monitoring, alerting, and incident response processes.

  • Design, build, and maintain CI/CD pipelines using Azure DevOps and Jenkins
  • Manage build configurations, artifact publishing, and release orchestration
  • Coordinate deployments across multiple client environments (on-prem and cloud)
  • Maintain and improve source control workflows using Git

Infrastructure Management

  • Provision, configure, and maintain Windows Server environments (dev, test, staging, production)
  • Administer IIS web servers - application pools, bindings, SSL certificates, performance tuning
  • Manage SQL Server instances - installation, configuration, backups, high availability (Always On)
  • Maintain networking fundamentals - DNS, firewalls, load balancers, VPN connectivity
  • Handle patch management and security hardening across all environments

Monitoring, Observability & AI Systems

  • Stand up and maintain monitoring infrastructure using Zabbix, Grafana, and Loki
  • Define and implement alerting rules for system health, performance, and availability
  • Build dashboards that give the team real-time visibility into all environments
  • Establish baseline metrics and SLAs for system performance

Incident Response & Troubleshooting

  • Serve as the primary point of contact for production infrastructure issues
  • Diagnose and resolve system outages, performance degradation, and deployment failures
  • Conduct root cause analysis and implement preventive measures
  • Document runbooks and operational procedures for common issues

Security & Compliance

  • Implement and maintain access controls, following the principle of least privilege
  • Manage SSL/TLS certificates across all environments
  • Ensure backup and disaster recovery procedures are in place and regularly tested
  • Support security audits and maintain awareness of data protection requirements (legal industry handles sensitive PII)

Requirements

  • 5+ years of Windows Server administration - this is a Windows shop and you must be an expert
  • Expert-level Microsoft SQL Server - installation, configuration, backup/restore, performance tuning, Always On availability groups, index maintenance
  • Expert-level IIS administration - application pools, URL rewrite, SSL bindings, troubleshooting, performance optimization
  • CI/CD pipeline experience - Azure DevOps Pipelines and/or Jenkins, build automation, release management
  • Scripting with PowerShell - automation of routine tasks, deployment scripts, system administration
  • Source control - Git workflows, branching strategies, merge management
  • Monitoring tools - hands-on experience with at least one observability stack (Zabbix, Grafana, Prometheus, or similar)
  • Networking fundamentals - DNS, TCP/IP, firewalls, load balancers, VPN, SSL/TLS
  • Backup & disaster recovery - designing and testing backup strategies, point-in-time recovery
  • LLM Integration: OpenAI Chat Completions, Assistants API, Realtime API, function calling, streaming. Just knowing the Chat API is not sufficient
  • RAG Systems: Vector databases (Chroma or equivalent), embedding models (HuggingFace/OpenAI), chunking strategies, retrieval pipelines
  • Agentic Patterns: Tool-calling agents, multi-step reasoning, agent orchestration frameworks (LangChain or equivalent)

  • Microsoft Certification (MCSA, MCSE, or Azure equivalent) - strongly preferred
  • Oracle Cloud Infrastructure (OCI) experience - compute, networking, storage, block volumes
  • Grafana + Loki experience for log aggregation and visualization
  • Zabbix experience for infrastructure monitoring
  • Python scripting for automation and tooling
  • Docker / containerization basics
  • Linux administration fundamentals
  • AWS EC2 experience
  • Familiarity with compliance frameworks (SOC 2 or similar)
  • Experience supporting multi-tenant or client-deployed software products

What Makes You a Great Fit

  • Ownership mentality - you will be building this function, not slotting into an existing team. You see gaps and fill them without being asked.
  • Calm under pressure - production issues happen. You diagnose methodically, communicate clearly, and fix things fast.
  • Automation-first mindset - if you do something twice, you script it. Manual processes are temporary; automation is the goal.
  • Clear communicator - you can explain infrastructure issues to developers and stakeholders in plain language.
  • Documentation habit - you write things down so the team doesn't depend solely on your memory.
  • Pragmatic problem solver - you find the right solution for the situation, not the theoretically perfect one.

  • Microsoft SQL Server: 3 years (Preferred)
  • CI/CD: 4 years (Preferred)
  • PowerShell: 3 years (Preferred)
  • Disaster recovery: 3 years (Preferred)
  • Python: 4 years (Preferred)
  • Microsoft Windows Server: 5 years (Preferred)
  • AI: 3 years (Preferred)
  • LLM: 3 years (Preferred)
  • Agentic AI: 1 year (Preferred)

Benefits & conditions

  • 401(k)
  • Health insurance
  • Paid time off
  • Vision insurance
  • Dental insurance

About the company

SmartAdvocate® is a leading legal case management application. Conceptualized by the founding partner of a prominent personal injury law firm, SmartAdvocate® is offered as either a cloud-based or self-hosted application. Legal professionals use SmartAdvocate to manage pre-litigation and litigation cases including contacts, communications, case data, documents, document scanning and filing, document creation, calendaring, and more. Our clients run in hybrid environments - on-premises Windows infrastructure and cloud-hosted deployments. The stack is Microsoft-centric: ASP.NET, SQL Server, IIS, and Windows Server, with Oracle Cloud Infrastructure (OCI) as our primary cloud platform.
