Linux Administrator
Job description
- Linux Infrastructure Administration: Manage servers, services, access, updates, diagnostics, network settings, system issues, and incidents.
- Develop Infrastructure as Code (IaC): Define infrastructure through code, maintain host/role configurations, and manage changes via Git/Gerrit, code reviews, and CI.
- Migrate from SaltStack to Ansible: Participate in a large-scale, planned transition. This is not about "rewriting YAML for the sake of YAML," but an opportunity to co-design a modern IaC platform (inventory, roles, secrets, CI validation, rollout processes, and operational rules).
- Support Build/Test Farms: Maintain a large fleet of Windows, Linux, and macOS nodes (server-class configurations, GPU nodes, and specialized test environments). While the Build Team handles build logic, the Infrastructure Team ensures the fleet is reproducibly provisioned, available, monitored, and managed via IaC.
- Physical Infrastructure Management: Administer bare-metal servers, network equipment, disk arrays/shelves, shared storage for virtualization, IPMI/iDRAC, and data center environments.
- Storage Management: Work with iSCSI, multipath, LVM, and shared block storage. Handle latency diagnostics, clone/backup/restore integrity, and scheduled maintenance.
- Virtualization Support: Manage Proxmox, Linux VMs, network/storage dependencies, templates, provisioning, and operational procedures.
- CI/CD Infrastructure: Support Jenkins/Gerrit, agents, system dependencies, machine availability, monitoring, logs, and basic automation.
- Infrastructure Services: Support Docker, Nginx, DNS, VPN, monitoring, logging, and backup/restore for internal development teams.
- AWS Management: Operate a subset of servers in AWS, managing access, networking, security groups, and diagnostics.
- Internal Tooling Improvement: Maintain and improve engineering code in Gerrit (IaC, monitoring, backup tooling, access management, inventory, Jenkins integrations, and automation scripts).
- AI-Assisted Engineering: Use AI tools in daily work for code analysis, automation drafting, review, anomaly detection, and accelerating routine tasks. We value automation and support those who turn manual work into code.
- Architectural Improvements: Propose ways to make the infrastructure simpler and more observable. Reduce manual operations and transform recurring issues into automated checks or robust technical solutions.
- Transparent Task Management: Track statuses, raise blockers early, and discuss solutions openly with the team.
Requirements
- 5+ years of Linux system administration experience.
- Deep Linux knowledge: systemd, users/groups, permissions, storage, network stack, firewall, package management, logs, and performance troubleshooting.
- IaC/Configuration Management experience: SaltStack, Ansible, Puppet, or similar.
- Hardware experience: Operating bare metal, RAID/HBA, IPMI/iDRAC/iLO, and diagnosing hardware issues.
- Senior-level Storage understanding: Block devices, multipath, iSCSI, LVM, filesystems, and their impact on virtualization/services.
- Practical Networking: TCP/IP, VLAN, routing, NAT, firewalls, VPN, DNS, and diagnostics (tcpdump, ss, ip, dig, traceroute).
- Docker: Practical experience with container runtimes in an operational context.
- Scripting: Proficiency in Python or Shell for routine task automation.
- Git & Code Review: Experience with standard development workflows.
- Observability: Understanding of monitoring, alerting, logging, and incident response.
- Technical Leadership: Ability to propose architectural solutions, justify them, and see them through to production.
- Interest in AI-assisted engineering: Readiness to use modern AI tools to optimize the engineering process.
- English: Technical proficiency (reading documentation and technical correspondence).
A plus would be:
- Experience designing Ansible infrastructure (collections, secrets management, CI validation).
- Experience with Proxmox, oVirt, or VMware.
- Experience with Juniper/Dell or other networking hardware.
- In-depth knowledge of Jenkins/Gerrit (managing nodes, credentials, and system-level integrations).
- Experience supporting large-scale CI agent clusters.
- AWS experience (EC2, VPC, IAM, troubleshooting).
- Familiarity with Icinga, Grafana, Telegraf, Wazuh, or Bacula.
- Experience managing Windows/macOS within a Linux-centric infrastructure.
- Experience with CM migrations (specifically SaltStack to Ansible).
We are seeking a Senior Engineer who is passionate about building infrastructure, not just maintaining it. We want someone who finds genuine satisfaction in simplifying complex systems: reducing manual tasks, writing cleaner code, improving monitoring, accelerating diagnostics, ensuring smoother rollouts, and building more resilient recovery processes.