Infrastructure Platform Engineer
Job description
This position works independently to perform a variety of activities related to software support and/or development. Analyzes, designs, develops, debugs, and modifies computer code for end-user applications, beta and general releases, and production support. Guides the development and implementation of applications, web pages, and user interfaces using a variety of software applications, techniques, and tools. Solves complex problems in the administration, maintenance, integration, and troubleshooting of code and of the application ecosystem currently in production.
Reporting to the Director of techstaff, the engineer acts as a senior technical leader: proposes and defines architectural standards, mentors junior engineers, and drives platform reliability, automation, security, and observability across on-prem and cloud environments.
- Lead initiatives to improve infrastructure reliability, security, and performance, such as CI/CD pipeline improvements, system monitoring, and automation of manual workflows.
- Implement infrastructure-as-code principles using tools such as Puppet, Terraform, and GitLab CI/CD.
- Implement and maintain robust system monitoring, alerting, and logging solutions to ensure platform health, performance, and visibility.
- Perform expert-level build, configuration, and lifecycle management of Linux/Unix operating systems across physical, virtual, containerized, and cloud environments.
- Develop automation tools and scripts using Python, shell, and other scripting languages to streamline operations.
- Manage and optimize Linux-based systems in production environments.
- Support department applications by managing infrastructure and CI/CD pipelines and by performing minor programming tasks.
- Implement secure configurations, including system hardening, secure communications, and identity/access integration.
- Stay up to date on the latest trends and technologies in platform engineering, infrastructure automation, and software deployment, and provide guidance on their use within projects.
- Document work clearly, contribute to team best practices, and help others adopt design patterns by sharing knowledge in Slack, internal wikis, and GitLab.
- Participate in an off-hours on-call rotation.
- Design new systems, features, and tools. Solve complex problems and identify opportunities for technical improvement and performance optimization. Review and test code and systems to ensure appropriate standards are met.
- Act as a technical consultant and resource for faculty research, teaching, and/or administrative projects.
- Perform other related work as needed.
Requirements
Minimum requirements include a college or university degree in a related field.
Work Experience:
Minimum requirements include knowledge and skills developed through 5-7 years of work experience in a related job discipline.
- Bachelor's or master's degree in computer science, engineering, or a related field.
- 5+ years of experience in an infrastructure, DevOps, platform, or SRE role supporting production systems.
- 3+ years of experience supporting Linux servers in a production environment.
- 5+ years of experience working in production environments.
- Experience working with CI/CD systems, such as GitHub Actions, GitLab Pipelines, Jenkins, or similar tools.
- Experience with an infrastructure automation tool, such as Puppet, Chef, Ansible, or Terraform.
- Experience writing scripts and programs to automate common tasks and support IT operations.
- Experience supporting web servers, such as Apache and Nginx, in production environments.
- Experience using, implementing, and supporting monitoring, alerting, logging, and observability tools, and contributing to system reliability efforts.
- Familiarity with supporting Postgres and MySQL servers, including database management and query syntax.
- Experience with Linux system administration and troubleshooting.
Technical Skills or Knowledge:
- Ability to apply infrastructure-as-code principles using tools such as Puppet, Terraform, and GitLab CI/CD.
- Proficient in at least one scripting or programming language, such as Python, Bash, or Ruby.
- An understanding of SQL syntax, queries, and relational databases.
- Familiarity with version control systems such as Git.
- Familiarity with observability tools such as Prometheus, Datadog, Splunk, Grafana, or Loki.
- An understanding of core infrastructure protocols and services, such as TCP/IP, DNS, and DHCP.
- An understanding of, or a willingness to learn, web application frameworks such as Ruby on Rails and Flask.
- An understanding of SDLC, DevOps, and Agile methodologies and best practices.
- Strong understanding of networking, storage, security, and access control within Linux systems.
Preferred Competencies
- Willingness to take initiative to understand current and potential systems and applications, ask thoughtful questions, and raise concerns early.
- Effective written and verbal communication; ability to collaborate with cross-functional technical teams.
- Communicate clearly and work collaboratively within a team environment, contributing to planning, troubleshooting, and retrospectives.
- Commitment to operational excellence, service reliability, and continuous improvement.
- Excellent problem-solving skills and attention to detail.