DevOps Engineer (Johns Hopkins Data Science & AI Institute)
Job description
Whiting School of Engineering's Johns Hopkins Data Science and AI Institute (DSAI) seeks a DevOps Engineer (Sr. Systems Engineer) to design, configure, and maintain tools and processes that facilitate the work of DSAI's research and software engineers. This engineer will collaborate daily with DSAI RSEs and with engineers from the Institute for Data Intensive Engineering and Science (IDIES, which provides much of DSAI's local compute and storage), the Whiting School of Engineering, JHU Central IT, JHU Research IT, and JHU's HPC.
The Sr. Systems Engineer will provide technical leadership, project management, and task execution for the administration, programming, maintenance, performance, implementation, security, and support of multiple departmental and enterprise-wide platforms. This includes installing and testing new software, operating systems, related utilities/services, and hardware products, as well as integrating new products and/or software release upgrades into the current environment. The Sr. Systems Engineer will conduct system performance evaluations, monitoring, patch management, and security evaluations. The Sr. Systems Engineer will analyze user needs in various computing environments (including but not limited to mainframe, Windows, and mid-range) and recommend products and services that meet those needs. The Sr. Systems Engineer will ensure that all system environments are maintained efficiently and cost-effectively.
Specific Duties & Responsibilities
Systems Analysis/Design (Environment/Platform)
- Design highly complex business, clinical, education, or infrastructure solutions by meeting with customers to observe and understand current processes and the issues related to those processes. Provide written documentation and diagrams of findings to share with the client and other IT colleagues. Assist lower-level staff in using the system's technical software effectively.
- Design highly complex solutions that conform to institutional policies, standards, and guidelines, to the infrastructure environment, and to vendor and industry best practices in order to deliver a quality product.
- Select infrastructure applications that reside between end-user applications and hardware operating systems by working with vendors, customers, and other sources (e.g., open source or Internet2 initiatives) to provide configurable tools to customers.
- Develop new methods to improve service processes, performance, and functionality by examining system management tools and processes. Review and approve new methods suggested by lower-level staff.
- Research, recommend, and implement new technologies based on the value to the institution.
- Work with vendor processes and products to improve their quality and fit for the institution. Typically expected to establish product mastery and to take initiative on improvements.
- Assign and lead technical systems analysis and design tasks for assigned environments and platforms.
Install & Configure
- Install and configure highly complex server hardware and operating systems by following technical documentation to provide a working product.
- Evaluate, implement, and manage appropriate highly complex software and hardware solutions by using best practices for the environment to ensure system integrity.
- Install and configure infrastructure applications by following product installation and configuration directions and industry best practices to deliver a solution to the customers.
- Ensure that an effective schedule of system backups and archive operations is developed by providing leadership, oversight, and direction to the technical team on best practices for the environment, to ensure data/media recoverability.
- Lead and provide direction to the technical team for all of the above tasks by reviewing work and adherence to institutional standards and guidelines, in order to deliver projects to customers on time and within budget.
Maintain & Troubleshoot
- Provide highly complex server level administration (manage HW/SW, maintenance, upgrades and patches, account maintenance, backups and recoveries and assist users) by following documented procedures to ensure a stable environment.
- Monitor and tune the system by following documentation and procedures to achieve optimum performance levels.
- Develop highly complex scripts and solutions by using departmental standards to automate systems management.
- Perform highly complex system software upgrades including planning and scheduling, testing, and coordination by following documentation and departmental standards to provide a stable product for the environment.
- Audit and maintain user access and authorization by following access and authorization documentation to provide for system security.
- Generate and maintain highly complex periodic and ongoing system specific reports by using appropriate tools to assess system performance, integrity and capacity in order to deliver a stable environment to the users.
- Follow and maintain IT security awareness and best practices by understanding security principles as they pertain to environments supported in order to deliver secure solutions to customers.
- Utilize system management and monitoring tools and incident tracking systems by following documentation and standards to detect incidents, take corrective actions, and determine root cause.
- Monitor changes and resolve any incidents by responding to problems as they occur, by reviewing all processing and output of the newly implemented solution, and by proactively ensuring the solution works successfully to satisfy the customer requirements and to provide a smooth transition to the new solution.
- Lead and provide direction to the technical team for all of the above tasks by reviewing work and adherence to institutional standards and guidelines to deliver high-quality maintenance and troubleshooting to customers.
Project Collaboration & Lifecycle Participation
- Implement changes by adhering to the change management policies and procedures for any given project to communicate to all parties the nature, significance, and risk factors of the solution.
- Lead the effort to develop RFPs by engaging project team members in the process, in order to provide potential vendors with well-defined requirements for proposed solutions.
- Evaluate vendor proposals by reviewing requirements for the product to select the most appropriate vendor.
- Lead vendors, consultants, and inside Enterprise groups in developing applications by meeting with the team on a regular basis to deliver quality products to customers.
- Lead scheduled project team meetings, attending all meetings and providing input to the project team.
- Author and maintain documentation by writing audience-appropriate materials to serve as technical and/or end user reference.
- Lead the technical team in test planning, test scenario construction, and test sessions appropriate to the changes being implemented, following testing guidelines to ensure that all delivered solutions work as expected and that errors are handled in a meaningful way.
- Review test results and corrections to all changes by following institutional and departmental testing standards to ensure all delivered solutions work as expected and errors are handled in a meaningful way.
- Participate in Institutional and Departmental committees and initiatives.
- Lead and provide direction to the technical team for all of the above tasks by reviewing work and adherence to institutional standards and guidelines to ensure collaboration and communication with team members and customers.
- Perform other related duties as requested.
In addition to the duties described above:
Infrastructure Management
- Design, implement, and maintain on-premises and cloud-based infrastructure for DSAI researchers and projects.
- Manage and optimize resource allocation, ensuring efficient utilization of compute, storage, and network resources.
CI/CD and Automation
- Develop and implement CI/CD pipelines for software development and deployment.
Collaboration and Support
- Collaborate with IDIES on the configuration and maintenance of local compute and storage.
- Collaborate with researchers and data scientists to understand their infrastructure needs and provide technical guidance.
- Support the deployment and scaling of machine learning models on various platforms, including cloud-based services and on-premises clusters.
- Work closely with IT security teams to ensure the security and integrity of DSAI systems and data.
- Become an expert in the compute and storage options that JHU makes available through its various IT organizations (IDIES, Central IT, Research IT, Whiting School of Engineering, JHU HPC, Azure, AWS), and act as an advisor, mentor, and liaison for RSEs seeking to use them.
Research Computing Support
- Assist researchers with utilizing high-performance computing (HPC) clusters and specialized hardware for computationally intensive tasks.
- Optimize research workflows and provide guidance on best practices for utilizing computational resources.
Johns Hopkins University requires all faculty, staff, and students to receive the seasonal flu vaccine. Exceptions to the flu vaccine requirements may be provided to individuals for religious beliefs or medical reasons. Requests for an exception must be submitted to the JHU vaccination registry.
The following additional provisions may apply, depending upon campus. Your recruiter will advise accordingly. The pre-employment physical for positions in clinical areas, laboratories, working with research subjects, or involving community contact requires documentation of immune status against Rubella (German measles), Rubeola (Measles), Mumps, Varicella (chickenpox), Hepatitis B and documentation of having received the Tdap (Tetanus, diphtheria, pertussis) vaccination. This may include documentation of having two (2) MMR vaccines; two (2) Varicella vaccines; or antibody status to these diseases from laboratory testing. Blood tests for immunities to these diseases are ordinarily included in the pre-employment physical exam except for those employees who provide results of blood tests or immunization documentation from their own health care providers. Any vaccinations required for these diseases will be given at no cost in our Occupational Health office.
Requirements
- Bachelor's Degree.
- Six years related experience.
- Additional education may substitute for required experience and additional related experience may substitute for required education beyond a high school diploma/graduation equivalent, to the extent permitted by the JHU equivalency formula.
Preferred Qualifications
- Knowledge of the assigned IT environments.
- Strong experience with Linux system administration and shell scripting.
- Proficiency with cloud computing platforms (Azure and/or AWS) and their services.
- Experience with CI/CD tools.
- Experience with monitoring and logging tools.
- Experience with containerization technologies (e.g., Docker) and orchestration tools (e.g., Kubernetes).
- Experience with configuration management tools.
- Strong understanding of networking concepts and protocols.
- Excellent communication and collaboration skills.
- Experience with HPC environments and job scheduling systems (e.g., Slurm).
- Experience with CPU and memory profiling.
- Experience with GitOps practices and tooling.
Technical Skills & Expected Level of Proficiency
- Automation - Authority
- Cloud Migration - Authority
- Directory Services - Authority
- Operating Software - Authority
- Scripting - Authority
- Software Development Life Cycle - Authority
- Systems Architecture - Authority
- Systems Analysis - Authority
- Systems Configuration - Authority
- Systems Design - Authority
- Systems Development - Authority
- Systems Engineering - Authority
- Systems Integration - Authority
The core technical skills listed are most essential; additional technical skills may be required based on specific division or department needs.
Please refer to the job description above to see which forms of equivalency are permitted for this position. If permitted, equivalencies will follow these guidelines: JHU Equivalency Formula: 30 undergraduate degree credits (semester hours) or 18 graduate degree credits may substitute for one year of experience. Additional related experience may substitute for required education on the same basis. For jobs where equivalency is permitted, up to two years of non-related college course work may be applied towards the total minimum education/experience required for the respective job.
Applicants Completing Studies
Applicants who do not meet the posted requirements but are completing their final academic semester/quarter will be considered eligible for employment and may be asked to provide additional information confirming their academic completion date.