Senior AI Infrastructure Engineer

Staffed4U LLC
Jessup, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$306K

Job location

Jessup, United States of America

Tech stack

Artificial Intelligence
Amazon Web Services (AWS)
Software Applications
Application Performance Management
Computing Platforms
Cloud Computing
Cloud Engineering
Encodings
Information Systems
Computer Engineering
Data Infrastructure
DevOps
Distributed Systems
Python
Prometheus
Search Technologies
Service-Oriented Architecture
Software Deployment
Software Engineering
Systems Architecture
Systems Integration
Web Applications
Workflow Management Systems
AI Infrastructure
Data Logging
Enterprise Software Applications
Cloud Platform System
High Performance Computing
System Availability
Large Language Models
Grafana
Multi-Agent Systems
Software Security
Generative AI
AI Platforms
Kubernetes
Infrastructure Automation Frameworks
Information Technology
Machine Learning Operations

Job description

We are seeking an experienced Senior AI Infrastructure Engineer to support the design, deployment, and operation of enterprise artificial intelligence and machine learning platforms. This role will be responsible for developing and maintaining scalable infrastructure that enables the delivery of AI-powered applications and services across the organization.

The successful candidate will independently design, implement, and operate cloud-native infrastructure components while supporting modern AI technologies, distributed systems, and production service environments. This position requires strong expertise in platform engineering, cloud technologies, automation, observability, and software development.

  • Design, implement, and optimize infrastructure supporting AI model deployment and inference at scale.
  • Develop, maintain, and support production AI services and applications.
  • Collaborate with stakeholders and engineering teams to define technical solutions for evolving business and operational requirements.
  • Design and implement scalable, reliable, and maintainable platform architectures.
  • Drive adoption of emerging technologies, engineering best practices, and automation solutions.
  • Implement monitoring, logging, alerting, and observability capabilities for platform services.
  • Automate infrastructure provisioning, configuration, and lifecycle management using Infrastructure-as-Code (IaC) methodologies.
  • Ensure high availability, reliability, performance, and scalability of platform services.
  • Support the secure deployment and operation of AI systems and associated data environments.
  • Contribute to system architecture reviews, platform modernization efforts, and operational support activities.
  • Provide technical guidance, knowledge sharing, and mentorship to engineering team members.
  • Participate in troubleshooting, root cause analysis, and continuous improvement initiatives.

Requirements

  • Bachelor's degree in Computer Science, Software Engineering, Information Systems, Computer Engineering, or a related technical discipline and eight (8) years of relevant experience; OR
  • Four (4) additional years of directly related experience may be substituted for the degree requirement.

Technical Qualifications

  • Demonstrated experience building, deploying, and maintaining enterprise-scale production systems.
  • Experience designing and supporting high-volume web applications and distributed service architectures.
  • Strong background in systems integration across diverse technologies, platforms, and cloud environments.
  • Hands-on experience designing, deploying, and operating cloud infrastructure in Amazon Web Services (AWS).
  • Experience administering and deploying applications using Kubernetes.
  • Strong software development skills using Python.
  • Experience implementing observability and monitoring solutions using technologies such as Application Performance Monitoring (APM) tools, OpenTelemetry, Grafana, and Prometheus.
  • Experience developing and maintaining Continuous Integration and Continuous Deployment (CI/CD) pipelines.
  • Knowledge of DevOps principles, automation practices, and modern software delivery methodologies.
  • Demonstrated ability to lead technical initiatives and influence engineering practices across teams.
  • Ability to operate effectively in dynamic environments with evolving requirements.
  • Excellent written and verbal communication skills.

  • Experience supporting AI model deployment, serving, and inference platforms.
  • Experience integrating generative AI and large language model (LLM) technologies into enterprise applications.
  • Experience with AI workflow orchestration frameworks, including LangChain or similar technologies.
  • Knowledge of vector databases, embedding technologies, and semantic search solutions.
  • Experience implementing Retrieval-Augmented Generation (RAG) architectures.
  • Experience with distributed computing, high-performance computing, or large-scale processing environments.
  • Familiarity with autonomous agent frameworks and emerging AI technologies.

Knowledge, Skills, and Abilities

  • Strong cloud engineering and platform architecture expertise.
  • Deep understanding of distributed systems and cloud-native application design.
  • Ability to balance reliability, security, scalability, and performance requirements.
  • Strong analytical and problem-solving skills.
  • Ability to lead technical initiatives and influence organizational technology adoption.
  • Strong collaboration and stakeholder engagement skills.
  • Excellent organizational skills and attention to detail.
  • Ability to mentor engineers and contribute to a culture of technical excellence.

Benefits & conditions

This position includes a competitive and flexible benefits package, including:

  • Medical: Employer pays 100% of the monthly premium for the employee and 80% for the employee's dependents.

  • Health Savings Account (HSA): Save for all medical, dental, vision and prescription expenses by contributing pre-tax money to an HSA account. Employer contributes 50% of the annual deductible (prorated to start date).

  • Dental and Vision: Employer pays 100% of the monthly premium for the employee and 80% for dependents.

  • Life Insurance: 100% company-paid Life and Accidental Death & Dismemberment (AD&D) coverage offered to all full-time employees.

  • Short-Term Disability: 100% company-paid short-term disability. This benefit pays out 60% of earnings, with a $1,500 maximum for up to 12 weeks.

  • Retirement Plan: Automatic 6% of salary contributed to the company 401(k) plan, fully vested. Employee match encouraged but not required.

  • Paid Time Off (PTO) & Holidays: 5-6 weeks of PTO based on tenure with the company, in addition to 11 paid holidays.

  • Tuition Reimbursement: $5,000 annually for courses directly related to job role and responsibilities.

  • Training Reimbursement: Paid training, certification courses, and conferences to support employee career growth.
