Python Developer

Engage Partners Inc.
Union City, United States of America
9 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate

Tech stack

Artificial Intelligence
Systems Engineering
Azure
Cloud Computing
Cloud Engineering
Data Infrastructure
DevOps
Identity and Access Management
Python
Key Management
Cloud Services
Software Engineering
Management of Software Versions
Workflow Management Systems
AI Infrastructure
Data Logging
Cloud Platform System
System Availability
Snowflake
Grafana
Multi-Agent Systems
Software Troubleshooting
Multi-Cloud
Containerization
AI Platforms
Git Flow
Data Lineage
Data Management
Machine Learning Operations
Terraform
Data Pipelines
Docker
Key Vault

Job description

We are seeking an AI Infrastructure Engineer (Python) to support, scale, and enhance a production AI and data platform. This role sits at the intersection of AI infrastructure, cloud engineering, and agent-based systems.

You will be responsible for ensuring the reliability, scalability, and performance of AI-driven systems operating in production environments across multi-cloud platforms (Azure and GCP). This is not a modeling or research role; it is focused on building and maintaining the infrastructure that powers AI systems at scale.

This is an excellent opportunity for someone with strong foundational engineering skills who is eager to deepen their expertise in AI platforms and cloud-native systems.

What You'll Do

Systems Engineering & Agent Operations

  • Develop, maintain, and optimize production-grade Python code supporting data pipelines, agent workflows, and platform tooling
  • Own the full lifecycle of Python services (containerization, deployment, versioning, runtime management)
  • Manage environment configurations, secrets injection, and dependency management across containerized services
  • Build internal Python tooling and shared libraries to accelerate development workflows
  • Troubleshoot production issues end-to-end across application and infrastructure layers

AI Platform & Scaling

  • Operate and scale AI-driven agent systems in production environments
  • Ensure high availability, performance, and resilience under load
  • Support integrations between AI agents and data platforms
  • Build observability tools (logging, monitoring, tracing, alerting)
  • Implement auto-scaling strategies for containerized workloads
  • Contribute to evaluation frameworks and quality standards for AI systems

Infrastructure & Cloud Operations

  • Develop and manage infrastructure using Terraform across Azure and GCP
  • Manage cloud services including container registries, identity systems, secrets management, and networking
  • Deploy and maintain workflow orchestration tools (e.g., Prefect)
  • Maintain CI/CD pipelines and release workflows
  • Document systems, workflows, and data lineage with clear runbooks

Requirements

Required

  • 3-5 years of experience in Software Engineering, DevOps, or MLOps
  • Strong Python skills with experience building production systems
  • Experience with Docker and containerized applications in cloud environments (Azure and/or GCP)
  • Hands-on experience with Terraform
  • Experience with secrets management tools and secure configuration practices
  • Familiarity with CI/CD pipelines and Git-based workflows
  • Strong troubleshooting and systems-thinking mindset
  • Interest in AI systems and infrastructure

Preferred

  • Experience with Azure services (Container Apps, ACR, Key Vault, Managed Identities, VNets)
  • Experience with GCP services (Cloud Run, GKE, Vertex AI, IAM, Secret Manager)
  • Familiarity with workflow orchestration tools (e.g., Prefect)
  • Exposure to AI/agent frameworks and protocols (e.g., LangChain, MCP)
  • Experience with observability tools (e.g., MLflow, Langfuse)
  • Experience with data tools such as dbt or Snowflake
  • Familiarity with multi-cloud environments
