ML Infrastructure Architect
Job description
Key Responsibilities
Model Development: Design, train, and optimize ML models using frameworks like PyTorch or TensorFlow.
GenAI Implementation: Lead the integration of LLMs, including fine-tuning, prompt engineering, and building RAG (Retrieval-Augmented Generation) pipelines.
Infrastructure & Orchestration: Architect and maintain end-to-end ML pipelines (CI/CD for ML) using Docker, Kubernetes, and tools like MLflow or Kubeflow.
Cloud Deployment: Deploy and manage production workloads on cloud platforms (AWS, Google Cloud Platform, or Azure) with a focus on cost-efficiency and low latency.
Monitoring & Governance: Implement robust monitoring for model drift, data quality, and performance metrics to ensure 24/7 reliability.
Collaboration: Work closely with Data Scientists to productize research and with DevOps to align with enterprise security and infrastructure standards.
Requirements
Do you have experience in Pandas?
Do you have a Bachelor's degree?

Skill Matrix (to be filled in by candidates)
For each mandatory skill, provide: Years of Experience, Year Last Used, Rating Out of 10.
Mandatory Skills:
End-to-End MLOps Automation
GenAI Orchestration
LLMOps
Advanced Model Optimization & Inference

Technical Requirements
Experience: 4+ years of hands-on experience in ML Engineering or MLOps roles.
Core Stack: Expert-level proficiency in Python and standard ML libraries (Scikit-learn, Pandas, NumPy).
Deep Learning: Strong experience with Transformers, CNNs, or RNNs.
DevOps for ML: Mastery of containerization (Docker) and orchestration (K8s). Experience with Infrastructure as Code (Terraform/CloudFormation) is a major plus.
GenAI Tools: Familiarity with LangChain, LlamaIndex, or Vector Databases (Pinecone, Milvus, Weaviate).
Education: B.S./M.S. in Computer Science, Mathematics, or a related quantitative field.