Remote AI Platform Engineer
Role details
Job location
Remote; work-from-home options are available for this position.
Tech stack
Databricks (Unity Catalog, Delta Lake, DLT, MLflow), AWS (S3, IAM, KMS, VPC, CloudWatch), Terraform, GitHub and GitHub Actions.
Job description
We are seeking a hands-on AI Platform Engineer to design, build, and operate Databricks-based data and AI platforms on AWS. You will enhance AI capabilities by leveraging Databricks (Workspace, Unity Catalog, Lakehouse, MLflow), modern cloud services, and DevOps/MLOps practices to deliver reliable, secure, and scalable platforms.
- Administer and optimize Databricks workspaces: cluster policies, pools, job clusters vs. all-purpose clusters, autoscaling, spot/fleet usage, and GPU/accelerated compute where applicable.
- Implement CI/CD for notebooks, libraries, DLT pipelines, and ML assets; automate testing, quality gates, and promotion across workspaces using GitHub Actions and Databricks APIs.
- Develop and optimize Delta Lake pipelines (batch and streaming) using Auto Loader, Structured Streaming, and DLT; enforce data quality and SLAs with expectations and alerts (see the sketch after this list).
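As a rough illustration of the last point, here is a minimal Delta Live Tables sketch that ingests files with Auto Loader and enforces a data-quality expectation. The bucket path, table names, and columns are hypothetical placeholders, not part of this role's actual pipelines.

```python
# Minimal DLT sketch: Auto Loader ingestion plus a data-quality expectation.
# RAW_PATH, table names, and columns are hypothetical.
import dlt
from pyspark.sql.functions import col

RAW_PATH = "s3://example-bucket/landing/events/"  # hypothetical landing zone

@dlt.table(comment="Raw events ingested incrementally with Auto Loader.")
def events_raw():
    # `spark` is provided by the DLT pipeline runtime.
    return (
        spark.readStream.format("cloudFiles")  # Auto Loader source
        .option("cloudFiles.format", "json")
        .load(RAW_PATH)
    )

@dlt.table(comment="Validated events; rows failing the expectation are dropped.")
@dlt.expect_or_drop("valid_event_id", "event_id IS NOT NULL")
def events_clean():
    return dlt.read_stream("events_raw").select(
        col("event_id"),
        col("event_type"),
        col("event_ts").cast("timestamp"),
    )
```

In a real pipeline, expectations like this would pair with alerting on dropped-record metrics to enforce the SLAs mentioned above.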
Requirements
- Ability to design and implement scalable Databricks platform solutions supporting analytics, ML, and GenAI workflows across environments (dev/test/prod).
- Hands-on Databricks administration on AWS, including Unity Catalog governance and enterprise integrations.
- Strong AWS foundation: networking (VPC, subnets, security groups), IAM roles and policies, KMS, S3, and CloudWatch; EKS familiarity is a plus but not required for this Databricks-focused role.
- Proficiency with Terraform (including the databricks provider), GitHub, and GitHub Actions.
- Experience with MLflow for experiment tracking and the model registry (see the sketch after this list); experience with model serving endpoints preferred.
- Familiarity with Delta Lake, Auto Loader, Structured Streaming, and DLT.
- Experience implementing DevOps automation and runbooks; comfort with REST APIs and the Databricks CLI.
- Proven hands-on experience with Databricks on AWS: workspace administration, cluster and pool management, job orchestration (Jobs/Workflows), repos, secrets, and integrations.
- Strong experience with Databricks Unity Catalog: metastore setup, catalogs/schemas, data lineage, access control (ACLs, grants), attribute-based access control, and data governance.
- Expertise in Infrastructure as Code for Databricks and AWS using Terraform (databricks and aws providers) and/or AWS CloudFormation; experience with Databricks Asset Bundles or the CLI is a plus.
- Experience with DevOps practices that enable automation and reduce manual operations.
- Experience with or awareness of MLOps practices; building pipelines that accelerate and automate machine learning will be viewed favorably.
- Proficiency in cloud operations on AWS, with a strong understanding of scaling infrastructure and optimizing cost and performance.
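For the MLflow item above, here is a minimal tracking-plus-registry sketch, assuming a workspace where MLflow is preconfigured (as on Databricks). The experiment path and registered model name are hypothetical placeholders.

```python
# Minimal MLflow sketch: log a run and register the resulting model.
# The experiment path and model name below are hypothetical.
import mlflow
import mlflow.sklearn
from mlflow.models import infer_signature
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

mlflow.set_experiment("/Shared/demo-experiment")  # hypothetical experiment path

with mlflow.start_run():
    model = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_param("max_iter", 200)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # Passing registered_model_name creates or updates a registry entry.
    mlflow.sklearn.log_model(
        model,
        "model",
        signature=infer_signature(X, model.predict(X)),
        registered_model_name="demo_model",
    )
```

On Unity Catalog-enabled workspaces, registered model names are three-part (catalog.schema.model); the flat name here assumes the classic workspace registry.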