MLOps & AI Infrastructure Engineer

dLocal

2 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English, Spanish, Portuguese

Tech stack

API

Artificial Intelligence

Big Data

Continuous Integration

Distributed Systems

Machine Learning

Azure

AI Infrastructure

Delivery Pipeline

Spark

Backend

Containerization

AI Platforms

Low Latency

Apache Flink

Kafka

Data Management

Machine Learning Operations

Video Streaming

Data Pipelines

Databricks

Job description

... enterprise Feature Store, combining:
  ... into online stores.
  Databricks / Spark pipelines for offline feature computation, backfills and training datasets.
  Point-in-time correctness for offline training and backtesting.
  Low-latency, high-throughput online feature serving with clear SLAs, TTL semantics and multi-tenant safety.

Help data scientists and domain teams onboard new features safely and consistently across Flink and Databricks.
  Offline-online parity checks, data quality, drift and freshness monitoring for critical feature groups.
  Unified feature retrieval APIs (online/offline/batch) and SDK/CLI usage from models and services.

MLOps platform implementation (training, serving, observability)

Implement and improve training and evaluation pipelines:
  Promotion flows from dev → staging → production, following platform standards.

Work on online and batch inference paths:
  Model packaging and deployment.

Integrate and extend agents and AI services (built by the AI Team and MLOps) to automate key parts of the Feature Store and MLOps workflows (health checks, drift and quality analysis, documentation/specs, incident triage, FinOps suggestions, etc.).

Design these automations with clear guardrails: observable, auditable and easy to roll back, always keeping humans in control of production decisions.

Access control, secrets management and PII handling in features and models.

... Data Science squads and the AI Team to understand requirements and unblock use cases.

Contribute to internal documentation, RFCs, examples and onboarding guides so other engineers and data scientists can adopt the platform more easily.

Requirements

Solid experience as a Senior Engineer working on MLOps, data platforms, or large-scale backend / distributed systems.

Hands-on experience with big data / streaming technologies (e.g. Spark, Flink, Kafka, Kinesis, or similar).

Proven track record building production-grade ML pipelines:
  Experiment tracking and reproducible training flows.
  CI/CD for models and data pipelines.
  Online and batch inference at scale.

Familiarity with cloud-based ML platforms and containerized deployments (e.g. ...

... Data and model drift, freshness and quality checks.

Comfortable communicating with Data Scientists, ML Engineers and Infra/SRE, translating requirements into concrete technical solutions.

... Log/metric/incident analysis or documentation generation.

Flexibility: we have flexible schedules and we are driven by performance.

Language classes: we provide free English, Spanish, or Portuguese classes.

Social budget: you'll get a monthly budget to chill out with your team (in person or remotely) and deepen your connections.

Also, you can check out our webpage, LinkedIn and YouTube for more about dLocal.
