Senior AI Data Engineer
Job description
We are seeking a highly skilled and motivated Senior AI Data Engineer with a proven track record of building scalable data platforms and incorporating Generative AI into data engineering workflows. The ideal candidate will have deep expertise in Databricks capabilities, including Delta Lake and Unity Catalog, and will use them to power AI and machine learning initiatives. You will play a pivotal role in setting up and operationalizing MLOps directly within Databricks, while integrating a variety of open-source tools to improve data quality, automate workflows, and generate metadata.
What You'll Be Doing:
- Databricks AI Solutions: Design, build, and maintain scalable data pipelines and workflows using Databricks to directly support AI/ML and analytics workloads. Leverage core capabilities like Delta Lake, Delta Live Tables, and Databricks Workflows to create high-performance data platforms.
- MLOps Operationalization: Establish and operationalize MLOps practices directly within the Databricks environment, including version control, CI/CD for data pipelines, automated testing, and model deployment strategies.
- Open-Source Integration: Use and integrate open-source languages and tools such as Python, PySpark, and Apache Airflow for distributed data processing and workflow orchestration.
- GenAI-Enhanced Workflows: Implement workflows that use large language models (LLMs) to automate metadata generation, create data dictionaries, validate data quality, and track data lineage.
- Architecture & Governance: Apply the medallion architecture (Bronze, Silver, and Gold layers) in line with data lakehouse best practices. Integrate Unity Catalog for enterprise data governance and access control, and implement and operationalize governance best practices across the platform.
- Data Preparation: Collaborate with AI/ML teams to curate, prepare, and serve high-quality datasets for model training and inference.
Requirements
- BA or BS degree in Computer Science, Computer Engineering, Data Science, or a related field (Master's degree is a plus).
- Open-Source Proficiency: 5+ years of experience with open-source languages and frameworks, specifically Python and PySpark, for distributed data processing. Strong knowledge of open-source data orchestration tools such as Apache Airflow.
- AI/Data Engineering: 5+ years of proven experience building large-scale data platforms, including at least 2 years incorporating Generative AI into data engineering workflows.
- Databricks Expertise: 3+ years of hands-on experience with the Databricks platform, specifically leveraging data engineering and AI features (Delta Lake, DLT, Workflows, Unity Catalog).
- MLOps: Proven experience setting up, maintaining, and operationalizing MLOps frameworks within Databricks.
- Cloud & Architecture: 3+ years of experience with AWS data services (e.g., S3, Glue, Lambda) and a deep understanding of data lakehouse architecture.
- Databricks Certified Data Engineer Associate or Professional certification, or an AWS Data Analytics certification.
- Experience with open-source streaming data processing tools such as Apache Kafka or Spark Structured Streaming.
- Familiarity with open-source data quality and analytics engineering tools such as dbt (data build tool), Great Expectations, or Sweetviz.
- Experience with open-source containerization (Docker) and orchestration (Kubernetes) for data applications.
- Understanding of vector databases and embedding pipelines for AI/ML applications.

Applicants must be authorized to work in the U.S. We may consider candidates currently in H-1B status who are eligible for transfer.
Benefits & conditions
The proposed salary range for this role is $165,000 to $180,000 USD. The salary range provided is a good-faith estimate representative of all experience levels. Karsun considers several factors when extending an offer, including, but not limited to, the role, its function and associated responsibilities, and the candidate's work experience, location, education/training, and key skills.