Lead Data Architect
Job description
We are seeking a Senior/Lead technical data architect to design, build, and operate enterprise data platforms that power GenAI and AI/ML use cases. This is a highly technical, hands-on role responsible for data platform architecture, end-to-end data engineering, ML/LLM pipeline design, production model onboarding, and delivery of scalable Databricks-centric solutions across cloud environments. Candidates must hold the AWS Certified Machine Learning - Specialty certification.
What You'll Be Doing:
- Architect and implement enterprise data platforms (batch + streaming) optimized for ML, LLMs, and GenAI workloads.
- Lead the design and hands-on implementation of Databricks workspaces, Unity Catalog, Delta Lake design patterns, cluster policies, and performance tuning.
- Build and own end-to-end data pipelines (ingest, transform, feature engineering, serving) using PySpark, Databricks Jobs, Spark SQL, Delta Lake, and orchestration tools.
- Design and operationalize model training, LLM fine-tuning, evaluation, deployment, and monitoring pipelines (MLOps/RAG/CAG), integrating Databricks MLflow, CI/CD, and infra-as-code.
- Implement vectorless and vectorization/embedding pipelines, vector store integrations, and retrieval layers for RAG (FAISS, Pinecone, Weaviate, Milvus).
- Define data schemas, governance, lineage, access controls, and data product APIs; implement Unity Catalog or equivalent for centralized governance.
- Drive cost/performance optimization for storage, compute (spot/preemptible), and query patterns.
- Collaborate with engineers, data scientists, product owners, and security to translate business needs into production GenAI solutions.
- Mentor and lead engineering teams; conduct architecture reviews, code reviews, and run technical deep dives.
- Implement observability for data and ML pipelines (metrics, logging, data quality tests, alerting).
- Create reproducible experiment tracking, model registry, and rollout strategies (canary, shadow testing, rollback).
- Stay current on GenAI/LLM architectures and evaluate/introduce new tooling and frameworks.
Requirements:
- BA or BS degree in CS, Computer Engineering, Information Technology, or a related field.
- 8+ years of hands-on experience in data engineering/platform architecture; 3+ years in an architect or lead role.
- Proven, hands-on Databricks experience (designing workspaces, Delta Lake, performance tuning, productionizing Spark jobs).
- Deep Spark and PySpark expertise and experience with Databricks Runtime.
- Strong experience building ML/LLM pipelines and operationalizing models (training, fine-tuning, serving).
- Practical experience with vector embeddings, semantic search, and RAG architectures.
- Solid expertise in Python, common ML libraries (PyTorch, TensorFlow, Hugging Face Transformers), and MLflow.
- Cloud platform experience (AWS strongly preferred).
- Experience with containerization and orchestration, and with open-source libraries for structured and unstructured data processing and model serving/inference.
- Strong SQL skills; experience with distributed query/warehouse systems and Parquet/Avro/Delta formats.
- CI/CD and infra-as-code experience (Terraform, GitOps, Jenkins/GitHub Actions/GitLab CI).
- Data governance, security, and IAM experience; experience implementing row/column-level access controls and data lineage.
- Demonstrated ability to design for scalability, reliability, and cost efficiency.
Preferred Qualifications:
- Prior experience with Databricks Unity Catalog, Photon, and Databricks SQL.
- Experience integrating Databricks with vector and graph databases (Pinecone, Neo4j) and retrieval frameworks (LangChain, LlamaIndex).
- Familiarity with AWS Bedrock or other managed LLM services.
- Experience with real-time streaming (Kafka, Kinesis) and stream processing with Databricks Structured Streaming.
- Certifications: Databricks Certified Professional.
- Experience with data quality and profiling tools (Great Expectations, Soda).
- Experience with large-scale ETL frameworks and tools (Airflow, Prefect).
Applicants must be authorized to work in the U.S. We may consider candidates currently in H-1B status who are eligible for transfer.
Benefits & conditions
The proposed salary range for this role is $160,000 to $190,000 USD. The salary range provided is a good faith estimate representative of all experience levels. Karsun considers several factors when extending an offer, including but not limited to, the role, function and associated responsibilities, a candidate's work experience, location, education/training, and key skills.