AI Data Engineer
Role details
Job location
Tech stack
Job description
End-to-end data engineering for AI & analytics
- Design, build, and maintain scalable data pipelines and services that feed ML models, LLM/RAG solutions, and advanced analytics.
- Serve as the data backbone for AI products, from raw ingestion through curated, analytics- and model-ready datasets.
Data integration & data stitching
- Serve as the expert at finding, joining, and reconciling data from disparate systems (CRM, billing, interaction/call data, clickstream, third-party sources, etc.).
- Resolve data quality issues, gaps, and inconsistencies; establish reliable, reusable data assets for AI and analytics teams.
Advanced data architecture & modeling
- Design and implement advanced data architectures (e.g., lakehouse, dimensional models, domain data products) to support AI and analytics at scale.
- Build strategic data models and solutions tailored for analytics, ML, and AI use cases (feature stores, RAG retrieval layers, training/inference datasets).
- Define and maintain data contracts, schemas, and standards that ensure consistency, performance, and ease of use.
Automation, orchestration & observability
- Use workflow/orchestration tools (e.g., Airflow, Dagster, cloud-native orchestrators) to automate repetitive tasks and complex data flows.
- Implement robust monitoring, alerting, and observability for pipelines, ensuring reliability, data quality, and clear SLAs.
Cloud platforms, performance & cost optimization
- Build solutions on Databricks, AWS, and Redshift that are secure, performant, and cost-efficient.
- Size, estimate, and predict costs for data solutions at scale; continuously optimize cloud spend through smart architecture, right-sizing, and tuning.
AI-aware data engineering & partnership
- Partner closely with AI Engineers and Data Scientists to understand model and LLM requirements and translate them into data designs, features, and pipelines.
- Implement data patterns tailored to ML/LLM workloads (feature stores, training/validation sets, inference pipelines, vector indexes for RAG).
Leveraging AI for engineering productivity
- Use AI-assisted coding tools and other AI capabilities to improve development speed, code quality, documentation, and testing.
- Stay current on AI tooling relevant to data engineering and incorporate it into day-to-day workflows.
Data stewardship & business partnership
- Become the expert on enterprise data and its usage: understanding sources, lineage, meaning, and business relevance.
- Work closely with product, analytics, and business stakeholders to ensure data assets align with how the business operates, measures performance, and makes decisions.
Perks & benefits
- Frequent Internal Hackathons: Engage in dynamic competitions with exciting prizes to keep your skills sharp.
- Cultural Celebrations: Strengthen our familial bonds through shared celebrations, fostering a sense of community.
- Diverse Project Exposure: Work on a variety of projects across sectors like Healthcare, Banking, e-commerce, and Retail, collaborating with leading global brands.
- Centre of Excellence (COE): Benefit from technical guidance and upskilling opportunities provided by our team of technology experts, helping you navigate your career path.
- E-Learning Platform: Gain access to comprehensive e-learning platforms coupled with a robust mentorship program to enhance your skills.
- Open Door Policy: Embrace a culture of mutual support, respect, and open dialogue, promoting a collaborative work environment.
If you are passionate and excited about working in a fast-paced, innovative environment, we would love to hear from you!
Requirements
- Bachelor's degree in Computer Science, Data Engineering, Information Systems, or a closely related technical field.
- Advanced degree is a plus.
Experience
- 8+ years of data engineering experience in large-scale, production environments.
- 8+ years of experience in data modeling and building strategic data solutions for analytics and ML, and 3+ years providing data structures specifically for AI solutions (including LLM/RAG use cases).
- 8+ years of experience finding, joining, and reconciling data from disparate systems (CRM, billing, interaction/call data, operational systems, third-party sources, etc.).
- 8+ years of experience with advanced data architectures (e.g., lakehouse, dimensional models, domain data products) supporting AI and analytics at scale.
- 8+ years of experience defining data contracts, schemas, and standards that ensure consistency, performance, and ease of use across teams and platforms.
- 5+ years of experience with automation and orchestration, using workflow/orchestration tools (e.g., Airflow, Dagster, cloud-native orchestrators) to automate repetitive tasks and complex data flows, including robust monitoring, alerting, and observability with SLAs.
Technical skills
Programming & data processing
- Expert-level SQL for complex transformations, data reconciliation, and performance tuning.
- Strong Python skills for ETL/ELT, data pipelines, and integration with ML and AI workflows.
- Experience with Spark (preferably on Databricks) for large-scale data processing.
Cloud & platforms
- Deep, handson experience with Databricks, AWS (e.g., S3, Glue, EMR/compute, Lambda, IAM), and Amazon Redshift.
Data architecture & governance
- Strong understanding of advanced data architecture design (lakehouse patterns, dimensional modeling, data vault, domain-oriented data products).
- Solid grasp of data governance, data quality, lineage, and metadata practices.
- Experience in large, complex enterprises (Fortune 100 or similar), especially with high-volume transactional, interaction, or customer data.
AI & ML awareness
- Working knowledge of AI/ML and LLM/RAG data requirements and coding patterns (e.g., feature stores, training/validation splits, vector stores, retrieval indexes).
- Experience collaborating closely with AI/ML teams and integrating with their pipelines and APIs.
Cost & performance
- Demonstrated ability to size, estimate, and optimize compute, storage, and data processing costs in cloud environments.
- Experience tuning queries, jobs, and architectures for both performance and cost efficiency.
Mindset & collaboration
- Strong ownership mentality; accountable for the reliability, quality, and fitness of the data you provide.
- Excellent collaboration skills; proven success partnering with AI Engineers, Data Scientists, and Analytics teams to deliver robust, Fortune 100-grade solutions.
- Clear communicator who can explain data structures, constraints, and tradeoffs to both technical and nontechnical stakeholders.
Preferred Qualifications
- Experience with CI/CD, infrastructureascode, and DataOps/MLOps practices.
- Familiarity with common analytics and BI tools (e.g., Tableau, Power BI, Looker) and how they consume data.