Senior Data & Cloud Operations Engineer
Job description
You will work closely with our AI, software, and product teams to ensure a performant, secure, and scalable ecosystem. You will also act as the primary technical escalation point for complex system behaviour, cloud issues, and production incidents. The ideal candidate is hands-on, analytical, and proactive, with a strong sense of accountability toward system uptime, data integrity, and customer satisfaction.
Business Intelligence & Customer Delivery
- Deploy, maintain, and optimize BI dashboard solutions for customers.
- Monitor performance, troubleshoot issues, and ensure reliability of delivered analytics products.
- Support customer onboarding and provide technical context during implementations.
Cloud & Infrastructure Engineering
- Design, develop, and maintain the XBAT.ai cloud environment supporting internal AI workflows and customer-facing components.
- Architect and operate scalable storage and compute environments handling 100+ TB datasets.
- Build and manage CI/CD pipelines for software components developed across the organization.
- Improve observability across systems through metrics, dashboards, alerts, and automated diagnostics.
Advanced Troubleshooting & Operational Ownership
- Diagnose and resolve complex issues across data pipelines, cloud platforms, APIs, dashboards, update servers, and application components.
- Take full ownership of production incidents: analysis, mitigation, root-cause investigation, and long-term corrective actions.
- Collaborate directly with engineering teams to reproduce issues, recommend fixes, and validate deployments.
- Create and maintain runbooks, operational documentation, and technical knowledge resources.
Customer-Facing Technical Support
- Serve as the technical escalation point when system malfunctions, degraded performance, or downtime occurs.
- Communicate effectively with customers regarding issue status, impact, and resolution steps.
- Provide technical workshops or briefings to prospects and clients when needed.
Security, Policy & Compliance
- Own XBAT.ai's cybersecurity posture, including policy maintenance and compliance alignment.
- Implement, monitor, and continuously improve security best practices across infrastructure, data flow, and cloud components.
Requirements
Do you have experience in Technical support?
Do you have a Master's degree?
- 3-5+ years of experience as a Data Engineer, Cloud Engineer, Platform Engineer, or a similar technical role.
- Proven experience with cloud platforms (AWS, Azure, or GCP) and infrastructure automation.
- Strong command of data pipeline engineering, ETL/ELT workflows, and distributed data systems.
- Experience with CI/CD tools and modern DevOps practices.
- Strong engineering fundamentals in Python, SQL, IaC, containers, and API integration.
- Demonstrated ability to resolve complex technical issues and drive incident management end-to-end.
- Ability to work directly with customers in a technical capacity.
- Familiarity with cybersecurity principles and willingness to own security and compliance topics.
- Proactive problem-solving attitude with strong ownership of outcomes.

- Experience with IoT ecosystems, device integration, or sensor networks.
- Affinity with hardware or operational production-line environments.
- Experience with large-scale storage systems (object stores, data lakes, distributed file systems).
- Background supporting ML/AI teams, GPU workloads, or high-performance compute environments.
Benefits & conditions
- A high-impact role with broad ownership across cloud, data, operations, and security.
- Direct collaboration with AI and software engineering experts in a rapidly scaling environment.
- Autonomy to shape our infrastructure, reliability practices, and engineering roadmap.
- Exposure to diverse engineering challenges spanning cloud systems, cybersecurity, hardware integration, and data-intensive workloads.
- Opportunities to interact directly with customers, influence technical strategy, and drive continuous improvement across the organization.