Member of Technical Staff - Principal Data Infrastructure Engineer
Job description
- Architect and maintain scalable, reliable, and observable Big Data Infrastructure for mission-critical AI applications.
- Champion DevOps and SRE best practices: automated deployments, service monitoring, and incident response.
- Build a self-service big data platform that empowers data engineers, platform engineers, and researchers.
- Develop robust CI/CD pipelines and automate infrastructure provisioning using Infrastructure as Code tools (Bicep, Terraform, ARM).
- Collaborate with Data Engineers, Data Scientists, AI Researchers, and Developers to deliver secure, seamless big data workflows.
- Lead technical design reviews and uphold a clean, secure, and well-documented codebase.
- Proactively identify and resolve bottlenecks in data pipelines and infrastructure.
- Optimize system performance across storage, compute, and analytics layers.
- Partner with Security teams to enhance system security (IAM, OAuth, Kerberos).
- Embody and promote Microsoft's values: Respect, Integrity, Accountability, and Inclusion.
Requirements
- Deep technical expertise
- A passion for automation and observability
- Fluency in distributed systems
- Creativity to design scalable solutions
- And just as importantly: empathy, collaboration, and a growth mindset.
- Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 4+ years of experience in business analytics, data science, software development, data modeling, or data engineering
- OR Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 6+ years of experience in business analytics, data science, software development, data modeling, or data engineering
- OR equivalent experience.
- 4+ years of experience in Big Data Infrastructure, DevOps, SRE, or Platform Engineering.
- 3+ years of hands-on experience managing and scaling distributed systems, from bare-metal to cloud-native environments.
- 2+ years deploying containerized applications using Kubernetes and Helm/Kustomize.
- Solid scripting and automation skills using Python, Bash, or PowerShell.
- Proven success in CI/CD pipeline management, release automation, and production troubleshooting.
- Experience working with Databricks for scalable data processing and analytics.
- Familiarity with security practices in infrastructure environments, including IAM, OAuth, and Kerberos administration.
- Proven experience with cloud-native infrastructure across Azure, AWS, or GCP.
- Hands-on expertise with modern data platforms like Databricks, including:
- Deep understanding of data storage and processing technologies:
  - Relational and NoSQL databases.
  - Key-value stores.
  - Spark compute engines.
  - Distributed file systems (e.g., HDFS, ADLS Gen2).
  - Messaging systems (e.g., Event Hubs, Kafka, RabbitMQ).
- Capacity planning and incident management for large-scale big data systems.
- A proven track record of collaborating with Data Engineers, Data Scientists, ML Engineers, Networking, and Security teams.
- Familiarity with modern web stacks: TypeScript, Node.js, React, and optionally PHP.
- Exposure to agentic workflows, deep learning, or AI frameworks.
- Practical experience integrating LLMs (e.g., GPT-based models) into daily workflows: automating documentation, code generation, reviews, and operational intelligence.
- Solid grasp of prompt engineering techniques to design, optimize, and evaluate interactions with LLMs.
- Demonstrated ability to troubleshoot and resolve complex performance and scalability issues across infrastructure layers.
- Excellent interpersonal and communication skills, with a genuine passion for mentorship and continuous learning.
- Experience applying LLMs to DevOps workflows, enhancing incident response, and streamlining cross-functional collaboration is a strong advantage.
#MicrosoftAI
#mai-datainsights
About the company
Microsoft is a global technology company headquartered in Redmond, Washington. Our mission is to empower every person and every organization on the planet to achieve more. We develop, license, and support a wide range of software products, services, and devices that help individuals and businesses realize their full potential.
Our flagship products include the Microsoft 365 productivity cloud, Windows operating system, Azure cloud platform, and Dynamics 365 business applications. We are also a leader in areas such as artificial intelligence, cybersecurity, developer tools, and gaming through Xbox and Game Pass.
With operations in more than 190 countries and over 220,000 employees worldwide, Microsoft is committed to responsible innovation, inclusive economic growth, and sustainability. We work closely with governments, industries, and communities to ensure that technology serves the public good and helps address some of the world’s most pressing challenges.
As we celebrate our 50th anniversary in 2025, we continue to look forward, investing in AI, cloud, and quantum computing to shape the future of work, education, and society at large.