Member of Technical Staff - Principal Data Infrastructure Engineer

Microsoft
Redmond, United States of America
5 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate
Compensation
$275K

Job location

Redmond, United States of America

Tech stack

PHP
Artificial Intelligence
Amazon Web Services (AWS)
Data analysis
Application Release Automation
Azure
Bash
Big Data
Cloud Computing
Code Generation
Computer Engineering
Continuous Integration
Information Engineering
Data Infrastructure
DevOps
Distributed File Systems
Distributed Systems
Hadoop Distributed File System
Monitoring of Systems
Identity and Access Management
Python
Kerberos (Protocol)
Enterprise Messaging Systems
Node.js
NoSQL
OAuth
Performance Tuning
Powershell
RabbitMQ
Reliability Engineering
Software Engineering
Systems Integration
TypeScript
Data Processing
Data Storage Technologies
Cloud Platform System
React
Delivery Pipeline
Large Language Models
Prompt Engineering
Spark
Deep Learning
Containerization
Kubernetes
Infrastructure Automation Frameworks
Information Technology
Deployment Automation
Bare Metal
Bicep
Kafka
Terraform
GPT
Data Pipelines
Databricks

Job description

  • Architect and maintain scalable, reliable, and observable Big Data Infrastructure for mission-critical AI applications.
  • Champion DevOps and SRE best practices: automated deployments, service monitoring, and incident response.
  • Build a self-service big data platform that empowers data and platform engineers and researchers.
  • Develop robust CI/CD pipelines and automate infrastructure provisioning using Infrastructure as Code tools (Bicep, Terraform, ARM).
  • Collaborate with Data Engineers, Data Scientists, AI Researchers, and Developers to deliver secure, seamless big data workflows.
  • Lead technical design reviews and uphold a clean, secure, and well-documented codebase.
  • Proactively identify and resolve bottlenecks in data pipelines and infrastructure.
  • Optimize system performance across storage, compute, and analytics layers.
  • Partner with Security teams to enhance system security (IAM, OAuth, Kerberos).
  • Embody and promote Microsoft's values: Respect, Integrity, Accountability, and Inclusion.

Requirements

  • Deep technical expertise
  • A passion for automation and observability
  • Fluency in distributed systems
  • Creativity to design scalable solutions
  • And just as importantly: empathy, collaboration, and a growth mindset.
  • Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 4+ years of experience in business analytics, data science, software development, data modeling, or data engineering
  • OR Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 6+ years of experience in business analytics, data science, software development, data modeling, or data engineering
  • OR equivalent experience.
  • 4+ years in Big Data Infrastructure, DevOps, SRE, or Platform Engineering.
  • 3+ years of hands-on experience managing and scaling distributed systems, from bare-metal to cloud-native environments.
  • 2+ years deploying containerized applications using Kubernetes and Helm/Kustomize.
  • Solid scripting and automation skills using Python, Bash, or PowerShell.
  • Proven success in CI/CD pipeline management, release automation, and production troubleshooting.
  • Experience working with Databricks for scalable data processing and analytics.
  • Familiarity with security practices in infrastructure environments, including IAM, OAuth, and Kerberos administration.
  • Proven experience with cloud-native infrastructure across Azure, AWS, or GCP.
  • Hands-on expertise with modern data platforms like Databricks, including:
  • Deep understanding of data storage and processing technologies:
      • Relational & NoSQL databases.
      • Key-value stores.
      • Spark compute engines.
      • Distributed file systems (e.g., HDFS, ADLS Gen2).
      • Messaging systems (e.g., Event Hub, Kafka, RabbitMQ).
  • Capacity planning and incident management for large-scale big data systems.
  • Solid collaboration history with Data Engineers, Data Scientists, ML Engineers, Networking, and Security teams.
  • Familiarity with modern web stacks: TypeScript, Node.js, React, and optionally PHP.
  • Exposure to agentic workflows, deep learning, or AI frameworks.
  • Practical experience integrating LLMs (e.g., GPT-based models) into daily workflows: automating documentation, code generation, reviews, and operational intelligence.
  • Solid grasp of prompt engineering techniques to design, optimize, and evaluate interactions with LLMs.
  • Demonstrated ability to troubleshoot and resolve complex performance and scalability issues across infrastructure layers.
  • Excellent interpersonal and communication skills, with a strong passion for mentorship and continuous learning.
  • Experience applying LLMs to DevOps workflows, enhancing incident response, and streamlining cross-functional collaboration is a strong advantage.

#MicrosoftAI

#mai-datainsights

About the company

Microsoft is a global technology company headquartered in Redmond, Washington. Our mission is to empower every person and every organization on the planet to achieve more. We develop, license, and support a wide range of software products, services, and devices that help individuals and businesses realize their full potential.

Our flagship products include the Microsoft 365 productivity cloud, Windows operating system, Azure cloud platform, and Dynamics 365 business applications. We are also a leader in areas such as artificial intelligence, cybersecurity, developer tools, and gaming through Xbox and Game Pass.

With operations in more than 190 countries and over 220,000 employees worldwide, Microsoft is committed to responsible innovation, inclusive economic growth, and sustainability. We work closely with governments, industries, and communities to ensure that technology serves the public good and helps address some of the world’s most pressing challenges.

As we celebrate our 50th anniversary in 2025, we continue to look forward, investing in AI, cloud, and quantum computing to shape the future of work, education, and society at scale.
