Systems Engineer, Managed Operations

Amazon.com, Inc
Berlin, Germany
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Berlin, Germany

Tech stack

Amazon Web Services (AWS)
Software Applications
Systems Engineering
Cloud Computing
Cloud Engineering
Configuration Management
Databases
Continuous Integration
Linux
DevOps
Perl
Fault Tolerance
Python
Network Troubleshooting
Performance Tuning
Productivity Software
Reliability Engineering
Cloud Services
Ansible
Software Systems
Datadog
Scripting (Bash/Python/Go/Ruby)
Grafana
Prompt Engineering
Generative AI
CloudFormation
Bug Reporting
Deployment Automation
Machine Learning Operations
Functional Programming
CloudWatch
Puppet

Job description

AWS has successfully launched the European Sovereign Cloud (ESC), marking a significant development in Utility Computing (UC). To spearhead this initiative, we are actively seeking experienced systems engineers with a strong background in automation and operations. As part of the AWS Managed Operations team, you will play a pivotal role in building, operating and evolving operations and development teams dedicated to delivering high-availability AWS services, including EC2, S3, Dynamo, Lambda, and Bedrock, exclusively for EU customers. For more information on ESC please check out our blog: https://aws.amazon.com/blogs/aws/in-the-works-aws-european-sovereign-cloud/

Your responsibilities will encompass overseeing the ongoing operations and expansion of the ESC, working closely with global AWS teams, and influencing the evolution of AWS services and technology. A typical day in this role involves collaborating with technology leaders, contributing to the enhancement of day-to-day operations, and ensuring continuous improvements in availability, reliability, latency, performance, and efficiency of the ESC. You will be required to occasionally participate in "on-call" rotations to resolve incidents occurring out-of-hours.

The overarching goal is to deliver scalable services and ensure a high-availability experience for EU customers. If you are an experienced professional ready for a challenging and impactful opportunity, we invite you to join our efforts in operating and scaling a best-in-class development engineering and operations team that aligns with AWS' commitment to customer satisfaction and continual innovation.

Utility Computing (UC)

European Sovereign Cloud (ESC) is a part of AWS Utility Computing (UC). AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon's Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) to consistently released new product innovations that continue to set AWS's services and features apart in the industry. As a member of the UC organization, you'll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS. Within AWS UC, Managed Operations engineers engage with AWS customers who require specialized security solutions for their cloud services.

Eligibility requirement

Fluency in written and spoken English is required.

Candidate must be a national of an EU member state and residing in the EU to operate the AWS European Sovereign Cloud.

Amazon will provide relocation support for successful applicants relocating within the European Union.

Employees will participate in an On-Call rota.

A day in the life

Embark on a week filled with meaningful contributions to the operation and improvement of significant software systems. You dedicate a substantial portion of your time to carefully reviewing the operational health of services within your team's responsibility. In the process, you diligently identify anomalies and craft actionable bug reports, aspiring to enhance the overall efficiency and performance of your systems.

In addition to these responsibilities, you offer constructive feedback on change management documents and work earnestly to address your team's operational backlog. Through a collaborative effort, you strive to navigate challenges and ensure the seamless functionality of your systems. Additionally, you engage in the development and testing of scripts, hoping to provide practical solutions to enhance your workflows.

Beyond the technical aspects, you assume a role as an educator, sharing insights on the complexities of the European Sovereign Cloud with service teams. It's a humbling experience for you to contribute to the collective knowledge of the team, fostering a culture of mutual understanding.

This week encapsulates your commitment to continuous learning and improvement, acknowledging that every effort, no matter how small, contributes to the collective success of your team and the reliability of your software systems.

In addition to these responsibilities, your position involves 24x7 on-call responsibility. You work as a team to root-cause issues and ensure your systems remain resilient and fault-tolerant, underscoring your commitment to maintaining operational excellence.

Requirements

  • Experience in Linux OS and network troubleshooting, or experience in networking administration and troubleshooting
  • Experience in Python, Perl, or another scripting language
  • Experience in Systems engineering, site reliability engineering, building and operating systems at scale
  • This role requires you to be a national of an EU member state
  • Able to lead the creation, revision, and/or improvement of standard operational procedures (SOPs) and drive operational best practices
  • Experience with monitoring frameworks (such as CloudWatch, Datadog, Grafana, Elastic, or similar)
  • Experience actively mentoring junior engineers, working cross-organizationally, and leading strategic team efforts requiring work from multiple team members
  • Experience operating 24x7 high-availability, distributed software applications and performance tuning software applications and optimizing fleet utilization
  • Experience with Infrastructure as Code (such as CDK, CloudFormation, Puppet, Chef, Ansible, or similar)
  • Experience with CI/CD pipelines, DevOps practices, and Generative AI technologies, including automated deployment, configuration management, continuous integration workflows, prompt engineering, model deployment, and AI-powered automation tools

About the company

Amazon is an equal opportunities employer. We believe passionately that employing a diverse workforce is central to our success. We make recruiting decisions based on your experience and skills. We value your passion to discover, invent, simplify and build. Protecting your privacy and the security of your data is a longstanding top priority for Amazon. Please consult our Privacy Notice (https://www.amazon.jobs/en/privacy_page) to know more about how we collect, use and transfer the personal data of our candidates.