Sr. System Development Engineer, AGI Infrastructure

Amazon.com, Inc.
Barcelona, Spain
4 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Barcelona, Spain

Tech stack

Java
Adobe InDesign
Artificial Intelligence
Amazon Web Services (AWS)
C Sharp (Programming Language)
C++
Code Review
Computer Programming
Linux
Distributed Systems
GitHub
Subnetting
Python
Ruby
Software Engineering
Rust
Load Balancing
Delivery Pipeline
Large Language Models
Infrastructure as Code (IaC)
CloudFormation
Kubernetes
Route 53
Terraform
Go

Job description

The Artificial General Intelligence (AGI) team is looking for passionate, talented, and inventive engineers to play a pivotal role in the development and maintenance of industry-leading multi-modal and multi-lingual large language models (LLMs). The AGI team's mission is to leverage our hyper-scalable, general-purpose large model training and inference systems to develop and deploy cutting-edge sensory AI foundational models that revolutionize machine perception, interpretation, and interaction, with humans and with the physical world.

We believe in "Work Hard. Have Fun. Make History" by having a strong focus on sharing learning experiences from the front line with the development teams. The options for people in the team are vast. If you like mastering a domain and going deep, we need you. If you can juggle three tasks and coordinate with multiple people in the heat of an incident, we need you. If you love the benefits of process and methodical improvement, you will love it here. If you want to keep your head down, headphones on, and bash out code to support the team, we have a spot for you too.

You will be required to deeply understand technology landscapes and evaluate the use of new technologies. You will be influential within your team and work with peers and senior leaders to define and revise the standards for operational excellence across systems. You will consistently tackle abstract issues that span multiple functional areas and drive your team to push for improvements that can scale across other teams, services, and platforms.

  • Lead the design and automation of GenAI training compute infrastructure, and improve it continuously.
  • Guide and mentor other engineers as a force multiplier to deliver results.
  • Participate in design and code reviews and identify bottlenecks.
  • Identify performance bottlenecks in compute infrastructure and propose solutions to address them.
  • Candidates should be well-versed in core AWS services, including EC2, Lambda, and EKS.
  • Experienced in setting up and managing CI/CD pipelines using tools such as AWS CodePipeline, GitHub Actions, or similar platforms.
  • Familiarity with Infrastructure as Code (IaC) tools like AWS CloudFormation, Terraform, or the AWS CDK is a valuable asset. Furthermore, an understanding of networking concepts like VPC, subnets, and security groups, as well as configuring Load Balancers and Route 53, is desirable.
  • Should have hands-on experience with Kubernetes.

Requirements

  • 6+ years of systems design, software development, operations, automation, and process improvement experience
  • Experience programming with at least one modern language such as Python, Ruby, Golang, Java, C++, C#, Rust
  • Experience with Linux/Unix
  • Experience with CI/CD pipelines build processes
  • Experience with distributed systems at scale
