Senior Data Engineer, Scala
Job description
We are seeking someone with proficiency in Scala and Spark to build and optimize large-scale batch and streaming data pipelines.
The Core Data Engineering team sits within the data engineering group and is hands-on across the applications and infrastructure that make up Magnite's data pipelines. Our pipelines collectively handle 400+ billion events per day and generate terabytes of data per hour across both cloud-based and on-prem infrastructure. This data underpins business lines across the company - including client reporting, internal data science, account management, and product and business teams - so we need to build systems that remain scalable and efficient at this volume while also ensuring data consistency and reliability.

We value communication, discussion, and the sharing of ideas to reach the best technical solutions to our large-scale data challenges. We are looking for people who want to get things done and value open collaboration, including constructive feedback when brainstorming.

The team's mandate is technical development across the three platform-specific data engineering teams in the group, so its members get exposure to all aspects of our data engineering infrastructure and applications, including Spark jobs, Java-based real-time event processing, and large-scale data warehousing.
In this role you will:
- Work on internet-scale data problems
- Help architect and build systems that process our data volume and empower all of its consumers
- Have hands-on involvement across the group's various data pipelines and related data-delivery systems
- Be a part of and promote our culture of collaboration and mentorship
Typical challenges we face in this role include:
- Highly scalable infrastructure: our traffic follows peak and off-peak patterns and seasonal shifts, and our data infrastructure needs to respond accordingly
- Cost optimization: improving profit margins by building efficient systems that lower our infrastructure cost basis
- Technical architecture considerations given the various SLAs for data delivery
Tech stack
- Spark (batch and streaming) data pipelines, currently based both on Databricks and on-prem Spark infrastructure, primarily using Scala (see the illustrative sketch after this list)
- Streaming event processing data pipeline, using Java
- Terraform, Docker, Jenkins for CI/CD / infra / application deployment
- Airflow for job orchestration
- AWS-based cloud infrastructure including RDS, EC2, S3, Kinesis, ECS
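To give a flavor of the Scala/Spark work described above, here is a minimal sketch of a batch aggregation job. The bucket paths, column names, and job name are hypothetical placeholders for illustration only, not Magnite's actual schema or infrastructure.

```scala
// Minimal sketch of a Scala Spark batch job; all paths and column names are illustrative.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DailyEventRollup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-event-rollup")
      .getOrCreate()

    // Read one day of raw events (hypothetical S3 layout).
    val events = spark.read.parquet("s3://example-bucket/events/dt=2024-01-01/")

    // Aggregate events per client for downstream reporting.
    val rollup = events
      .groupBy(col("client_id"))
      .agg(
        count("*").as("event_count"),
        sum("revenue_micros").as("revenue_micros")
      )

    // Write the aggregate back out for consumers.
    rollup.write.mode("overwrite").parquet("s3://example-bucket/rollups/daily/dt=2024-01-01/")

    spark.stop()
  }
}
```

In practice a job like this would typically be parameterized by date and scheduled through Airflow, as listed in the stack above.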
Requirements
- Proficiency in Scala and Spark to build and optimize large-scale batch and streaming data pipelines
- 5+ years of software development experience with a BS/MS in Computer Science or equivalent work experience
- A willingness to take ownership: we are responsible for the entire software development lifecycle, from requirements gathering to production support
- Critical and creative thinking and constructive brainstorming: we expect engineers to present and discuss tradeoffs to solve the problems we face
- Either existing experience designing and building systems that handle large-scale data volumes and data ingestion at scale in a cloud-first setting, or established technical excellence with a desire to learn and crush the data engineering world
Benefits & conditions
- Comprehensive Healthcare Coverage for You and Your Family from Day One
- Generous Time Off
- Holiday Breaks, Summer Fridays and Quarterly Wellness Days
- Equity and Employee Stock Purchase Plan
- Family-Focused Benefits and Parental Leave
- 401k Retirement Savings Plan with Employer Match
- Disability and Life Insurance
- Cell Phone Subsidy
- Fitness Reimbursement
Company culture
- Community Service and Volunteer Events
- Company-Matched Charitable Contributions
- Wellness Coach and Mental Health Support
- Career Development Initiatives and a Career Growth Framework
- Culture and Inclusion Programs
- Bonusly Peer-to-Peer Recognition Program