Senior Software Engineer (Agentic Search) - Crawler

Nebius

Berlin, Germany

3 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Berlin, Germany

Tech stack

Automated Storage and Retrieval Systems

C++

Data Deduplication

Software Debugging

Distributed Systems

DNS

Fault Tolerance

MapReduce

Enterprise Messaging Systems

RabbitMQ

Data Streaming

Web Crawlers

Spark

Backend

Event Driven Architecture

Apache Flink

Kafka

Build Tools

Search Engines

Job description

We are looking for a Senior Software Engineer to work on the content acquisition and crawling infrastructure of a novel search engine tailored for agentic AI consumption.

In this role, you will focus on building systems that discover, fetch, and continuously refresh content from the open web and other large-scale data sources. You will design distributed crawling, scheduling, and ingestion infrastructure capable of operating at internet scale while balancing coverage, freshness, resource efficiency, and reliability. You will work on systems that process billions of URLs, manage high-throughput data flows, and ensure that high-quality content is consistently available to downstream indexing and retrieval systems.

In this position, you will:

Design, implement, and operate web-scale crawling systems for acquiring content from the internet
Build ingestion workflows for internal and external data sources, including crawlers, structured feeds, and partner integrations
Develop crawl scheduling, prioritisation, recrawl policies, and freshness strategies
Build systems for URL discovery, deduplication, content extraction, and crawl orchestration
Ensure reliable operation of crawling infrastructure under high-throughput conditions
Define observability and quality metrics for crawl coverage, freshness, throughput, and content quality
Monitor resource usage, bandwidth consumption, and infrastructure cost
Collaborate with indexing and ML teams to ensure acquired content meets retrieval and ranking requirements
Enable safe experimentation with crawling strategies and content acquisition policies

Requirements

5+ years of experience building backend or distributed systems
Strong Go or C++ expertise
Experience with large-scale distributed systems (10k+ RPS, billions of URLs, high-throughput pipelines)
Understanding of web protocols (HTTP, DNS, TLS), crawling, scraping, and content extraction
Experience operating production systems and debugging failures in distributed environments
Strong understanding of scalability, fault tolerance, and resource management

Strong candidates may also have experience with:

Web crawling
Building streaming data pipelines and event-driven systems
Kafka, Pulsar, NATS, RabbitMQ, or similar messaging platforms
Designing distributed schedulers, queues, and asynchronous processing systems
Spark, Flink, Beam, or MapReduce
Ad tech, social networks, search engines, or other large-scale content platforms

Applicants must be authorized to work in the country in which they apply and will be required to provide proof of employment eligibility as a condition of hire.

Benefits & conditions

Competitive compensation
Career growth and learning opportunities
Flexibility and ownership
Collaborative and innovative culture
Opportunity to work on impactful AI projects
International environment and talented teams

What's it like to work at Nebius:

Fast moving - Bold thinking - Constant growth - Meaningful impact - Trust and real ownership - Opportunity to shape the future of AI

About the company

Nebius is leading a new era in cloud infrastructure for the global AI economy. We are building a full-stack AI cloud platform that supports developers and enterprises from data and model training through to production deployment, without the cost and complexity of building large in-house AI/ML infrastructure. Built by engineers, for engineers. From large-scale GPU orchestration to inference optimization, we own the hard problems across compute, storage, networking and applied AI.

Listed on Nasdaq (NBIS) and headquartered in Amsterdam, we have a global footprint with R&D hubs across Europe, the UK, North America and Israel. Our team of 1,500+ includes hundreds of engineers with deep expertise across hardware, software and AI R&D.

The Product

In a rapidly evolving world, trust in AI depends on AI agents being grounded in fresh, verified real-world data. Search is the foundation that makes this possible. We are building an agent-native search platform designed specifically for AI systems rather than human users. Our product provides programmatic, low-latency, and observable search APIs that AI agents use to retrieve, filter, and reason over real-world information at scale.
