Backend Engineer (EST timezone)
Role details
Job location
Tech stack
Job description
We use a mixture of Node.js and Rust for high-throughput processing. We store most of our data in Kafka, PostgreSQL, ClickHouse, S3, and Redis, but with the growing volume of data, we're constantly re-evaluating our technology choices. We're looking for someone who understands the principles of designing distributed systems and can use them to pick the best tools for the job.
What you'll be doing
You'll help build PostHog's observability suite: Logs (live and growing fast), Traces (in alpha), and Metrics (landing soon). These are the products that let our customers, and their AI agents, understand, debug, and self-heal their own software. This is the foundation for self-driving, self-healing products, and we're building most of it from scratch. The core challenge is easy to say and hard to do: ingest, store, and retrieve enormous volumes of telemetry quickly, reliably, and cost-effectively. Getting data in is the easy half; getting it back out efficiently at petabyte scale, without melting the infra bill, is the real game. We're already handling terabytes of data, and the volume keeps growing!
Requirements
We're seeking a backend engineer for our APM team who thrives on the challenge of building systems that process petabytes of data. You get excited about designing elegant, efficient systems that can handle this volume of data without giving people insomnia, and you understand how much data integrity and reliability matter to customers.
The ideal candidate has experience with high-throughput data processing systems such as:
- Observability platforms and OpenTelemetry instrumentation
- Metric collection systems
- Log aggregation engines
- Streaming and batch-processing pipelines
- Experience with highly scalable, event-driven distributed systems
- Strong across the full data lifecycle at scale: ingestion plus efficient, cost-aware storage and retrieval (query and storage performance matter as much as throughput)
- Experience with Node.js, Go, Rust, or similar
- You've worked at scale with systems like Kafka, ClickHouse, PostgreSQL, Redis, or S3
- You can take an ambiguous, greenfield problem, frame it properly, and drive it forward without hand-holding
- You've worked with multi-tenant SaaS
- You ship changes quickly without breaking things
Nice to have
- Knowledge of observability systems and practices: OpenTelemetry and the realities of logs, metrics, and traces at scale
- Experience with high-throughput log aggregation, metric collection, or tracing systems
- You've been on call and dealt with incidents
- Comfortable provisioning and maintaining cloud infrastructure
- Experience with benchmarking and profiling tools