Kafka Platform Engineer

ITBrainiac Inc
Denver, United States of America
3 days ago

Role details

Contract type: Temporary contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior

Job location

Denver, United States of America

Tech stack

Amazon Web Services (AWS)
Cloud Computing
Linux
DevOps
Disaster Recovery
DNS
Java Management Extensions
Performance Tuning
Runbook
Transmission Control Protocol (TCP)
Scripting (Bash/Python/Go/Ruby)
Load Balancing
Grafana
Kafka
Confluent

Job description

Kafka Platform Engineer (Apache Kafka, Confluent Platform, Kafka Connect, Schema Registry, Linux, DevOps/SRE, AWS MSK)

We're seeking a senior contract Kafka/Confluent administrator to own and evolve our on-prem event streaming platform, with a primary focus on Confluent Platform. You will lead the planning and execution of a hardware refresh for our on-prem clusters, drive reliability and performance, and embed DevOps practices and automation across provisioning, deployment, observability, and incident response. Experience with Apache Kafka and AWS MSK is desired for secondary support and cross-environment alignment. Comprehensive documentation and runbooks are required deliverables.

Kafka Platform Support Key Responsibilities

Design, deploy, and operate highly available Kafka clusters (on-prem, cloud, and/or managed services such as Confluent Cloud or AWS MSK).

Manage topics, partitions, quotas, retention policies, and consumer group strategies for performance and cost.

Own upgrades, patches, and migrations.

Implement and manage Kafka components: Kafka Connect, Schema Registry, MirrorMaker/Confluent Replicator, REST Proxy; familiarity with Kafka Streams and ksqlDB is a plus.

Performance tuning (producers/consumers, batching, compression, acks, ISR, controller health), throughput testing, and benchmarking.

Capacity planning, partitioning strategy, and cluster right-sizing.

Contract Deliverables

Hardware refresh plan: capacity model, sizing, architecture diagrams, migration/cutover strategy, and risk register.

Implemented and validated on-prem clusters on refreshed hardware, with performance benchmarks.

Operational documentation: standards, runbooks, monitoring/alerts configuration, backup/restore and DR playbooks.

Knowledge transfer sessions and documentation handoff at milestones and project close.

Requirements

Must have deep, hands-on experience running Kafka in large-scale production environments, including cluster operations, upgrades, patches, and migrations.

Should understand Kafka internals such as partitions, replication, retention/compaction, and rebalance strategies.

Kafka Administration

Platform / SRE / DevOps Experience

Kafka Ecosystem Tools

Linux + Networking

Automation / Scripting

Monitoring / Observability

Disaster Recovery

Nice-to-Have Skills

AWS MSK / Apache Kafka Cloud: Experience with MSK operations and cloud-aligned Kafka environments. Helpful for cross-environment consistency between on-prem and cloud.

Hardware Refresh Experience: Prior work leading Kafka hardware refreshes or cluster rebuilds.

5+ years in systems/platform engineering, SRE, or DevOps; 4+ years operating Kafka in production at scale.

Deep knowledge of Kafka internals: partitions, replication, retention/compaction, rebalance strategies.

Hands-on with Kafka Connect, Schema Registry, MirrorMaker/Confluent Replicator.

Strong Linux fundamentals, networking knowledge (TCP, DNS, load balancing), and performance analysis skills.

Proficiency in automation/scripting (Bash, Python, Go, or Ruby).

Monitoring/observability: Datadog, Grafana, JMX exporters, and log aggregation.

Experience with DR, multi-region design, and incident management.

Proven ability to produce clear, comprehensive documentation.

Preferred Qualifications

Experience with Apache Kafka and AWS MSK operations and integration.

Experience executing hardware refreshes or major cluster rebuilds/migrations with minimal downtime.
