Infrastructure Engineer III
Job description
- Applies technical knowledge and problem-solving methodologies to projects of moderate scope, with a focus on improving the data and systems running at scale, and ensures end-to-end monitoring of applications
- Resolves most nuances and determines the appropriate escalation path
- Executes conventional approaches to build or break down technical problems
- Drives the daily activities supporting the standard capacity process applications
- Partners with application and infrastructure teams to identify potential capacity risks and govern remediation statuses
- Considers upstream/downstream data, system, and technical implications
- Is accountable for making significant decisions for a project consisting of multiple technologies and applications
- Adds to team culture of diversity, opportunity, inclusion, and respect
Requirements
- Formal training or certification in Infrastructure Engineering concepts and 5+ years applied experience
- Extensive experience running on-prem GemFire or comparable distributed caches/data grids at scale; hands-on with locators, servers, regions, WAN gateways, client connectivity, rolling upgrades, and patching in controlled change processes.
- Production experience with Valkey/Redis (cluster mode, replication, persistence via RDB/AOF); capacity and performance management for low-latency use cases.
- Strong Linux and networking skills, including tuning OS/network parameters (file descriptors, TCP keepalive, MTU, buffers) and understanding DNS and load balancing in enterprise environments.
- Experience designing dashboards/alerts in Splunk/Dynatrace or equivalent for JVM/GC, heap/off-heap, queue depth, client connections, and WAN metrics; ability to partner across application and infrastructure teams to identify capacity risks and govern remediation.
- Experience tuning JVM/GC and GemFire heap/off-heap memory models; troubleshooting region rebalancing, partitioning, and gateway queues for cross-site replication.
Preferred qualifications, capabilities, and skills
- TigerGraph: Production operations for TigerGraph clusters including schema design and GSQL optimization; controlled patching/rolling upgrades; monitoring and capacity management via enterprise observability; on-call incident response, root-cause analysis, and tuning for low-latency SLAs.
- MongoDB: Administer replica sets and sharded clusters on-prem with scaling and version upgrades through change governance; performance and resiliency via index/schema design, query plan analysis, server parameter tuning; robust backup/restore with PITR and DR runbooks; enterprise security hardening and integrated observability.
- MongoDB Atlas: Operate Atlas clusters with right-sizing and autoscaling policies, multi-region/global configurations, and maintenance within change controls; configure security and networking (private endpoints/peering, RBAC, API governance, encryption); implement cost/capacity guardrails; manage snapshots and point-in-time restore with periodic DR testing.
Benefits & conditions
We offer a competitive total rewards package including base salary determined based on the role, experience, skill set and location. Those in eligible roles may receive commission-based pay and/or discretionary incentive compensation, paid in the form of cash and/or forfeitable equity, awarded in recognition of individual achievements and contributions. We also offer a range of benefits and programs to meet employee needs, based on eligibility. These benefits include comprehensive health care coverage, on-site health and wellness centers, a retirement savings plan, backup childcare, tuition reimbursement, mental health support, financial coaching and more. Additional details about total compensation and benefits will be provided during the hiring process.