SRE Production Support Engineer

TCS Inc
Malvern, United States of America
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Malvern, United States of America

Tech stack

Agile Methodologies
Amazon Web Services (AWS)
Apache HTTP Server
Tomcat
AWK (Programming Language)
Azure
Bash
Oracle WebLogic Server
Cloud Computing
Databases
Cron
Linux
DevOps
File Systems
DNS
Middleware
Monitoring of Systems
WildFly (JBoss AS)
PostgreSQL
Linux System Administration
Log Analysis
Maven
Microsoft SQL Server
MySQL
Nagios
Networking Basics
Nginx
Oracle Applications
RabbitMQ
Ansible
Prometheus
Shell Script
Software Deployment
SQL Databases
TCP/IP
Sed (Programming Language)
Load Balancing
Grafana
Git
Kubernetes
Performance Monitor
Kafka
Grep
CloudWatch
Terraform
Splunk
AppDynamics
Dynatrace
Docker
Jenkins

Job description

  • Provide L2/L3 production support for mission-critical applications.
  • Monitor application health, infrastructure, and platform performance.
  • Respond to production incidents and restore services within SLA.
  • Perform root cause analysis (RCA) and implement permanent fixes.
  • Troubleshoot application, middleware, infrastructure, and database issues.
  • Participate in 24x7 on-call production support rotation.
  • Automate repetitive operational tasks using scripting.
  • Monitor application logs and system metrics.
  • Collaborate with development teams to improve application reliability.
  • Support production deployments and release activities.
  • Maintain operational runbooks and support documentation.
  • Implement proactive monitoring and alerting.
  • Participate in problem management and continuous improvement initiatives.

Requirements

Production Support

  • 5+ years of Production Support/SRE experience
  • Incident Management
  • Problem Management
  • Change Management
  • Root Cause Analysis (RCA)
  • Release Support
  • Production Deployments

Linux

  • Linux Administration
  • Shell Scripting
  • Bash
  • Process Management
  • File Systems
  • Cron Jobs
  • Log Analysis
  • grep
  • awk
  • sed
  • tail
  • journalctl

Monitoring & Observability

  • Splunk
  • Dynatrace
  • AppDynamics
  • Grafana
  • Prometheus
  • ELK
  • CloudWatch

Cloud

  • AWS
  • Azure (preferred)
  • Kubernetes
  • Docker

DevOps

  • Jenkins
  • Git
  • Maven
  • Ansible
  • Terraform
  • CI/CD Pipelines

Middleware

  • Apache
  • Nginx
  • Tomcat
  • WebLogic
  • JBoss

Databases

  • Oracle
  • SQL Server
  • PostgreSQL
  • MySQL
  • SQL

Messaging

  • Kafka
  • MQ
  • RabbitMQ

  • Experience supporting high-availability production environments.
  • Experience working in an SRE or Production Support role.
  • Strong Linux troubleshooting skills.
  • Experience with monitoring and alerting tools.
  • Experience analyzing application and server logs.
  • Knowledge of networking fundamentals (TCP/IP, DNS, Load Balancers).
  • Experience with automation and scripting.
  • Experience with ITIL processes.
  • Experience working in Agile environments.
