Software Triage Engineer
Job description
The Release Triage Software Engineer function is critical to the successful development and deployment of new features and fixes, because it ensures that these changes reach devices in the field.
Sky's development environment is advanced and highly integrated, combining industry-standard tools to support a fast-moving, agile development cycle. Running these tools on cloud infrastructure, together with effective use of open-source code, allows Sky to deliver features and products on aggressive timelines.
What You'll Do
As a key engineer on the team, you will be responsible for the release quality and triage lifecycle, which covers deployment, triage, mitigation, and tool development for software release operations. The role requires close collaboration with the Development, Release, and QA teams. You will assess and ensure the release quality of the RDK software using key performance metrics and incidents reported from the field, and you will identify the new tools and processes needed to improve software release triage. You will also manage risks and resolve issues that affect release scope, schedule, and quality.
Your Daily Tasks
- Ensure timely and high-quality software releases across diverse devices by proactively monitoring metrics and alerting systems. Promptly respond to critical field issues, identify root causes, and implement effective mitigation strategies.
- Troubleshoot end-to-end (E2E) issues in entertainment devices across various RDK middleware components, including the media player, audio/video streaming protocols, web browser, HDMI, Bluetooth, and WiFi/Ethernet. Conduct source code reviews to identify root causes within the middleware and platform.
- Diagnose and resolve issues on Linux systems and across networking protocols, including through packet capture analysis.
- Design and enhance operational tools and architect DevOps solutions to optimize system performance and efficiency.
- Leverage AWS technologies (S3, Athena, QuickSight) to analyse data from millions of field devices, delivering insights to inform decision-making and drive operational efficiency.
- Develop and implement anomaly detection techniques and data-driven solutions to proactively identify and resolve system issues. Perform global metric comparisons across various device models.
Requirements
- You will be skilled in C/C++, Python, and Linux.
- Ideally, you'll also have experience with log management and analysis tools such as the Elastic Stack (ELK) and Splunk, and with Grafana for data visualisation and monitoring.
- Proven expertise in at least one scripting language, such as Bash, Python, or Go.
- Ability to make sound technical decisions and persuade others of their merits.
- Experience with defect tracking tools such as Jira.
- Experience with source control management (SCM) tools: Git and GitHub.