System Engineer

Starhub Ltd
1 month ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate

Tech stack

Airflow
Amazon Web Services (AWS)
Systems Engineering
BASIC (Programming Language)
Big Data
Cloud Computing
Computer Networks
Data as a Service
Information Engineering
Data Governance
ETL
Linux
DevOps
Disaster Recovery
Distributed Systems
DNS
Hadoop
Identity and Access Management
Routing
Performance Tuning
Reliability Engineering
Prometheus
Data Streaming
Systems Architecture
TCP/IP
Load Balancing
Data Ingestion
System Availability
Grafana
Spark
Reliability of Systems
Firewalls (Computer Science)
Information Technology
Patch Management
Data Management
Cloudwatch

Job description

As a System Engineer, you will operate large-scale big data platforms across hybrid (on-premises and cloud) environments, enabling reliable analytics and data-driven use cases. You will work closely with data engineers, data scientists, infrastructure, security, and business stakeholders to ensure data quality, platform stability, and operational excellence.

This role focuses on building, running, and optimizing production-grade data platforms and pipelines, with strong ownership of infrastructure, automation, reliability, and operations.

  • Design, implement, and manage scalable data platform infrastructure and pipelines across on-premises and cloud environments.
  • Own the end-to-end platform lifecycle, including architecture design, deployment, operations, performance optimization, and reliability engineering.
  • Maintain and support data platform clusters and nodes (compute, storage, networking), ensuring high availability and optimal performance.
  • Provision, configure, and manage cloud-based data services such as AWS S3 and Redshift.
  • Monitor platform health, performance, and capacity; implement observability, alerting, and operational runbooks to ensure system reliability.
  • Support and optimize ETL/ELT pipelines, ensuring reliable data ingestion, transformation, and delivery.
  • Operate and maintain data storage platforms (on-premises and cloud), ensuring durability, scalability, and cost efficiency.
  • Implement and enforce security best practices, including IAM, VPC configurations, encryption, backup strategies, and disaster recovery.
  • Ensure compliance with data governance and regulatory requirements (e.g., PDPA) in collaboration with infrastructure and security teams.
  • Collaborate with data engineers, data scientists, and cross-functional stakeholders to align platform capabilities with business and analytical needs.
  • Develop and maintain technical documentation, including system architecture, data flows, configurations, and operational procedures.

Requirements

  • Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field, or equivalent practical experience.
  • 2-5+ years of experience in System Engineering, Infrastructure Engineering, DevOps, or Data Engineering.
  • Strong hands-on experience managing Linux-based systems, including configuration, patching, performance tuning, and troubleshooting.
  • Experience supporting on-premises or cloud infrastructure (e.g., AWS), including compute, storage, and networking components.
  • Familiarity with operating distributed systems or data platforms (e.g., Hadoop, Spark, Airflow), focusing on deployment, monitoring, and troubleshooting rather than development.
  • Solid understanding of networking fundamentals, including TCP/IP, DNS, routing, firewalls, load balancing, and VPC design.
  • Knowledge of system and platform security practices, including IAM, access control, encryption, patch management, and basic compliance requirements.
  • Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK, CloudWatch) for system health, alerting, and incident response.
  • Strong troubleshooting and problem-solving skills in production environments, with the ability to diagnose issues across infrastructure, network, and platform layers and to collaborate effectively with cross-functional teams to keep the platform reliable and operational.
