System Engineer

Starhub Ltd

1 month ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Intermediate

Job location

Tech stack

Airflow

Amazon Web Services (AWS)

Systems Engineering

BASIC (Programming Language)

Big Data

Cloud Computing

Computer Networks

Data as a Service

Information Engineering

Data Governance

ETL

Linux

DevOps

Disaster Recovery

Distributed Systems

DNS

Hadoop

Identity and Access Management

Routing

Performance Tuning

Reliability Engineering

Prometheus

Data Streaming

Systems Architecture

TCP/IP

Load Balancing

Data Ingestion

System Availability

Grafana

Spark

Reliability of Systems

Firewalls (Computer Science)

Information Technology

Patch Management

Data Management

Cloudwatch

Job description

As a System Engineer, you will operate large-scale big data platforms across hybrid (on-premises and cloud) environments, enabling reliable analytics and data-driven use cases. You will work closely with data engineers, data scientists, infrastructure, security, and business stakeholders to ensure data quality, platform stability, and operational excellence.

This role focuses on building, running, and optimizing production-grade data platforms and pipelines, with strong ownership of infrastructure, automation, reliability, and operations.

Design, implement, and manage scalable data platform infrastructure and pipelines across on-premises and cloud environments.
Own the end-to-end platform lifecycle, including architecture design, deployment, operations, performance optimization, and reliability engineering.
Maintain and support data platform clusters and nodes (compute, storage, networking), ensuring high availability and optimal performance.
Provision, configure, and manage cloud-based data services such as AWS S3 and Redshift.
Monitor platform health, performance, and capacity; implement observability, alerting, and operational runbooks to ensure system reliability.
Support and optimize ETL/ELT pipelines, ensuring reliable data ingestion, transformation, and delivery.
Operate and maintain data storage platforms (on-premises and cloud), ensuring durability, scalability, and cost efficiency.
Implement and enforce security best practices, including IAM, VPC configurations, encryption, backup strategies, and disaster recovery.
Ensure compliance with data governance and regulatory requirements (e.g., PDPA) in collaboration with infrastructure and security teams.
Collaborate with data engineers, data scientists, and cross-functional stakeholders to align platform capabilities with business and analytical needs.
Develop and maintain technical documentation, including system architecture, data flows, configurations, and operational procedures.

Requirements

Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field, or equivalent practical experience.
2-5+ years of experience in System Engineering, Infrastructure Engineering, DevOps, or Data Engineering.
Strong hands-on experience managing Linux-based systems, including configuration, patching, performance tuning, and troubleshooting.
Experience supporting on-premises or cloud infrastructure (e.g., AWS), including compute, storage, and networking components.
Familiarity with operating distributed systems or data platforms (e.g., Hadoop, Spark, Airflow), focusing on deployment, monitoring, and troubleshooting rather than development.
Solid understanding of networking fundamentals, including TCP/IP, DNS, routing, firewalls, load balancing, and VPC design.
Knowledge of system and platform security practices, including IAM, access control, encryption, patch management, and basic compliance requirements.
Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK, CloudWatch) for system health, alerting, and incident response.
Strong troubleshooting and problem-solving skills in production environments, with the ability to diagnose issues across infrastructure, network, and platform layers and to collaborate effectively with cross-functional teams to ensure platform reliability and operations.