Sr. Site Reliability Engineer, Data - FreeWheel
Job description
FreeWheel is seeking an experienced Data SRE to join the FreeWheel Data SRE team. As a member of the Global Operation team, you will be responsible for ensuring the reliability, scalability, and performance of our data systems. Working closely with data engineers and other operation sub-teams, you will manage our data infrastructure, optimize system reliability, automate daily operations, and resolve technical issues that impact our data pipelines and backend data platforms.
System Monitoring and Optimization
- Design and implement monitoring and alerting systems to ensure the stability, reliability, and performance of data platforms.
- Quickly respond to and resolve issues impacting data pipelines or storage layers.
Automation and Tool Development
- Develop and maintain automation tools and scripts for deployment, monitoring, backup, recovery, and disaster recovery of data systems.
Performance Optimization
- Analyze and optimize the performance of data storage, query performance, and data flows to ensure efficient processing of large-scale datasets.
- Reduce latency and improve processing speed.
Incident Response and Troubleshooting
- Respond quickly to data platform failures.
- Perform troubleshooting and coordinate cross-team efforts to resolve issues.
- Ensure high availability and reliability of data platforms.
Capacity Planning and Scaling
- Work with data engineering teams to analyze and forecast capacity requirements.
- Ensure systems can accommodate data growth and scale infrastructure accordingly.
Documentation and Knowledge Sharing
- Document the architecture, configurations, and operational procedures for data platforms.
- Share knowledge across the team and provide relevant training.
Security and Compliance
- Ensure data platforms meet security standards and compliance requirements.
- Prevent data breaches, unauthorized access, and data misuse.
Cross-Team Collaboration
- Collaborate with data science, product, and development teams.
- Support data product design and implementation.
- Resolve reliability-related issues across the platform.
Requirements
- At least 8 years of experience as an SRE, DevOps, or Data Operations Engineer.
- Experience with cloud platforms (e.g., AWS, GCP, Azure).
- Extensive experience in database management (e.g., NoSQL databases, MySQL, PostgreSQL).
- Proficiency in automation tools and frameworks (e.g., Ansible, Terraform, Kubernetes, Docker) for automating data system deployment and maintenance.
- Strong experience with modern CI/CD pipelines.
- Programming skills in Python, Go, Java, or Scala with the ability to write efficient scripts and automation tools.
- Experience using monitoring and log management tools such as Prometheus, Grafana, and ELK Stack.
- Strong troubleshooting and debugging skills with the ability to quickly identify and resolve production issues.
- Excellent communication skills with the ability to convey technical information clearly to technical and non-technical stakeholders.
- Education: Bachelor's degree or higher in Computer Science, Software Engineering, or a related field.
Additional Preferred Skills
- Familiarity with containerization, microservices architecture, and Kubernetes.
- Experience designing and maintaining large-scale distributed systems.
- Experience in data quality management, data governance, or ETL pipelines.
Disclaimer: This information has been designed to indicate the general nature and level of work performed by employees in this role. It is not designed to contain or be interpreted as a comprehensive inventory of all duties, responsibilities and qualifications.
Skills: Amazon Web Services (AWS), Automation, Python (Programming Language), Bachelor's Degree
While possessing the stated degree is preferred, Comcast also may consider applicants who hold some combination of coursework and experience, or who have extensive related professional experience.
Benefits & conditions
Primary Location Pay Range: $117,627.86 - $176,441.80