Site Reliability Engineer (Database)
Job description
The Site Reliability Engineer (Database) plays a critical role in maintaining the performance, availability, and scalability of Intelerad's mission-critical healthcare imaging platforms.
This position combines deep technical expertise in system administration, database performance optimization, and infrastructure to ensure our PACS, RIS, and enterprise imaging solutions deliver the reliability that healthcare providers depend on 24/7.
The SRE (Database) will provide advanced knowledge in database performance tuning, system reliability engineering, and automated deployment practices. The role takes the lead in maintaining and optimizing complex database environments, ensuring consistent performance, stability, and operational health across mission-critical systems. It also places a strong emphasis on ongoing support activities, including proactive monitoring, incident response, routine maintenance, and database housekeeping, to prevent performance degradation and ensure long-term system integrity.
- Ensure high system reliability and performance across production environments by proactively monitoring infrastructure health, identifying bottlenecks, and implementing solutions that support 99.9%+ uptime for mission-critical healthcare imaging systems.
- Continuously monitor customer databases to detect issues, performance degradation, and anomalies; maintain dashboards, alerts, and tuning strategies; and support incident response and root-cause analysis for database-related events.
- Optimize SQL database performance by analyzing execution plans, implementing indexing strategies, tuning queries, and performing maintenance routines to ensure fast and reliable access to imaging data across multiple SQL Server applications.
- Lead deployment rollouts and system migrations from planning through execution, ensuring smooth transitions and thorough validation.
- Provide expertise in capacity planning and growth forecasting to maintain system scalability and stability.
- Drive effective incident management by diagnosing complex database and system issues, coordinating resolution across teams, performing root-cause analysis, and implementing preventive measures to reduce recurrence.
- Promote continuous improvement by identifying automation opportunities, enhancing monitoring and alerting, documenting system configurations and procedures, and working with development teams to improve application reliability and performance.
Requirements
- 5+ years of expert-level experience in SQL, database engineering, database reliability engineering, or similar technical operations roles supporting enterprise production environments.
- Strong Sybase/SQL Server experience including performance tuning, query optimization, index management, backup/recovery procedures, and database maintenance in production environments.
- Experience with SQL Server high availability solutions (Always On, clustering, replication)
- Proficiency with Windows and/or Linux server administration, including scripting and automation (PowerShell, Bash, Python)
- Experience with monitoring and observability tools (PRTG Network Monitor, Prometheus, Grafana, Splunk, Datadog, or similar)
- Strong troubleshooting and analytical skills with ability to diagnose complex technical issues under pressure
- Excellent communication skills with ability to collaborate across technical and non-technical teams
- Bachelor's degree in Computer Science, Information Technology, or equivalent experience
Preferred Qualifications & Special Requirements
- Experience with healthcare IT systems, particularly PACS, RIS, or medical imaging platforms
- Knowledge of healthcare data standards and compliance requirements (HIPAA, DICOM, HL7)
- Experience with infrastructure-as-code tools (Terraform, CloudFormation, ARM templates)
- Familiarity with containerization technologies (Docker, Kubernetes)
- Understanding of DevOps practices and CI/CD pipelines
- ITIL 4 Foundation or equivalent
- Cloud certifications such as AWS Solutions Architect, Azure Administrator, or Google Cloud Professional.
- Familiarity with cloud platforms (AWS, Azure, or GCP), including compute services, storage solutions, networking, and cloud-native monitoring tools.
- Ability to participate in an on-call rotation
- Flexibility to respond to critical incidents outside standard business hours
Travel Requirements
- Occasional travel may be required for client escalations or team collaboration (up to 10%)