Site Reliability Engineer
Job description
The SRE - Database Services role reports to the Manager of the SRE Department.
In this role, you will apply software engineering principles to ensure the availability, performance and stability of TFS's enterprise database systems and platforms.
You will play a key role in maintaining and modernizing our database platforms, including Exadata, cloud data warehouse, AWS RDS, Oracle, PostgreSQL, and SQL databases.
What you'll be doing
- Operate, optimize, monitor, and scale database platforms, including Exadata, Oracle, AWS RDS, and cloud data warehouse.
- Manage and support large-scale Oracle Database and Exadata environments, ensuring uptime and performance SLAs are consistently met.
- Perform database performance tuning, capacity planning, and backup and recovery.
- Troubleshoot complex production issues and implement permanent fixes to improve reliability.
- Build and maintain components that automate operational workflows and reduce toil, using Python or an equivalent scripting language.
- Reduce manual toil by automating database maintenance, patching, and monitoring tasks.
- Design and implement observability solutions in Dynatrace, including dashboards, metrics, alerts, and anomaly detection for database workloads.
- Partner with engineering and application teams to optimize queries, indexes, and data models.
- Implement and manage HA/DR strategies, replication, and backup and recovery solutions across Exadata and SQL Server environments.
- Participate in capacity planning, disaster recovery, and business continuity exercises.
- Define and manage SLIs/SLOs, health checks, and automated remediation processes.
- Collaborate across teams to ensure service reliability, deployment hygiene, and operational readiness.
- Contribute to incident postmortems and coordinate the fixes needed to prevent recurring incidents.
- Participate in on-call rotations and major incident restoration.
Requirements
The Toyota Financial Services Technology Operations Center is looking for a passionate and highly motivated Site Reliability Engineer (SRE) - Database Services.
- Bachelor's degree in information technology or a related field.
- Solid understanding of SRE concepts and principles: SLIs/SLOs, error budgets, incident response, observability, and toil reduction.
- 5+ years of hands-on experience with Oracle Database (11g/12c/19c) and Exadata administration.
- Strong experience with RMAN, Data Guard, RAC, ASM, and Exadata features.
- Working knowledge of MS SQL Server administration, clustering, and T-SQL performance tuning.
- Proficiency in Python or an equivalent scripting language for automation.
- Experience with observability tools; Dynatrace preferred.
Added bonus if you have
- AWS certifications, such as AWS Certified DevOps Engineer or AWS Certified Solutions Architect.
- Oracle Database certifications.
- Familiarity with DevOps practices and tools like Jenkins, Docker, and Kubernetes.
Benefits & conditions
- Professional growth and development programs to help advance your career, as well as tuition reimbursement.
- Team Member Vehicle Purchase Discount.
- Toyota Team Member Lease Vehicle Program (if applicable).
- Comprehensive health care and wellness plans for your entire family.
- Toyota 401(k) Savings Plan featuring a company match, as well as an annual retirement contribution from Toyota regardless of whether you contribute.
- Paid holidays and paid time off.
- Referral services related to prenatal services, adoption, childcare, schools, and more.
- Tax-advantaged accounts (Health Savings Account, Health Care FSA, Dependent Care FSA).
- Relocation assistance (if applicable).
Belonging at Toyota
Our success begins and ends with our people. We embrace all perspectives and value unique human experiences. Respect for all is our North Star. Toyota is proud to have 10+ different Business Partnering Groups across 100 different North American chapter locations that support team members' efforts to dream, do and grow without questioning whether they belong.