Mihaela-Roxana Ghidersa

System Resilience: Surviving the Software Storm

What if the most important pillar of system resilience isn't your architecture, but your culture? Learn to build systems that withstand any software storm.

#1 (about 3 minutes)

The business necessity of system resilience

An e-commerce site failure during a Black Friday sale illustrates how downtime leads to financial loss and why resilience is essential.

#2 (about 5 minutes)

Understanding faults, failures, and tolerance mechanisms

A fault is a latent defect in the code, while a failure is the observable malfunction, such as a crash, that occurs when the fault is triggered; fault tolerance and fail-safe design both help mitigate the impact of failures.
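
To make the fault-versus-failure distinction concrete, here is a minimal Python sketch, not taken from the talk, that combines two mitigations. Retrying a call tolerates transient faults, and returning a known-safe default when every attempt fails acts as a simple fail-safe. The helper names `call_with_tolerance` and `make_flaky`, the retry count, and the choice to catch every `Exception` are illustrative assumptions.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def call_with_tolerance(op: Callable[[], T], retries: int, safe_default: T) -> T:
    """Run `op`, retrying up to `retries` extra times.

    Retrying tolerates transient faults; if every attempt fails,
    return `safe_default` so the caller degrades to a known-safe
    value instead of crashing (a simple fail-safe).
    """
    for _ in range(retries + 1):
        try:
            return op()
        except Exception:
            # A latent fault surfaced as a failure on this attempt; try again.
            continue
    # All attempts failed: fall back to the safe default.
    return safe_default


def make_flaky(failures: int, value: T) -> Callable[[], T]:
    """Hypothetical test double: fails the first `failures` calls, then returns `value`."""
    state: Dict[str, int] = {"calls": 0}

    def op() -> T:
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ConnectionError("transient fault")
        return value

    return op


print(call_with_tolerance(make_flaky(2, 42), retries=3, safe_default=0))   # 42
print(call_with_tolerance(make_flaky(10, 42), retries=3, safe_default=0))  # 0
```

In a real service you would usually retry only errors known to be transient and add a delay between attempts. The broad `except Exception` here just keeps the sketch short.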

#3 (about 5 minutes)

Navigating the challenges of modern software development

Modern systems face challenges from increasing complexity, evolving technology, and high user expectations, and meeting them requires balancing resilience against the risk of over-engineering.

#4 (about 3 minutes)

Building resilience across all software stack layers

True resilience requires a holistic approach that addresses the infrastructure, application, and database layers, as well as the crucial human layer of team culture.

#5 (about 4 minutes)

Core strategies for building resilient systems

Key architectural strategies for resilience include redundancy, failover mechanisms, load balancing, and the use of availability zones or microservices.
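
As an illustration of how redundancy, load balancing, and failover fit together, the following Python sketch (an assumption for illustration, not the speaker's code) spreads requests round-robin across redundant replicas and skips any replica marked unhealthy. The class `RoundRobinBalancer` and replica names such as "zone-a" are hypothetical.

```python
from typing import Iterable, List, Set


class RoundRobinBalancer:
    """Distribute requests across redundant replicas, skipping unhealthy ones (failover)."""

    def __init__(self, replicas: Iterable[str]) -> None:
        self._replicas: List[str] = list(replicas)
        if not self._replicas:
            raise ValueError("at least one replica is required")
        self._healthy: Set[str] = set(self._replicas)
        self._next = 0  # index where the next search starts

    def mark_down(self, replica: str) -> None:
        """Take a replica out of rotation, e.g. after a failed health check."""
        self._healthy.discard(replica)

    def mark_up(self, replica: str) -> None:
        """Return a known replica to rotation once it recovers."""
        if replica in self._replicas:
            self._healthy.add(replica)

    def pick(self) -> str:
        """Return the next healthy replica in round-robin order."""
        n = len(self._replicas)
        for offset in range(n):
            idx = (self._next + offset) % n
            candidate = self._replicas[idx]
            if candidate in self._healthy:
                self._next = (idx + 1) % n
                return candidate
        raise RuntimeError("no healthy replicas available")


lb = RoundRobinBalancer(["zone-a", "zone-b", "zone-c"])
print([lb.pick() for _ in range(3)])  # ['zone-a', 'zone-b', 'zone-c']
lb.mark_down("zone-b")
print([lb.pick() for _ in range(4)])  # ['zone-a', 'zone-c', 'zone-a', 'zone-c']
```

With all three replicas healthy, requests go to zone-a, zone-b, and zone-c in turn. After `mark_down("zone-b")`, traffic alternates between zone-a and zone-c until zone-b is marked up again.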

#6 (about 5 minutes)

Implementing disaster recovery and secure coding practices

Proactive resilience involves creating a disaster recovery plan based on a risk assessment and empowering developers to contribute through secure coding practices.

#7 (about 7 minutes)

Using monitoring and continuous testing for improvement

A continuous improvement cycle is driven by monitoring system health, using automated testing to catch issues early, and analyzing failures to learn from them.
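
Monitoring feeds the improvement cycle by turning raw request outcomes into a health signal. The following is a minimal Python sketch, assuming a sliding window over the most recent requests, a hypothetical `ErrorRateMonitor` class, and an illustrative threshold. None of these details come from the talk.

```python
from collections import deque
from typing import Deque


class ErrorRateMonitor:
    """Track the last `window` request outcomes and flag when the error rate exceeds `threshold`."""

    def __init__(self, window: int, threshold: float) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        # A deque with maxlen drops the oldest outcome automatically.
        self._outcomes: Deque[bool] = deque(maxlen=window)
        self._threshold = threshold

    def record(self, ok: bool) -> None:
        """Record whether a single request succeeded."""
        self._outcomes.append(ok)

    def error_rate(self) -> float:
        """Fraction of failed requests in the current window (0.0 if empty)."""
        if not self._outcomes:
            return 0.0
        failures = sum(1 for ok in self._outcomes if not ok)
        return failures / len(self._outcomes)

    def is_unhealthy(self) -> bool:
        """True when the error rate is strictly above the threshold."""
        return self.error_rate() > self._threshold


monitor = ErrorRateMonitor(window=10, threshold=0.2)
for ok in [True] * 8 + [False] * 2:
    monitor.record(ok)
print(monitor.error_rate(), monitor.is_unhealthy())  # 0.2 False
monitor.record(False)  # the oldest success falls out of the window
print(monitor.error_rate(), monitor.is_unhealthy())  # 0.3 True
```

With a window of 10 and a threshold of 0.2, eight successes and two failures give an error rate of exactly 0.2, which is not yet flagged. One more failure pushes the oldest success out of the window, raising the rate to 0.3 and marking the service unhealthy.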

#8 (about 2 minutes)

A practical starting point for individual developers

Developers can significantly impact resilience by focusing on core software quality attributes like performance, security, scalability, and maintainability.

#9 (about 3 minutes)

Adopting a proactive mindset for future resilience

The future of resilience lies in a proactive approach, embracing innovations like AI for predictive failure analysis and fostering a culture of continuous adaptation.

#10 (about 4 minutes)

Balancing security practices with system performance

Security and performance are not a strict trade-off; the right balance between them depends on the specific context and priorities of the system.

#11 (about 4 minutes)

Prioritizing components when designing for resilience

Focus resilience efforts on foundational components like infrastructure and architecture, as these "shearing layers" are the most difficult and costly to change later.

#12 (about 5 minutes)

Communicating the value of resilience to stakeholders

To get buy-in from decision-makers, present a data-driven business case that clearly documents the financial losses and risks associated with poor resilience.
