Availability
A couple of parameters:
- Mean Time To Repair ($MTTR$): average time between the occurrence of a fault and service recovery, also known as the mean downtime.
- Mean Time To Failure ($MTTF$): average time between the recovery from one incident and the occurrence of the next incident, also known as uptime.
- Mean Time Between Failures ($MTBF$): average time between the occurrences of two consecutive incidents, so $MTBF = MTTF + MTTR$.
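Then we can define availability as the probability that a component is working properly at time $t$ (actual uptime). In steady state this is the fraction of time the component is up:
$$A = \frac{MTTF}{MTTF + MTTR} = \frac{MTTF}{MTBF}$$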
Reliability: the service runs for an agreed period without interruptions (it concerns the frequency of interruptions).
Note that a system can be available 99% of the time and still be unreliable if it keeps crashing and quickly restarting.
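To make the difference between availability and reliability concrete, here is a minimal sketch. It assumes the standard steady-state formula (availability = MTTF / (MTTF + MTTR)); the function names and the two example systems are hypothetical, chosen only for illustration, with times in hours.

```python
def availability(mttf: float, mttr: float) -> float:
    """Steady-state availability: fraction of time the component is up."""
    if mttf < 0 or mttr < 0 or mttf + mttr == 0:
        raise ValueError("mttf and mttr must be non-negative and not both zero")
    return mttf / (mttf + mttr)


def mtbf(mttf: float, mttr: float) -> float:
    """Mean time between the starts of two consecutive incidents."""
    return mttf + mttr


def failures_per_period(mttf: float, mttr: float, period: float) -> float:
    """Expected number of incidents in a period of the given length."""
    return period / mtbf(mttf, mttr)


# Hypothetical system 1: fails rarely, takes a long time to repair.
stable = {"mttf": 990.0, "mttr": 10.0}
# Hypothetical system 2: fails constantly, restarts almost immediately.
flaky = {"mttf": 0.99, "mttr": 0.01}

for name, s in (("stable", stable), ("flaky", flaky)):
    a = availability(**s)
    n = failures_per_period(**s, period=1000.0)
    print(f"{name}: availability={a:.4f}, incidents per 1000 h={n:.0f}")
```

Both systems come out at roughly 99% availability, but the second one is interrupted about a thousand times more often over the same period, which is what makes it unreliable.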
Computing availability in practice
Simple rules:
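- Components in series: the system works only if every component works, so the total availability is the product of the component availabilities: $A_{\text{series}} = \prod_i A_i$.
- Components in parallel: the system fails only if every component fails, so $A_{\text{parallel}} = 1 - \prod_i (1 - A_i)$.
- The main rule to increase the availability of a system is to add, in parallel, a replica of the component with the lowest availability.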
- We repeat this process until we obtain the desired availability.
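
The rules above can be sketched as a small greedy procedure. This is an illustration under stated assumptions, not a definitive method: components fail independently, the system is a chain of stages in series, each stage is a group of identical replicas in parallel, and the names (`series`, `parallel`, `add_redundancy`, `max_replicas`) are hypothetical.

```python
from math import prod


def series(avails):
    """Availability of components in series: all must be up."""
    return prod(avails)


def parallel(avails):
    """Availability of components in parallel: at least one must be up."""
    return 1 - prod(1 - a for a in avails)


def add_redundancy(components, target, max_replicas=10):
    """Replicate the weakest stage until the series chain reaches the target.

    components: availability of a single replica of each stage.
    Returns (replica count per stage, resulting total availability).
    """
    replicas = [1] * len(components)

    def stage(i):
        # Availability of stage i with its current number of replicas.
        return parallel([components[i]] * replicas[i])

    def total():
        return series(stage(i) for i in range(len(components)))

    while total() < target:
        weakest = min(range(len(components)), key=stage)
        if replicas[weakest] >= max_replicas:
            # Assumed safety cap so the loop always terminates.
            raise ValueError("target not reachable within max_replicas")
        replicas[weakest] += 1
    return replicas, total()


# Hypothetical three-stage system with a 99% availability target.
counts, achieved = add_redundancy([0.9, 0.99, 0.95], target=0.99)
print(counts, round(achieved, 4))
```

With independent failures, each added replica multiplies the unavailability of its stage by that replica's own unavailability, which is why a few replicas of the weakest stage usually have the largest effect on the total.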