Availability

Three parameters describe the failure and repair behaviour of a component:

  • Mean Time To Repair (MTTR): mean time between the occurrence of a fault and service recovery, also known as the mean downtime.
  • Mean Time To Failure (MTTF): mean time between the recovery from one incident and the occurrence of the next incident, also known as uptime.
  • Mean Time Between Failures (MTBF): mean time between the occurrences of two consecutive incidents, so MTBF = MTTF + MTTR.

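Then we can define availability as:

$$A = \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}} = \frac{\text{MTTF}}{\text{MTBF}}$$

This is the long-run probability that a component is working properly at a given time $t$, i.e. the fraction of time it is actually up.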
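As a minimal sketch of this definition, the snippet below computes availability from MTTF and MTTR, assuming the usual steady-state ratio uptime / (uptime + downtime). The function name `availability` and the sample figures (990 hours up, 10 hours down) are illustrative assumptions, not values from these notes.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the share of one failure/repair cycle spent up."""
    if mttf_hours < 0 or mttr_hours < 0:
        raise ValueError("MTTF and MTTR must be non-negative")
    mtbf_hours = mttf_hours + mttr_hours  # one full cycle: uptime + downtime
    if mtbf_hours == 0:
        raise ValueError("MTTF + MTTR must be positive")
    return mttf_hours / mtbf_hours


# Hypothetical component: on average 990 h of uptime, then 10 h of repair.
example_availability = availability(mttf_hours=990.0, mttr_hours=10.0)
print(f"{example_availability:.2%}")  # 99.00%
```

Both inputs must use the same time unit; the result is a dimensionless fraction between 0 and 1.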

Reliability: the service stays available for an agreed period without interruptions. It is about the frequency of interruptions rather than the total uptime.

Note that a system could be available 99% of the time and still be unreliable if it keeps 'crashing' and quickly restarting.
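To make the distinction concrete, the sketch below compares two hypothetical systems with the same availability but very different interruption frequencies. All figures are invented for illustration, and the expected number of interruptions is approximated as the observation period divided by MTTF + MTTR.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


def expected_interruptions(mttf_hours: float, mttr_hours: float,
                           period_hours: float) -> float:
    """Approximate number of failure/repair cycles in the given period."""
    return period_hours / (mttf_hours + mttr_hours)


# Hypothetical systems: (MTTF in hours, MTTR in hours).
stable = (990.0, 10.0)   # fails rarely, takes 10 h to repair
flaky = (0.99, 0.01)     # crashes about every hour, restarts in 36 s

period = 1000.0  # observation window in hours
stable_report = (availability(*stable), expected_interruptions(*stable, period))
flaky_report = (availability(*flaky), expected_interruptions(*flaky, period))

for name, (avail, count) in (("stable", stable_report), ("flaky", flaky_report)):
    print(f"{name}: availability {avail:.2%}, about {count:.0f} interruptions")
```

Both systems come out at roughly 99% availability, but the second one interrupts its users about a thousand times more often over the same window, which is what makes it unreliable.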

Computing availability in practice

Simple rules:

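  • Components in series: the system works only if every component works, so the total availability is the product of the availabilities of all the components, $A_{series} = \prod_i A_i$.
  • Components in parallel: the system fails only if every component fails, so, assuming independent failures, $A_{parallel} = 1 - \prod_i (1 - A_i)$.
  • The main rule for increasing the availability of a system is to add, in parallel, a duplicate of the component with the lowest availability.
  • We repeat this process until we obtain the desired availability.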
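The rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive method: failures are taken to be independent, so the series availability is the product of the A_i and the parallel availability is 1 minus the product of the (1 - A_i). The helper names (`series`, `parallel`, `add_redundancy`), the `max_replicas` safety limit, and the sample availabilities are all hypothetical.

```python
from math import prod


def series(avails):
    """Series: every component must work, so availabilities multiply."""
    return prod(avails)


def parallel(avails):
    """Parallel: the group fails only if all (independent) components fail."""
    return 1 - prod(1 - a for a in avails)


def add_redundancy(stages, target, max_replicas=10):
    """Greedy rule: duplicate the weakest stage until the target is reached.

    `stages` holds the availability of one component per series stage.
    Returns the replica count per stage and the resulting availability.
    """
    replicas = [1] * len(stages)

    def stage_availability(i):
        # Stage i consists of replicas[i] identical components in parallel.
        return parallel([stages[i]] * replicas[i])

    def total():
        return series(stage_availability(i) for i in range(len(stages)))

    while total() < target:
        weakest = min(range(len(stages)), key=stage_availability)
        if replicas[weakest] >= max_replicas:
            raise ValueError("target not reachable within max_replicas")
        replicas[weakest] += 1
    return replicas, total()


# Hypothetical three-stage system with a 99.9% availability target.
replicas, achieved = add_redundancy([0.99, 0.95, 0.999], target=0.999)
print(replicas, f"{achieved:.4%}")
```

With these sample numbers the loop ends at two, three, and two replicas for the three stages. The greedy choice follows the rule stated above; it does not account for cost or for correlated failures, which would both change which component is worth duplicating.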