Let's Give Availability Back to the Engineers

S. A. Hodson, Intercai Mondiale

Availability (with a capital 'A') is a well-defined and long-established engineering parameter. It even has a British Standard definition relating to the percentage of time that a system is available (with a small 'a') to be used (i.e., is not faulty). With the advent of modern digital systems, new issues arise that can prevent a system from being used for reasons other than the fact that it is faulty, and the term Availability has been "hijacked" by the Service Level Agreement (SLA) industry to include these periods. This is leading to extremely muddled thinking, because the word Availability, used in this way, combines at least three different, independent mechanisms, and this prevents any attempt at rational analysis or mathematical description of the actual condition. Worse, it muddles the responsibilities for corrective action and perpetuates a sloppy approach to engineering design. I find this unprofessional approach deplorable.

This short paper explores the different mechanisms of Availability and proposes an approach that does offer the ability to analyse what is going on and describe the mechanisms mathematically. I believe that the term Availability should revert to its classical meaning and, for the avoidance of doubt, I propose a supplementary definition stating that Availability is the proportion of time during which a system operates as its designers intended (i.e., it is not faulty and needs no repair).

Usability versus Availability

We now need to introduce a new term to cover periods when a system is Available (according to the preceding definition) but still does not provide the expected service to the user, for whatever reason. We propose to introduce the term "Usable." A system is Usable when it is delivering the service that the user expects, at the levels of performance prescribed. The parameter that measures this characteristic is "Usability."
Usability is a derived parameter that combines at least three mechanisms that can cause a system to become unusable.

[Figure: Availability, Overload, System Complexity, Operator Error]

Change procedures should ensure that if, in spite of all the precautions, a change has a consequential effect, it is possible to return to the status quo so that the system can continue running while the reasons for the effect are established. Now is the time to reduce the incidence of operator error as a cause of system unusability to an insignificant level (and incidentally delete a whole layer of administrative software aimed at detecting and diagnosing such events).

Good Definition, Good Design

Let us give Availability back to the engineers and, if we must have a catch-all parameter, then introduce Usable, which itself depends on Availability, load versus capacity and (if we must) operator error. We must remember that these three mechanisms are entirely unrelated and independent, and any one of them could swamp the others in terms of delivered performance. Let us also stamp on the sloppy approach to engineering design that encourages us to accept 'operator error' as a fact of life that has to be lived with, rather than an entirely avoidable consequence of over-complexity and inadequate preparation for entry into service.
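
As an illustration of the separation argued for above, the following Python fragment is a minimal sketch of how the two parameters might be reported side by side. It is not the author's formulation: the `Period` record, its field names, and the example figures are all hypothetical, and the sketch assumes that the periods lost to faults, to overload, and to operator error do not overlap in time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    """Hypothetical record of one measurement period, in hours."""
    total: float           # length of the observation window
    faulty: float          # system faulty or under repair
    overloaded: float      # not faulty, but load exceeded capacity
    operator_error: float  # not faulty, but unusable through operator error


def availability(p: Period) -> float:
    """Classical Availability: proportion of time the system is not faulty."""
    return (p.total - p.faulty) / p.total


def usability(p: Period) -> float:
    """Proportion of time the expected service is actually delivered.

    Assumes the three kinds of lost time never overlap (an assumption of
    this sketch, not a claim made by the article).
    """
    lost = p.faulty + p.overloaded + p.operator_error
    return (p.total - lost) / p.total


def loss_by_mechanism(p: Period) -> dict[str, float]:
    """Fraction of the period lost to each mechanism, reported separately."""
    return {
        "fault": p.faulty / p.total,
        "overload": p.overloaded / p.total,
        "operator_error": p.operator_error / p.total,
    }


# Invented figures for a 30-day month (720 hours).
example = Period(total=720.0, faulty=0.72, overloaded=36.0, operator_error=7.2)

if __name__ == "__main__":
    print(f"Availability: {availability(example):.3f}")
    print(f"Usability:    {usability(example):.3f}")
    for mechanism, fraction in loss_by_mechanism(example).items():
        print(f"  lost to {mechanism}: {fraction:.3f}")
```

With these invented figures the equipment itself is almost never faulty, yet the service is unusable for a far larger share of the month, and nearly all of that loss comes from overload. Reporting the mechanisms separately keeps that distinction, and the responsibility for corrective action, visible.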
Copyright (c) 2000-2003, nextslm.org. All Rights Reserved.