Report Number: CSL-TR-81-214
Institution: Stanford University, Computer Systems Laboratory
Title: An exponential failure/load relationship: results of a multi-computer statistical study
Author: Iyer, Ravishankar K.
Author: Butner, Steven E.
Author: McCluskey, Edward J.
Date: July 1981
Abstract: In this paper we present an exponential statistical model which relates computer failure rates to level of system activity. Our analysis reveals a strong statistical dependency of both hardware and software component failure rates on several common measures of utilization (specifically CPU utilization, I/O initiation, paging, and job-step initiation rates). We establish that this effect is not dominated by a specific component type, but exists across the board in the two systems studied. Our data covers three years of normal operation (including significant upgrades and reconfigurations) for two large Stanford University computer complexes. The complexes, which are composed of IBM mainframe equipment of differing models and vintage, run similar operating systems and provide the same interface and capability to their users. The empirical data comes from identically-structured and maintained failure logs at the two sites along with IBM OS/VS2 operating system performance/load records. The statistically strong relationship between failures and load is evident for many equipment types, including electronic and mechanical as well as software components. This is in opposition to the commonly-held belief that systems which are primarily electronic in nature exhibit no such effect to any significant degree. The exponential character of our statistical model is significant not only for its simplicity, but also for its compatibility with classical reliability techniques.