Report Number: CSL-TR-94-655
Institution: Stanford University, Computer Systems Laboratory
Title: I/O Characterization and Attribute Caches for Improved I/O System Performance
Author: Richardson, Kathy J.
Date: December 1994
Abstract: Workloads generate a variety of disk I/O requests to access file information, execute programs, and perform computation. I/O caches capture most of these requests, reducing execution time, providing high I/O rates, and decreasing the disk bandwidth needed by each workload. A cache has difficulty capturing the full range of I/O behavior, however, when it treats the requests as a single stream of uniform tasks. The single stream contains I/O requests for data with vastly different reuse rates and access patterns. Disk accesses can be classified by file type: inodes, directories, datafiles, or executables. The combined cache behavior of all four taken together provides few clues for improving the performance of the I/O cache. But individually, the cache behavior of each reveals the distinct components that make up aggregate I/O behavior. Inodes and directories turn out to be small, highly reused files. Datafiles and executable files have more diverse characteristics. The smaller ones exhibit moderate reuse and have little sequential access, while the larger files tend to be accessed sequentially and not reused. Properly used, file type and file size information improves cache performance. The dissertation introduces attribute caches to improve I/O cache performance. Attribute caches use file attributes to selectively cache I/O data with a cache scheme tailored to the expected behavior of the file type. Inodes and directories are cached in very small blocks, capitalizing on their high reuse rate and small space requirements. Large files are cached in large cache blocks, capitalizing on their sequential access patterns. Small and medium-sized files are cached in average-sized 4 kbyte blocks, which minimizes the memory required to service the bulk of requests. The portion of cache dedicated to each group varies with total cache size. This allows the important features of the workload to be captured at the appropriate cache size, and increases the total cache utilization.
For a set of 11 measured workloads, an attribute cache scheme reduced the miss ratio by 25-60%, depending on cache size, and required only about 1/8 as much memory as a typical I/O cache implementation achieving the same miss ratio.
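
The block-size selection described in the abstract might be sketched roughly as follows. This is only an illustration, not the dissertation's implementation: the abstract states the 4 kbyte block size for small and medium-sized files, but the "very small" and "large" block sizes and the size threshold for a "large" file are not given, so `SMALL_BLOCK`, `LARGE_BLOCK`, and `LARGE_FILE_THRESHOLD` below are hypothetical values, as are all names in the code.

```python
from dataclasses import dataclass
from enum import Enum


class FileType(Enum):
    INODE = "inode"
    DIRECTORY = "directory"
    DATAFILE = "datafile"
    EXECUTABLE = "executable"


# Only MEDIUM_BLOCK (4 kbyte) comes from the abstract.
# The other three constants are assumed purely for illustration.
SMALL_BLOCK = 128                   # assumed "very small" block, in bytes
MEDIUM_BLOCK = 4 * 1024             # 4 kbyte, as stated in the abstract
LARGE_BLOCK = 64 * 1024             # assumed "large" block, in bytes
LARGE_FILE_THRESHOLD = 256 * 1024   # assumed cutoff for a "large" file


@dataclass(frozen=True)
class FileAttributes:
    file_type: FileType
    size_bytes: int


def select_block_size(attrs: FileAttributes) -> int:
    """Pick a cache block size from file type and file size."""
    # Inodes and directories: small, highly reused, so use very small blocks.
    if attrs.file_type in (FileType.INODE, FileType.DIRECTORY):
        return SMALL_BLOCK
    # Large datafiles and executables: mostly sequential, so use large blocks.
    if attrs.size_bytes >= LARGE_FILE_THRESHOLD:
        return LARGE_BLOCK
    # Small and medium-sized files: moderate reuse, use 4 kbyte blocks.
    return MEDIUM_BLOCK
```

For example, `select_block_size(FileAttributes(FileType.DATAFILE, 10_000))` selects the 4 kbyte block under these assumed thresholds. The sketch covers only block-size selection; how the cache is partitioned among the groups as total cache size varies is not modeled here.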