Report Number: CSL-TR-94-655
Institution: Stanford University, Computer Systems Laboratory
Title: I/O Characterization and Attribute Caches for Improved I/O
System Performance
Author: Richardson, Kathy J.
Date: December 1994
Abstract: Workloads generate a variety of disk I/O requests to access
file information, execute programs, and perform computation.
I/O caches capture most of these requests, reducing execution
time, providing high I/O rates, and decreasing the disk
bandwidth needed by each workload. A cache has difficulty
capturing the full range of I/O behavior, however, when it
treats the requests as a single stream of uniform tasks. The
single stream contains I/O requests for data with vastly
different reuse rates and access patterns.
Disk I/O requests can be classified as accesses to inodes,
directories, datafiles, or executables. The combined cache
behavior of all four taken together provides few clues for
improving performance of the I/O cache. But individually, the
cache behavior of each reveals the distinct components that
make up aggregate I/O behavior. Inodes and directories turn
out to be small, highly reused files. Datafiles and
executable files have more diverse characteristics. The
smaller ones exhibit moderate reuse and have little
sequential access, while the larger files tend to be accessed
sequentially and not reused. Properly used, file type and
file size information improves cache performance.
The dissertation introduces attribute caches to improve I/O
cache performance. Attribute caches use file attributes to
selectively cache I/O data with a cache scheme tailored to
the expected behavior of the file type. Inodes and
directories are cached in very small blocks, capitalizing on
their high reuse rate and small space requirements. Large
files are cached in large cache blocks, capitalizing on their
sequential access patterns. Small and medium-sized files are
cached in average-sized 4 kbyte blocks, which minimizes the
memory required to service the bulk of requests. The portion of
cache dedicated to each group varies with total cache size.
This allows the important features of the workload to be
captured at the appropriate cache size, and increases the
total cache utilization. For a set of 11 measured workloads,
an attribute cache scheme reduced the miss ratio by 25 to 60%,
depending on cache size, and required only about 1/8 as much
memory as a typical I/O cache implementation achieving the
same miss ratio.
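
The routing idea described in the abstract can be illustrated with a
minimal sketch. It is not the dissertation's implementation: the class
names, the LRU replacement policy, the 256 byte and 64 kbyte block
sizes, and the 1 Mbyte "large file" threshold are all assumptions made
for illustration. Only the 4 kbyte block size for small and medium-sized
files comes from the abstract. Each partition's capacity is a
constructor argument, reflecting that the portion of cache dedicated to
each group varies with total cache size.

```python
from collections import OrderedDict

# Hypothetical parameters; only MEDIUM_BLOCK is taken from the abstract.
SMALL_BLOCK = 256            # assumed block size for inodes and directories
MEDIUM_BLOCK = 4 * 1024      # 4 kbyte blocks for small and medium-sized files
LARGE_BLOCK = 64 * 1024      # assumed block size for large sequential files
LARGE_FILE_BYTES = 1 << 20   # assumed threshold separating "large" files


class LRUPartition:
    """Fixed-capacity LRU cache keyed by (file_id, block_number)."""

    def __init__(self, block_size, capacity_blocks):
        self.block_size = block_size
        self.capacity_blocks = capacity_blocks
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, file_id, offset):
        """Record an access; return True on a hit, False on a miss."""
        key = (file_id, offset // self.block_size)
        if key in self.blocks:
            self.blocks.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self.blocks[key] = None               # bring the block into the cache
        if len(self.blocks) > self.capacity_blocks:
            self.blocks.popitem(last=False)   # evict the least recently used
        return False


class AttributeCache:
    """Routes each request to a partition chosen by file type and size."""

    def __init__(self, meta_blocks, medium_blocks, large_blocks):
        self.partitions = {
            "meta": LRUPartition(SMALL_BLOCK, meta_blocks),
            "medium": LRUPartition(MEDIUM_BLOCK, medium_blocks),
            "large": LRUPartition(LARGE_BLOCK, large_blocks),
        }

    def classify(self, file_type, file_size):
        if file_type in ("inode", "directory"):
            return "meta"
        if file_size >= LARGE_FILE_BYTES:
            return "large"
        return "medium"

    def access(self, file_id, file_type, file_size, offset):
        partition = self.partitions[self.classify(file_type, file_size)]
        return partition.access(file_id, offset)
```

In this sketch, a sequential read of a large file touches one large
block per 64 kbytes, so consecutive 4 kbyte requests hit in the same
cached block, while inode and directory accesses compete only with each
other for the small-block partition.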
http://i.stanford.edu/pub/cstr/reports/csl/tr/94/655/CSL-TR-94-655.pdf