Report Number: CSL-TR-95-666
Institution: Stanford University, Computer Systems Laboratory
Title: ON DIVISION AND RECIPROCAL CACHES
Author: Oberman, Stuart F.
Author: Flynn, Michael J.
Date: April 1995
Abstract: Floating-point division is generally regarded as a
high-latency operation in typical floating-point applications.
Many techniques exist for increasing division performance,
often at the cost of increasing either chip area, cycle time,
or both. This paper presents two methods for decreasing the
latency of division. Using applications from the SPECfp92 and
NAS benchmark suites, these methods are evaluated to
determine their effects on overall system performance. The
notion of recurring computation is presented, and it is shown
how recurring division can be exploited using an additional,
dedicated division cache. Additionally, for
multiplication-based division algorithms, reciprocal caches
can be utilized to store recurring reciprocals. Due to the
similarity between the algorithms typically used to compute
division and square root, the performance of square root
caches is also investigated. Results show that reciprocal
caches can achieve nearly a 2X reduction in effective
division latency for reasonable cache sizes.
http://i.stanford.edu/pub/cstr/reports/csl/tr/95/666/CSL-TR-95-666.pdf
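
The reciprocal-cache idea summarized in the abstract can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the design evaluated in the report: the class name ReciprocalCache, the direct-mapped organization, the index hash, and the cycle counts passed to effective_latency are all hypothetical choices made for illustration. It assumes a multiplication-based divider in which a cache hit replaces the reciprocal computation with a single multiply.

```python
import struct


class ReciprocalCache:
    """Direct-mapped cache from a divisor's bit pattern to its reciprocal.

    Hypothetical model: the organization and hash are assumptions,
    not taken from the report.
    """

    def __init__(self, entries=128):
        self.entries = entries
        self.tags = [None] * entries    # full 64-bit divisor pattern per slot
        self.values = [0.0] * entries   # cached reciprocal per slot
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _bits(x):
        # Reinterpret an IEEE 754 double as a 64-bit unsigned integer.
        return struct.unpack("<Q", struct.pack("<d", x))[0]

    def _index(self, bits):
        # Assumed hash: fold high mantissa bits with the exponent bits.
        return ((bits >> 40) ^ (bits >> 52)) % self.entries

    def divide(self, a, b):
        """Return a / b computed as a * (1 / b), reusing a cached 1 / b."""
        bits = self._bits(b)
        i = self._index(bits)
        if self.tags[i] == bits:
            self.hits += 1
        else:
            # Miss: compute the reciprocal and replace the slot's contents.
            self.misses += 1
            self.tags[i] = bits
            self.values[i] = 1.0 / b
        return a * self.values[i]


def effective_latency(cache, div_cycles=10, mul_cycles=3):
    """Average cycles per division; both cycle counts are assumed values."""
    total = cache.hits + cache.misses
    return (cache.hits * mul_cycles + cache.misses * div_cycles) / total


if __name__ == "__main__":
    cache = ReciprocalCache(entries=8)
    for _ in range(4):
        quotient = cache.divide(6.0, 4.0)   # same divisor recurs
    print(quotient, cache.hits, cache.misses, effective_latency(cache))
```

Note that a * (1 / b) is not always bit-identical to a correctly rounded a / b, so a hardware design would need to handle final rounding separately; the sketch only models hit rate and average latency.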