Report Number: CSL-TR-95-666
Institution: Stanford University, Computer Systems Laboratory
Author: Oberman, Stuart F.
Author: Flynn, Michael J.
Date: April 1995
Abstract: Floating-point division is generally regarded as a high-latency operation in typical floating-point applications. Many techniques exist for increasing division performance, often at the cost of increased chip area, cycle time, or both. This paper presents two methods for decreasing the latency of division. Using applications from the SPECfp92 and NAS benchmark suites, these methods are evaluated to determine their effects on overall system performance. The notion of recurring computation is presented, and it is shown how recurring division can be exploited using an additional, dedicated division cache. Additionally, for multiplication-based division algorithms, reciprocal caches can be used to store recurring reciprocals. Because the algorithms typically used to compute division and square root are similar, the performance of square root caches is also investigated. Results show that reciprocal caches can achieve nearly a 2X reduction in effective division latency for reasonable cache sizes.
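
The reciprocal-cache idea described in the abstract can be illustrated with a small software model. The following is a minimal sketch only, not the design evaluated in the report: the class name `ReciprocalCache`, the direct-mapped organization, the index function, and the hit and miss cycle counts are all assumptions chosen for illustration.

```python
import struct


class ReciprocalCache:
    """Hypothetical direct-mapped cache of reciprocals, tagged by the
    full 64-bit pattern of the divisor (assumed nonzero)."""

    def __init__(self, num_entries=1024):
        self.num_entries = num_entries
        self.tags = [None] * num_entries
        self.values = [0.0] * num_entries
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _bits(x):
        # Reinterpret the IEEE 754 double as an unsigned 64-bit integer.
        return struct.unpack("<Q", struct.pack("<d", x))[0]

    def divide(self, a, b):
        key = self._bits(b)
        # Assumed index function: fold the high half onto the low half
        # so exponent and mantissa bits both influence the entry chosen.
        index = (key ^ (key >> 32)) % self.num_entries
        if self.tags[index] == key:
            # Hit: the recurring reciprocal is reused; only a multiply remains.
            self.hits += 1
            recip = self.values[index]
        else:
            # Miss: compute the reciprocal and store it for later reuse.
            self.misses += 1
            recip = 1.0 / b
            self.tags[index] = key
            self.values[index] = recip
        # Note: a * (1/b) can differ from a / b in the last bit; this
        # sketch does not model the rounding correction hardware needs.
        return a * recip


def effective_latency(cache, hit_cycles=3, miss_cycles=10):
    """Average cycles per division under assumed hit and miss costs."""
    total = cache.hits + cache.misses
    return (cache.hits * hit_cycles + cache.misses * miss_cycles) / total
```

For example, dividing two different dividends by the same divisor produces one miss followed by one hit, so under the assumed costs of 3 and 10 cycles the average is 6.5 cycles per division. The achievable hit rate in practice depends on how often divisors recur in the workload, which is what the benchmark evaluation in the report measures.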