Report Number: CSL-TR-89-383
Institution: Stanford University, Computer Systems Laboratory
Title: Super-Scalar Processor Design
Author: Johnson, William M.
Date: June 1989
Abstract: A super-scalar processor is one that is capable
of sustaining an instruction-execution rate of
more than one instruction per clock cycle. Maintaining
this execution rate is primarily a problem of
scheduling processor resources (such as functional
units) for high utilization. A number of
scheduling algorithms have been published, with wide-ranging
claims of performance improvement over the single-instruction
issue of a scalar processor. However, a number of
these claims are based on idealizations or on
special-purpose applications.
This study uses trace-driven simulation to evaluate
many different super-scalar hardware organizations.
Super-scalar performance is limited primarily by
instruction-fetch inefficiencies caused by both
branch delays and instruction misalignment. Because of
this instruction-fetch limitation, it is not worthwhile to
explore highly concurrent execution hardware. Rather, it
is more appropriate to explore economical execution
hardware that more closely matches the instruction
throughput provided by the instruction fetcher. This
study examines techniques for reducing the instruction-fetch
inefficiencies and explores the resulting hardware
organizations. This study concludes that a super-scalar
processor can have nearly twice the performance of a
scalar processor, but that this requires four major
hardware features: out-of-order execution, register
renaming, branch prediction, and a four-instruction
decoder. These features are interdependent, and
removing any single feature reduces average performance
by 18% or more. However, there are many hardware
simplifications that cause only a small performance
reduction.
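To illustrate why instruction misalignment limits fetch throughput, here is a toy model. It is a minimal sketch under stated assumptions and not the trace-driven simulator used in this study. It assumes the fetcher reads one aligned block of 4 instructions per cycle, matching the four-instruction decoder. Fetch starts at the current address and stops at the end of the block or at the first taken branch. Branch prediction is assumed to be perfect, so branch delay penalties are not modeled. The helper names fetch_cycles and loop_trace are hypothetical.

```python
def fetch_cycles(trace, width=4):
    """Count fetch cycles for a dynamic trace of (address, is_taken_branch).

    Hypothetical model: each cycle delivers instructions from one aligned
    block of `width` instructions, starting at the current address and
    stopping at the block boundary or after a taken branch.
    Addresses are in instruction units.
    """
    cycles = 0
    i = 0
    while i < len(trace):
        cycles += 1
        block_start = (trace[i][0] // width) * width
        block_end = block_start + width
        while i < len(trace):
            addr, taken = trace[i]
            if not (block_start <= addr < block_end):
                break  # sequential path crossed into the next aligned block
            i += 1
            if taken:
                break  # a taken branch redirects fetch; rest of block is wasted
    return cycles


def loop_trace(start, length, iterations):
    """Hypothetical loop: `length` instructions beginning at `start`; the last
    instruction is a branch taken back to `start` on every iteration but the final one."""
    trace = []
    for it in range(iterations):
        for k in range(length):
            taken = (k == length - 1) and (it < iterations - 1)
            trace.append((start + k, taken))
    return trace


for start in (0, 2):
    t = loop_trace(start, 4, 10)
    print(f"start={start}: {len(t) / fetch_cycles(t):.1f} instructions per fetch cycle")
```

In this model, the same 4-instruction loop body fetches at 4.0 instructions per cycle when it starts on a block boundary. Shifted by two instructions, it straddles two aligned blocks and drops to 2.0, even though no branch is mispredicted.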
http://i.stanford.edu/pub/cstr/reports/csl/tr/89/383/CSL-TR-89-383.pdf