Pipe-lining

Clock speeds and MIPS

The CPU uses a clock to switch between stable and transition states. Depending on the complexity of the instruction, the number of steps and clock cycles needed to finish it can vary greatly.

Semi-simplified fetch/execute steps for an 8-bit system bus supporting multi-byte instructions (CISC). Total time (clock cycles) can vary widely.

Fetch instruction op-code (fi)
Decode instruction op-code (di)
Fetch data (fd)
Execute instruction (ei)
Write-back (wb): write the result into a user-accessible register

Non-pipe-lined - 1 MHz clock
Instruction type 1: 1 clock per step in the fetch/execute (FE) cycle * 5 steps = 5 clocks/ins. 1,000,000 cycles per sec. / 5 cycles per instruction = 200,000 ins/sec.
Instruction type 2
Instruction type 3
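The non-pipe-lined arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `non_pipelined_throughput` and the second, slower instruction are assumptions made for the example and do not come from the notes.

```python
CLOCK_HZ = 1_000_000  # 1 MHz clock, as in the example above


def non_pipelined_throughput(clock_hz, cycles_per_step):
    """Instructions per second when each instruction finishes all of its
    steps before the next instruction starts."""
    cycles_per_instruction = sum(cycles_per_step)
    return clock_hz / cycles_per_instruction


# Instruction type 1: five steps (fi, di, fd, ei, wb), one clock each.
type_1 = [1, 1, 1, 1, 1]
print(non_pipelined_throughput(CLOCK_HZ, type_1))  # 200000.0

# Hypothetical slower instruction (assumption): fetching data takes 3 clocks.
slower = [1, 1, 3, 1, 1]
print(non_pipelined_throughput(CLOCK_HZ, slower))
```

Without a pipeline, every extra clock in any step lowers the instruction rate, because the whole fetch/execute sequence must finish before the next instruction begins.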
Pipe-lined - 1 MHz clock - example assumes single instruction type for simplicity.
Instruction type 1: Since all steps take the same time, the longest step is 1 clock cycle. Once the pipeline is full, 1 instruction can be completed every clock cycle. 1,000,000 cycles per sec. / 1 cycle per instruction = 1,000,000 ins/sec.
Instruction type 2
Instruction type 3
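A minimal sketch of the ideal pipe-lined case, assuming one clock per stage and no stalls. The helper names `pipelined_cycles` and `timing_diagram` are made up for this illustration. It prints a small text timing diagram and shows that the rate approaches, but does not quite reach, one instruction per clock because the pipeline first has to fill.

```python
STAGES = ["fi", "di", "fd", "ei", "wb"]


def pipelined_cycles(n_instructions, n_stages=len(STAGES)):
    """Total clocks for an ideal pipeline (one clock per stage, no stalls):
    n_stages clocks for the first instruction, then one more per instruction."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def timing_diagram(n_instructions):
    """One text row per instruction; each column is one clock cycle."""
    rows = []
    for i in range(n_instructions):
        cells = ["  "] * i + STAGES  # instruction i enters the pipeline at clock i
        rows.append(f"ins {i + 1}: " + " ".join(cells))
    return rows


for row in timing_diagram(3):
    print(row)

clock_hz = 1_000_000  # 1 MHz clock
n = 1_000_000         # number of instructions executed
seconds = pipelined_cycles(n) / clock_hz
print(n / seconds)    # slightly under 1,000,000 ins/sec because of the fill time
```

For a long instruction stream the four extra fill clocks are negligible, which is why the notes quote the steady-state figure of 1,000,000 ins/sec.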
Super-scalar pipeline (execution step) - 1 MHz clock - single instruction type for simplicity.
Instruction type 1: Since all steps take the same time, adding super-scalar execution units has no overall effect. Once the pipeline is full, 1 instruction can be completed every clock cycle. 1,000,000 cycles per sec. / 1 cycle per instruction = 1,000,000 ins/sec.
Instruction type 2
Instruction type 3
The slowest (longest) step in the sequence limits the throughput. This bottleneck may appear in different parts of the pipeline depending on the instruction.
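The bottleneck rule can be illustrated with a deliberately simplified steady-state model. Each stage needs a fixed number of clocks per instruction, and a stage duplicated `k` times (the super-scalar case) is treated as needing `1/k` of that on average. The stage timings for the "slow" instructions and the names `bottleneck` and `throughput` are assumptions for the sketch, not figures from the notes.

```python
CLOCK_HZ = 1_000_000  # 1 MHz clock


def bottleneck(stage_cycles, units=None):
    """Return (stage name, effective clocks per instruction) of the slowest stage.

    stage_cycles: stage name -> clocks the stage needs per instruction.
    units: stage name -> number of parallel copies of that stage (default 1).
    """
    units = units or {}
    effective = {s: c / units.get(s, 1) for s, c in stage_cycles.items()}
    name = max(effective, key=effective.get)
    return name, effective[name]


def throughput(stage_cycles, units=None):
    """Steady-state instructions per second, limited by the slowest stage."""
    _, cycles = bottleneck(stage_cycles, units)
    return CLOCK_HZ / cycles


# All steps take one clock: duplicating the execute unit changes nothing.
uniform = {"fi": 1, "di": 1, "fd": 1, "ei": 1, "wb": 1}
print(throughput(uniform), throughput(uniform, {"ei": 2}))

# Hypothetical instruction whose execute step takes 3 clocks (assumption).
slow_execute = {"fi": 1, "di": 1, "fd": 1, "ei": 3, "wb": 1}
print(bottleneck(slow_execute))            # the execute step limits throughput
print(throughput(slow_execute, {"ei": 3}))  # three execute units remove the limit

# Hypothetical instruction whose data fetch takes 2 clocks (assumption).
slow_data = {"fi": 1, "di": 1, "fd": 2, "ei": 1, "wb": 1}
print(bottleneck(slow_data))               # here the bottleneck moves to fd
```

The model ignores hazards and issue limits. It is only meant to show why duplicating a stage helps when that stage is the bottleneck and has no effect otherwise.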
In a perfect world, each step of the fetch/execute cycle would run without competing with the other steps for common resources.
In reality, the execution time of an individual instruction is often slower because of resource competition and the overhead of the staging itself.
Overall performance, measured as the average time per completed instruction, still improves greatly.
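The trade-off between per-instruction time and overall rate can be sketched numerically. The model below is an assumption made for illustration: each of the five steps takes 1000 ns of work (matching a 1 MHz clock), and pipe-lining adds a hypothetical 100 ns of staging overhead to every stage. The 100 ns figure is invented for the example.

```python
def non_pipelined(step_ns):
    """Return (latency in ns, instructions per second) without a pipeline."""
    latency_ns = sum(step_ns)
    return latency_ns, 1e9 / latency_ns


def pipelined(step_ns, overhead_ns):
    """Return (latency in ns, instructions per second) with a full pipeline.

    The clock period must fit the slowest stage plus the staging overhead.
    """
    period_ns = max(step_ns) + overhead_ns
    latency_ns = period_ns * len(step_ns)
    return latency_ns, 1e9 / period_ns


steps = [1000] * 5   # fi, di, fd, ei, wb at 1000 ns each
overhead = 100       # hypothetical staging overhead per stage (assumption)

lat_n, thr_n = non_pipelined(steps)
lat_p, thr_p = pipelined(steps, overhead)

print(f"non-pipe-lined: {lat_n} ns per instruction, {thr_n:.0f} ins/sec")
print(f"pipe-lined:     {lat_p} ns per instruction, {thr_p:.0f} ins/sec")
```

Under these assumptions each individual instruction takes longer in the pipe-lined design (5500 ns against 5000 ns), yet instructions complete far more often, which is the sense in which average performance improves.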