Pipe-lining

Clock speeds and MIPS

The CPU uses a clock to switch between stable and transition states.

Depending on the complexity of an instruction, the number of steps and
clock cycles needed to finish it can vary greatly.

Semi-simplified fetch/execute steps for an 8-bit system bus supporting
multi-byte instructions (CISC). Total time (in clock cycles) can vary widely.

  Fetch instruction op-code (FI)

  Decode instruction op-code (DI)

  Fetch data (FD)

  Execute instruction (EI)

  Write-back (WB) - write the result into a user-accessible register.


Non-pipe-lined - 1 MHz clock

      FI    DI    FD    EI    WB
T1    I1
T2          I1
T3                I1
T4                      I1
T5                            I1

Instruction type 1
1 clock per step in FE cycle * 5 steps = 5 clocks/ins.
1,000,000 cycles per sec./5 cycles per instruction = 200,000 ins/sec.

Instruction type 2
FI = 1 clock, DI = 1 clock, FD = 2 clocks, EI = 4 clocks, WB = 2 clocks; total = 10 clocks/ins.
1,000,000 cycles per sec./10 cycles per instruction = 100,000 ins/sec.

Instruction type 3
FI = 1 clock, DI = 1 clock, FD = 5 clocks, EI = 1 clock, WB = 0 clocks; total = 8 clocks/ins.
1,000,000 cycles per sec./8 cycles per instruction = 125,000 ins/sec.
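
The non-pipe-lined arithmetic can be checked with a short sketch. The names
CLOCK_HZ, STEP_CLOCKS and non_pipelined_rate are made up for this illustration;
the per-step clock counts are the ones listed for the three instruction types.

```python
CLOCK_HZ = 1_000_000  # 1 MHz clock, as in the examples

# Clocks spent in each step (FI, DI, FD, EI, WB) for each instruction type.
STEP_CLOCKS = {
    "type 1": {"FI": 1, "DI": 1, "FD": 1, "EI": 1, "WB": 1},
    "type 2": {"FI": 1, "DI": 1, "FD": 2, "EI": 4, "WB": 2},
    "type 3": {"FI": 1, "DI": 1, "FD": 5, "EI": 1, "WB": 0},
}


def non_pipelined_rate(steps, clock_hz=CLOCK_HZ):
    """Instructions per second when one instruction runs all steps before the next starts."""
    clocks_per_instruction = sum(steps.values())
    return clock_hz / clocks_per_instruction


for name, steps in STEP_CLOCKS.items():
    total = sum(steps.values())
    print(f"{name}: {total} clocks/ins, {non_pipelined_rate(steps):,.0f} ins/sec")
```

Without a pipeline, every clock of every step counts against the instruction,
so the rate is simply the clock speed divided by the sum of the step times.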

Pipe-lined - 1 MHz clock - example assumes single instruction type for simplicity.


      FI    DI    FD    EI    WB
T1    I1
T2    I2    I1
T3    I3    I2    I1
T4    I4    I3    I2    I1
T5    I5    I4    I3    I2    I1

Instruction type 1
Since all steps take the same time, the longest step is 1 clock cycle.
1 instruction can be completed every clock cycle with full pipeline.
1,000,000 cycles per sec. / 1 cycle per instruction = 1,000,000 ins/sec.

Instruction type 2
Longest step = 4 clocks for EI step
1 instruction can be completed every 4 clock cycles with full pipeline.
1,000,000 cycles per sec. / 4 cycles per instruction = 250,000 ins/sec.

Instruction type 3
Longest step = 5 clocks for FD step
1 instruction can be completed every 5 clock cycles with full pipeline.
1,000,000 cycles per sec. / 5 cycles per instruction = 200,000 ins/sec.
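
The pipe-lined figures follow one rule: with a full pipeline, an instruction
completes once per longest step. A minimal sketch, assuming a single
instruction type in the pipeline at a time (as the example does); the names
CLOCK_HZ, STEP_CLOCKS and pipelined_rate are made up for this illustration.

```python
CLOCK_HZ = 1_000_000  # 1 MHz clock, as in the examples

# Clocks spent in each step (FI, DI, FD, EI, WB) for each instruction type.
STEP_CLOCKS = {
    "type 1": {"FI": 1, "DI": 1, "FD": 1, "EI": 1, "WB": 1},
    "type 2": {"FI": 1, "DI": 1, "FD": 2, "EI": 4, "WB": 2},
    "type 3": {"FI": 1, "DI": 1, "FD": 5, "EI": 1, "WB": 0},
}


def pipelined_rate(steps, clock_hz=CLOCK_HZ):
    """Instructions per second with a full pipeline: limited by the longest step."""
    longest_step = max(steps.values())
    return clock_hz / longest_step


for name, steps in STEP_CLOCKS.items():
    bottleneck = max(steps, key=steps.get)  # step with the most clocks
    print(f"{name}: bottleneck {bottleneck}, {pipelined_rate(steps):,.0f} ins/sec")
```

The only change from the non-pipe-lined calculation is max() in place of a sum:
the shorter steps overlap with the longest one instead of adding to it.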

Super-scalar pipeline (execution step) - 1 MHz clock - single instruction type for simplicity.

      FI    DI    FD    EI-1  EI-2  WB
T1    I1
T2    I2    I1
T3    I3    I2    I1
T4    I4    I3    I2    I1
T5    I5    I4    I3    I1    I2
T6    I6    I5    I4    I3    I2    I1
T7    I7    I6    I5    I3    I4    I2

(EI-1 and EI-2 are the two parallel execution units.)

Instruction type 1
Since all steps are the same, super-scalar has no overall effect.
1 instruction can be completed every clock cycle with full pipeline.
1,000,000 cycles per sec. / 1 cycle per instruction = 1,000,000 ins/sec.

Instruction type 2
Longest step = 4 clocks for EI step
But this is averaged across the two execution units:
2 instructions can be completed every 4 clock cycles with full pipeline,
or 1 instruction every 2 clock cycles (averaged).
1,000,000 cycles per sec. / (4 cycles per instruction / 2 instructions at a time) = 500,000 ins/sec.

Instruction type 3
Longest step = 5 clocks for FD step
1,000,000 cycles per sec. / 5 cycles per instruction = 200,000 ins/sec.
Because the operand fetch unit is NOT duplicated, it will still
require 5 clocks to complete an instruction:
    super-scalar is not applied to the operand fetch stage,
    so it has no effect.
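
The super-scalar cases can be sketched the same way by dividing each step's
clocks by the number of parallel units for that step. This assumes, as in the
example, that only the execute step (EI) is duplicated; the names UNITS and
superscalar_rate are made up for this illustration.

```python
CLOCK_HZ = 1_000_000  # 1 MHz clock, as in the examples

# Clocks spent in each step (FI, DI, FD, EI, WB) for each instruction type.
STEP_CLOCKS = {
    "type 1": {"FI": 1, "DI": 1, "FD": 1, "EI": 1, "WB": 1},
    "type 2": {"FI": 1, "DI": 1, "FD": 2, "EI": 4, "WB": 2},
    "type 3": {"FI": 1, "DI": 1, "FD": 5, "EI": 1, "WB": 0},
}

# Assumption: two execution units, one of everything else.
UNITS = {"FI": 1, "DI": 1, "FD": 1, "EI": 2, "WB": 1}


def superscalar_rate(steps, units=UNITS, clock_hz=CLOCK_HZ):
    """Instructions per second with a full pipeline and duplicated units."""
    # Average clocks per instruction at each step, spread over its parallel units.
    effective = {name: clocks / units[name] for name, clocks in steps.items()}
    return clock_hz / max(effective.values())


for name, steps in STEP_CLOCKS.items():
    print(f"{name}: {superscalar_rate(steps):,.0f} ins/sec")
```

For type 2 the execute step drops from 4 to an averaged 2 clocks, which ties
with FD and WB at 2 clocks each. For types 1 and 3 the limiting step is not EI
(or is not improved by a second unit), so the rate is unchanged.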

The slowest (longest) step in the sequence limits the throughput. This bottleneck may appear in different parts of the pipeline depending on the instruction.

In a perfect world, each step of the fetch/execute cycle runs without competing with other steps for shared resources.

In reality, the execution time of an individual instruction is often longer because of resource competition and the overhead of the staging itself.

But overall performance (average execution time) improves greatly.
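
The overall gain can be illustrated by counting total clocks for a run of N
instructions. This is a sketch for the ideal case only: instruction type 1
(one clock per step), no stalls and no resource competition. The function names
are made up for this illustration.

```python
STAGES = 5  # FI, DI, FD, EI, WB at one clock each (instruction type 1)


def non_pipelined_clocks(n):
    """Each instruction runs all five steps before the next one starts."""
    return n * STAGES


def pipelined_clocks(n):
    """The first instruction takes STAGES clocks to fill the pipeline;
    after that, one instruction completes on every clock."""
    return STAGES + (n - 1) if n > 0 else 0


for n in (1, 5, 100, 10_000):
    slow = non_pipelined_clocks(n)
    fast = pipelined_clocks(n)
    print(f"{n:>6} ins: {slow} vs {fast} clocks, speedup {slow / fast:.2f}x")
```

A single instruction gains nothing (5 clocks either way), but as N grows the
cost of filling the pipeline is amortized and the speedup approaches the number
of stages.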