Problem | Type | Fetch opcode | Decode | Fetch operands | Execute | Write back |
3. | A | 1 clock | 1 clock | 1 clock | 1 clock | 1 clock |
A. 1 clock + 1 clock + 1 clock + 1 clock + 1 clock = 5 clocks/ins.
B. 1 GHz = 1 * 10^9 cycles/sec.; 1 * 10^9 cycles/sec. / 5 clocks/ins. = 2 * 10^8 ins./sec.
C. Because each stage takes 1 cycle, once the pipeline is saturated it completes one instruction every clock cycle.
D. 1 * 10^9 cycles/sec. / 1 clock/ins. = 1 * 10^9 ins./sec.
E. Even though the two Execute units can on average complete 2 executes per clock cycle, every other stage still takes 1 cycle, so the maximum rate remains 1 cycle/ins., or 1 * 10^9 ins./sec.
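The arithmetic in parts A through D follows one rule: without pipelining the stage times add up, and with a saturated pipeline the slowest stage sets the pace. A minimal Python sketch of that rule, assuming the 1 GHz clock from the problem and a hypothetical helper named `clocks_and_rates`, could look like this:

```python
from fractions import Fraction

CLOCK_HZ = 10**9  # 1 GHz, i.e. 1 * 10^9 cycles/sec

def clocks_and_rates(stage_clocks):
    """Return (clocks/ins unpipelined, ins/sec unpipelined,
    clocks/ins pipelined, ins/sec pipelined) for per-stage clock counts."""
    unpipelined_clocks = sum(stage_clocks)  # part A: stages run one after another
    pipelined_clocks = max(stage_clocks)    # part C: slowest stage paces the pipeline
    return (
        unpipelined_clocks,
        Fraction(CLOCK_HZ, unpipelined_clocks),  # part B
        pipelined_clocks,
        Fraction(CLOCK_HZ, pipelined_clocks),    # part D
    )

# Type A, in order: Fetch opcode, Decode, Fetch operands, Execute, Write back
type_a = [1, 1, 1, 1, 1]
a_clocks, b_rate, c_clocks, d_rate = clocks_and_rates(type_a)
print(f"A: {a_clocks} clocks/ins, B: {float(b_rate):.1e} ins/sec")
print(f"C: {c_clocks} clock/ins, D: {float(d_rate):.1e} ins/sec")
```

Passing `[1, 1, 4, 10, 4]` or `[1, 1, 4, 4, 0]` gives the corresponding figures for Types B and C. `Fraction` keeps the rates exact instead of rounding them to floating point.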
Problem | Type | Fetch opcode | Decode | Fetch operands | Execute | Write back |
4. | B | 1 clock | 1 clock | 4 clocks | 10 clocks | 4 clocks |
A. 1 clock + 1 clock + 4 clocks + 10 clocks + 4 clocks = 20 clocks/ins.
B. 1 GHz = 1 * 10^9 = 10 * 10^8 cycles/sec.; 10 * 10^8 cycles/sec. / (2 * 10^1 clocks/ins.) = 5 * 10^7 ins./sec.
C. The Execute stage has the longest period, 10 cycles, and it sets the completion limit: once the pipeline is saturated, one instruction completes every 10 cycles.
D. 10 * 10^8 cycles/sec. / 10 clocks/ins. = 1 * 10^8 ins./sec.
E. With two Execute stages working in parallel, the average Execute time is 10 cycles/ins. / 2 = 5 cycles/ins. This is still longer than any other stage (the slowest of the others take 4 cycles), so 10 * 10^8 cycles/sec. / 5 cycles/ins. = 2 * 10^8 ins./sec.
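Parts C through E can also be checked with a small cycle-level simulation instead of picking the slowest stage by hand. This is an illustrative model, not part of the original solution: `completion_times` and `steady_interval` are hypothetical helpers, instructions are assumed to flow through the stages in order, each parallel copy of a stage is a separate unit, and unlimited buffering between stages is assumed so a stage never waits for the next one to free up.

```python
import heapq

CLOCK_HZ = 10**9  # 1 GHz

def completion_times(latencies, n, units=None):
    """Clock at which each of n in-order instructions leaves the last stage.

    latencies: clocks per stage; units: parallel copies per stage (default 1).
    Assumes unlimited buffering between stages (no back-pressure stalls).
    """
    units = units or [1] * len(latencies)
    # free_at[s] is a min-heap of the clocks at which each copy of stage s is next free
    free_at = [[0] * u for u in units]
    finish = []
    for _ in range(n):
        t = 0  # clock at which this instruction is ready for its next stage
        for s, lat in enumerate(latencies):
            start = max(t, heapq.heappop(free_at[s]))  # earliest free copy of stage s
            t = start + lat
            heapq.heappush(free_at[s], t)
        finish.append(t)
    return finish

def steady_interval(latencies, units=None, n=1000, window=100):
    """Average clocks between completions once the pipeline is saturated."""
    finish = completion_times(latencies, n, units)
    return (finish[-1] - finish[-1 - window]) / window

type_b = [1, 1, 4, 10, 4]  # Fetch opcode, Decode, Fetch operands, Execute, Write back
one_exec = steady_interval(type_b)
two_exec = steady_interval(type_b, units=[1, 1, 1, 2, 1])
print(one_exec, CLOCK_HZ / one_exec)  # clocks/ins and ins/sec, part D
print(two_exec, CLOCK_HZ / two_exec)  # clocks/ins and ins/sec, part E
```

The first instruction still needs the full 20 clocks, which is why the interval is measured over the last instructions only, after the pipeline has filled.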
Problem | Type | Fetch opcode | Decode | Fetch operands | Execute | Write back |
5. | C | 1 clock | 1 clock | 4 clocks | 4 clocks | 0 clocks |
A. 1 clock + 1 clock + 4 clocks + 4 clocks + 0 clocks = 10 clocks/ins.
B. 1 GHz = 10 * 10^8 cycles/sec.; 10 * 10^8 cycles/sec. / 10 clocks/ins. = 1 * 10^8 ins./sec.
C. Fetch operands and Execute both take 4 cycles, so once the pipeline is saturated it completes one instruction every 4 cycles. Remember: 1 * 10^9 = 10 * 10^8 = 100 * 10^7.
D. 100 * 10^7 cycles/sec. / 4 clocks/ins. = 25 * 10^7 ins./sec.
E. Two parallel Execute units bring the average Execute time down to 4 cycles/ins. / 2 = 2 cycles/ins., but this does not affect the Fetch operands stage, which still requires 4 cycles. As a result, the throughput is the same as for a CPU without the parallel (superscalar) Execute stage: 25 * 10^7 ins./sec.
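Part E hinges on which stages tie for the slowest time. The sketch below uses a hypothetical helper, `limiting_stages`, that lists every stage setting the pipeline rate. It models n parallel copies of a stage as dividing that stage's average time by n, an assumption matching the reasoning in part E.

```python
from fractions import Fraction

CLOCK_HZ = 10**9  # 1 GHz

def limiting_stages(stage_clocks, units=None):
    """Return (effective clocks/ins, names of the stages that set it).

    `units` maps a stage name to its number of parallel copies (default 1);
    n copies are modeled as dividing that stage's average time by n.
    """
    units = units or {}
    effective = {name: Fraction(clocks, units.get(name, 1))
                 for name, clocks in stage_clocks.items()}
    worst = max(effective.values())
    return worst, [name for name, c in effective.items() if c == worst]

type_c = {"Fetch opcode": 1, "Decode": 1, "Fetch operands": 4,
          "Execute": 4, "Write back": 0}

for label, units in [("one Execute unit", None), ("two Execute units", {"Execute": 2})]:
    clocks, stages = limiting_stages(type_c, units)
    print(f"{label}: {clocks} clocks/ins, {CLOCK_HZ / clocks} ins/sec, limited by {stages}")
```

Doubling Execute removes it from the list of limiting stages, but Fetch operands remains at 4 cycles, so the rate stays at 25 * 10^7 ins./sec.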