Techniques to improve on simple fetch/execute cycle.
Simple pipelining
Multiple instructions in CPU
Different steps of the F/E cycle being processed on different instructions
at the same time.
Pipelining can create conflicts.
Different steps wanting same resources
* fetch operand while writing results.
* attempting to conditionally branch before result finalized.
May offer 20-60% speed improvement.
Datapath description
Improved Datapath
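A rough timing sketch of the pipelining idea above. This is a toy model, not a description of any real CPU. Assumptions (none are from the notes): every stage takes one cycle, the pipeline has 4 stages, and a one-cycle stall is charged when an instruction reads a register written by the instruction just before it. The register names and the sample program are hypothetical. The ideal model overstates the gain; real conflicts eat into it.

```python
# Toy timing model for a pipeline. All numbers are illustrative assumptions.

def unpipelined_cycles(n_instructions, n_stages):
    """Each instruction finishes every stage before the next one starts."""
    return n_instructions * n_stages


def count_stalls(program, stall_penalty=1):
    """Count stall cycles caused by an instruction reading a register
    that the instruction immediately before it writes."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_dest, _ = prev
        _, curr_sources = curr
        if prev_dest is not None and prev_dest in curr_sources:
            stalls += stall_penalty
    return stalls


def pipelined_cycles(program, n_stages, stall_penalty=1):
    """First instruction takes n_stages cycles; each later one completes
    one cycle after its predecessor, plus any stall cycles."""
    if not program:
        return 0
    return n_stages + (len(program) - 1) + count_stalls(program, stall_penalty)


# (destination register, source registers); hypothetical instructions
program = [
    ("r1", ("r2", "r3")),  # r1 = r2 + r3
    ("r4", ("r1", "r5")),  # r4 = r1 + r5, needs r1 from the line above
    ("r6", ("r7", "r8")),  # r6 = r7 + r8, independent
    (None, ("r4",)),       # conditional branch on r4, writes no register
]

print(unpipelined_cycles(len(program), 4))
print(pipelined_cycles(program, 4))
```

Raising stall_penalty or adding more dependent instructions shows how conflicts shrink the benefit.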
Super-scalar
Duplication or variation of complex task circuitry, most commonly the ALU.
Arithmetic Logic Units.
Copies often not symmetrical. One ALU performs addition better and the
other favors multiplication/division.
Floating point units. May technically qualify as secondary processor.
Less competition for same resources.
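A small sketch of how duplicated, non-identical units reduce competition. It is a simplified in-order issue model, assumed for illustration only: every instruction takes one cycle, a result is usable in the next cycle, and an instruction can only go to a unit of its own kind. The unit names ("add", "mul"), the schedule function, and the sample program are all hypothetical.

```python
def schedule(program, units):
    """Group instructions into cycles, issuing in program order.

    program: list of (unit_kind, dest_register, source_registers)
    units:   list of unit kinds available each cycle, e.g. ["add", "mul"]
    Returns a list of cycles, each a list of instruction indices.
    """
    cycles = []
    i = 0
    while i < len(program):
        free = list(units)   # every unit is available at the start of a cycle
        written = set()      # registers written earlier in this same cycle
        issued = []
        while i < len(program):
            kind, dest, sources = program[i]
            depends = dest in written or any(s in written for s in sources)
            if kind not in free or depends:
                break        # in-order issue: later instructions wait too
            free.remove(kind)
            written.add(dest)
            issued.append(i)
            i += 1
        if not issued:
            raise ValueError(f"no unit can execute {program[i][0]!r}")
        cycles.append(issued)
    return cycles


program = [
    ("add", "r1", ("r2", "r3")),
    ("mul", "r4", ("r5", "r6")),  # independent, uses the other unit
    ("add", "r7", ("r1", "r4")),  # needs both earlier results
    ("add", "r8", ("r2", "r2")),  # independent, but wants an adder
]

print(schedule(program, ["add", "mul"]))
print(schedule(program, ["add", "add", "mul"]))
```

With a second adder the last two instructions share a cycle, so the four instructions need two cycles instead of three.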
Vector processing
A vector processor will apply a single instruction to multiple data units.
Units are a small set of identical processing circuits.
Customized version of super-scalar.
Useful for numeric tasks, not so much for word processing.
Intel supports SIMD instructions on 4 32-bit values at once. - SSE (floats, 1999), SSE2 (integers, circa 2000)
small number of useful instructions.
fetch will read 4 32-bit values from a starting place in memory (array).
single arithmetic instruction applied to all 4 values.
- instructions may not be supported on older CPUs.
- requires compiler to recognize target CPU's abilities.
+ Use of SDRAM, caching, and 64-bit data bus means values can be read
in very quickly.
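The fetch-4, operate-on-4 pattern above can be emulated in plain Python. This only imitates the behaviour; it does not use real SSE instructions, and the Python loop gives no speed benefit. The names load4, add4, and vector_add are hypothetical. Wrapping the sum to 32 bits is an assumption that mirrors how packed 32-bit integer adds behave.

```python
LANES = 4             # values handled by one "instruction"
MASK32 = 0xFFFFFFFF   # keep each lane to 32 bits


def load4(memory, start):
    """Read 4 consecutive values from a starting place in the array."""
    return tuple(memory[start:start + LANES])


def add4(a, b):
    """One arithmetic operation applied to all 4 lanes (wraps at 32 bits)."""
    return tuple((x + y) & MASK32 for x, y in zip(a, b))


def vector_add(a, b):
    """Add two equal-length arrays, four elements per step."""
    if len(a) != len(b) or len(a) % LANES != 0:
        raise ValueError("arrays must match and be a multiple of 4 long")
    out = []
    for i in range(0, len(a), LANES):
        out.extend(add4(load4(a, i), load4(b, i)))
    return out


print(vector_add([1, 2, 3, 4, 5, 6, 7, 8],
                 [10, 20, 30, 40, 50, 60, 70, 80]))
```

Eight additions take two steps here instead of eight, which is the point of the technique.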
Hyper-threading
CPU core simulates 2 CPUs.
Not quite true parallel processing.
Takes advantage of super-scalar features.
Duplicates architectural state
Control registers : status, interrupt mask, memory management.
Uses duplicate general purpose registers.
Keeps a single instruction decoder/execution engine, single cache
mechanism, single MAR/MDR interfaces, etc.
Allows two separate processes or threads to co-exist in CPU core.
If one thread is stalled waiting for something like I/O,
the other thread is allowed full access to resources.
Requires an OS that is multi-CPU enabled.
Invisible to program - but programs with parallel potential may benefit.
Up to 30% performance improvement but very application dependent.
# Primarily Intel; AMD has Clustered Multi-Threading.
# With the abundance of multi-core CPUs, these are somewhat out of favor.
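A toy cycle count showing why two threads on one execution engine can finish sooner. Assumed model, not taken from the notes: each thread is a string of steps, "X" needs the shared engine for one cycle and "W" is one cycle of waiting that needs no engine. The comparison case runs the threads one after another with the engine idle during waits. Function names are hypothetical, and the gain shown is for this made-up workload only.

```python
def cycles_one_at_a_time(threads):
    """Run each thread to completion before starting the next.
    The engine sits idle during every wait cycle."""
    return sum(len(t) for t in threads)


def cycles_hyper_threaded(threads):
    """Both threads live in the core at once. Each cycle the single engine
    executes one "X" step; waiting threads count down in the background."""
    position = [0] * len(threads)
    cycles = 0
    while any(p < len(t) for p, t in zip(position, threads)):
        cycles += 1
        engine_free = True
        for k, thread in enumerate(threads):
            if position[k] >= len(thread):
                continue                 # this thread has finished
            if thread[position[k]] == "W":
                position[k] += 1         # waiting needs no engine
            elif engine_free:
                position[k] += 1         # this thread gets the engine
                engine_free = False
    return cycles


threads = ["XXWWWXX", "XXXX"]  # thread 0 stalls for 3 cycles in the middle
print(cycles_one_at_a_time(threads))
print(cycles_hyper_threaded(threads))
```

If neither thread ever waits, the two counts are equal, which matches the note that the benefit is very application dependent.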
Multi-core
Most of CPU's core circuits duplicated on same silicon chip.
2, 4, 6 core (8 now available)
Each core has its own level 1 Harvard-style caches.
May share single level 2 cache circuits.
May share on or off chip level 3 cache.
Share single set of address, data,
and control lines connecting CPU to system buses.
Data lines now 64 bits (8 bytes) wide.
Requires OS be aware of multiprocessing abilities.
Multi-core requires different coding for a single application to take
advantage of multiple cores.
Super-scalar attempts to execute different parts of a single program
in parallel, whereas multi-core can run different programs.
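A minimal sketch of the "different coding" a single application needs to use several cores: the work is split into chunks and handed to separate worker processes. It uses Python's standard concurrent.futures module. The task (sum of squares), the helper names, and the chunking scheme are illustrative assumptions; the executor_cls parameter exists only so the same code can be tried with threads.

```python
import concurrent.futures as cf
import os


def sum_of_squares(chunk):
    """Work done independently on one core."""
    return sum(x * x for x in chunk)


def split(data, parts):
    """Cut data into at most `parts` roughly equal slices."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum_of_squares(data, workers=None,
                            executor_cls=cf.ProcessPoolExecutor):
    """Split the data, process each slice in its own worker, then combine."""
    workers = workers or os.cpu_count() or 1
    chunks = split(data, workers)
    if not chunks:
        return 0
    with executor_cls(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))


if __name__ == "__main__":
    data = list(range(100_000))
    print(parallel_sum_of_squares(data, workers=4))
```

The serial version is a single call to sum_of_squares(data); the extra splitting and combining is the cost of using more than one core.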
Multiple CPUs or Multiprocessing
Separate CPUs that share bus and memory.
Symmetrical
- many identical CPUs.
Often simple.
SIMD, single instruction, multiple data (vector).
MIMD, multiple instructions, multiple data (mesh).
Modern 'cloud' computing may be a kind of MIMD.
# parallel processing - different CPUs all working on same task
(more common).
# multiprocessing - different CPUs acting on different tasks.
Asymmetrical
- different CPUs handle different system tasks.
Memory management unit, Math Co-processor
Video processor, Sound processor