Techniques to improve on the simple fetch/execute cycle

Simple pipelining
Multiple instructions are in the CPU at once.
Different steps of the F/E cycle are processed on different instructions at the same time.
Pipelining can create conflicts:
 * different steps wanting the same resources, e.g. fetching an operand while writing results.
 * attempting a conditional branch before the result it depends on is finalized.
May offer a 20-60% speed improvement.
[Figures: Datapath description; Improved Datapath]

Super-scalar
Duplication or variation of complex task circuitry, most commonly the ALU (Arithmetic Logic Unit).
Copies are often not symmetrical: one ALU performs addition better while the other favors multiplication/division.
Floating point units may technically qualify as secondary processors.
Result: less competition for the same resources.

Vector processing
A vector processor applies a single instruction to multiple data units.
The units are a small set of identical processing circuits, a customized version of super-scalar.
Useful in numeric tasks, not so much in word processing.
Intel supports SIMD instructions that operate on four 32-bit integers at once (SSE/SSE2, circa 2000).
 - Only a small number of useful instructions.
A fetch reads four 32-bit values from a starting place in memory (an array).
A single arithmetic instruction is applied to all four values.
 - Instructions may not be supported on older CPUs.
 - Requires the compiler to recognize the target CPU's abilities.
 + Use of SDRAM, caching, and a 64-bit data bus means values can be read in very quickly.

Hyper-threading
A CPU core simulates 2 CPUs. Not quite true parallel processing.
Takes advantage of super-scalar features.
Duplicates the architectural state:
 * control registers: status, interrupt mask, memory management.
 * general purpose registers.
It does this while keeping a single instruction decoder/execution engine, a single cache mechanism, single MAR/MDR interfaces, etc.
Allows two separate processes or threads to co-exist in the CPU core.
If one thread is stalled waiting for something like I/O, the other thread is allowed full access to resources.
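To make the stall-sharing idea concrete, here is a toy cycle-count model in Python. It is a sketch under simplifying assumptions and does not show how a real core schedules work. Each thread is a hypothetical list of `("compute", n)` and `("stall", n)` steps. The shared execution engine can do one compute cycle per clock for one thread, and a stalled thread waits without using the engine. The function names `run_one_at_a_time` and `run_hyperthreaded` are made up for illustration.

```python
from collections import deque

# Toy model (an assumption, not real hardware): each thread is a list of
# (kind, cycles) steps. "compute" needs the shared execution engine;
# "stall" is waiting on something like I/O and does not use the engine.


def run_one_at_a_time(threads):
    """One architectural state: threads run back to back, so the engine
    sits idle for every stall cycle."""
    return sum(n for ops in threads for _kind, n in ops)


def run_hyperthreaded(threads):
    """Duplicated architectural state, one execution engine. Each clock the
    engine does one compute cycle for one thread (round robin), while any
    stalled thread keeps waiting in parallel. Step lengths must be >= 1."""
    queues = [deque(ops) for ops in threads]
    current = [None] * len(threads)  # [kind, cycles_left] for each thread
    cycles, turn = 0, 0
    while True:
        # Each thread with nothing in progress picks up its next step.
        for i, q in enumerate(queues):
            if current[i] is None and q:
                kind, n = q.popleft()
                current[i] = [kind, n]
        if all(step is None for step in current):
            return cycles
        # Stalls progress without touching the engine.
        for step in current:
            if step is not None and step[0] == "stall":
                step[1] -= 1
        # The single engine serves at most one computing thread per clock.
        for k in range(len(threads)):
            i = (turn + k) % len(threads)
            if current[i] is not None and current[i][0] == "compute":
                current[i][1] -= 1
                turn = i + 1  # rotate priority so the next thread picks first
                break
        # Retire finished steps.
        for i, step in enumerate(current):
            if step is not None and step[1] <= 0:
                current[i] = None
        cycles += 1


thread_a = [("compute", 3), ("stall", 4), ("compute", 2)]
thread_b = [("compute", 2), ("stall", 3), ("compute", 3)]
print(run_one_at_a_time([thread_a, thread_b]))  # 17
print(run_hyperthreaded([thread_a, thread_b]))  # 12
```

With these two sample threads, running them back to back takes 17 cycles. Interleaving them on one engine takes 12, because each thread's stall overlaps the other's compute. In this model, if neither thread ever stalled, the two totals would be equal. The gain depends entirely on how much waiting the workload does.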
Requires an OS that is multi-CPU enabled.
Invisible to programs, but programs with parallel potential may benefit.
Up to 30% performance improvement, but very application dependent.
# Primarily Intel; AMD has Clustered Multi-Threading.
# With the abundance of multi-core CPUs, these are somewhat out of favor.

Multi-core
Most of the CPU's core circuits are duplicated on the same silicon chip.
2, 4, or 6 cores (8 now available).
Each core has its own Harvard-style level 1 caches (separate instruction and data caches).
Cores may share a single level 2 cache.
Cores may share an on- or off-chip level 3 cache.
Cores share a single set of address, data, and control lines connecting the CPU to the system buses.
Data lines are now 64 bits (8 bytes) wide.
Requires the OS to be aware of multiprocessing abilities.
A single application must be coded differently to take advantage of multiple cores.
Super-scalar attempts to execute different parts of a single program in parallel, whereas multi-core can run different programs.

Multiple CPUs or Multiprocessing
Separate CPUs that share a bus and memory.
Symmetrical: many identical CPUs, often simple.
 * SIMD, single instruction, multiple data (vector).
 * MIMD, multiple instructions, multiple data (mesh).
 * Modern 'cloud' computing may be a kind of MIMD.
 # parallel processing: different CPUs all working on the same task (more common).
 # multiprocessing: different CPUs acting on different tasks.
Asymmetrical: different CPUs handle different system tasks.
 * memory management unit, math co-processor, video processor, sound processor.
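Parallel processing means several cores working on the same task. That only helps if the program is written to split the work. The Python sketch below illustrates that extra coding for a hypothetical sum-of-squares job. The range is divided into one chunk per worker, each worker sums its own chunk, and the partial results are combined. The helper names are made up for illustration.

One caveat applies. In CPython, threads share the global interpreter lock, so `ThreadPoolExecutor` shows the structure but gives no real multi-core speedup for CPU-bound work. Swapping in `concurrent.futures.ProcessPoolExecutor`, which offers the same `map` method, runs the chunks in separate processes. It needs picklable module-level functions and, on platforms that spawn processes, an `if __name__ == "__main__":` guard.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical job: sum of i*i for i in 0..n-1, split across workers.


def split_range(n, parts):
    """Divide 0..n-1 into `parts` contiguous (start, end) chunks whose
    sizes differ by at most one."""
    base, extra = divmod(n, parts)
    chunks, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


def partial_sum_of_squares(bounds):
    """Work done by one worker: its own chunk only."""
    start, end = bounds
    return sum(i * i for i in range(start, end))


def parallel_sum_of_squares(n, workers=4):
    """Hand one chunk to each worker, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum_of_squares, split_range(n, workers)))


print(parallel_sum_of_squares(1000))  # 332833500
```

The result matches the closed form n(n-1)(2n-1)/6 for n = 1000. The chunking and combining steps are the "different coding" a single application needs before extra cores can help it.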