Techniques to improve on the simple fetch/execute cycle.

  Simple pipelining
    Multiple instructions are in the CPU at once.

    Different steps of the F/E cycle are processed on different instructions
      at the same time.
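    The overlap can be sketched as a timing table. This minimal C sketch assumes a hypothetical 4-stage pipeline (F fetch, D decode, E execute, W write-back) with one clock cycle per stage and no conflicts; the function names are illustrative only.

```c
#include <stdio.h>

/* Hypothetical 4-stage pipeline; each stage takes one clock cycle. */
static const char *STAGES[] = {"F", "D", "E", "W"};  /* fetch, decode, execute, write-back */
enum { NUM_STAGES = 4 };

/* Without pipelining, each instruction passes through every stage alone. */
int sequential_cycles(int n) {
    return n * NUM_STAGES;
}

/* With ideal pipelining, the first instruction fills the pipe and each
   later instruction finishes one cycle after the one before it. */
int pipeline_cycles(int n) {
    return n > 0 ? NUM_STAGES + n - 1 : 0;
}

/* Prints one row per instruction showing its stage in each clock cycle. */
void print_pipeline(int n) {
    int total = pipeline_cycles(n);
    for (int i = 0; i < n; i++) {
        printf("I%d: ", i + 1);
        for (int c = 0; c < total; c++) {
            int stage = c - i;  /* instruction i enters fetch in cycle i */
            printf("%-2s", (stage >= 0 && stage < NUM_STAGES) ? STAGES[stage] : ".");
        }
        printf("\n");
    }
}
```

    With 3 instructions, sequential_cycles(3) is 12 while pipeline_cycles(3) is 6, and print_pipeline(3) draws the staircase of overlapping stages. In this ideal case the speedup approaches the number of stages as the instruction count grows; conflicts reduce it in practice.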

    Pipelining can create conflicts.
      Different steps may need the same resources or results:
       * fetching an operand while another instruction is writing its result.
       * attempting to conditionally branch before the result it depends on is finalized.
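    The first kind of conflict can be modeled by counting stall cycles. This sketch assumes a hypothetical instruction format (one destination, two source registers), a 4-stage pipeline, no result forwarding, and a fixed 2-cycle stall whenever an instruction reads the register written by the instruction just before it; all of these values are assumptions for illustration.

```c
#include <stdbool.h>

/* Hypothetical instruction format: one destination, two source registers. */
typedef struct {
    int dest;
    int src1;
    int src2;
} Instr;

enum { STAGES = 4, STALL_CYCLES = 2 };  /* assumed values, no forwarding */

/* True when `next` reads the register that `prev` is still writing. */
static bool depends_on(Instr prev, Instr next) {
    return next.src1 == prev.dest || next.src2 == prev.dest;
}

/* Total cycles: ideal pipeline time plus STALL_CYCLES bubbles each time an
   instruction uses the result of the instruction immediately before it. */
int cycles_with_stalls(const Instr *prog, int n) {
    if (n <= 0) return 0;
    int cycles = STAGES + n - 1;
    for (int i = 1; i < n; i++) {
        if (depends_on(prog[i - 1], prog[i])) cycles += STALL_CYCLES;
    }
    return cycles;
}
```

    Three independent instructions take 6 cycles; making each one depend on the previous result adds two 2-cycle stalls, for 10. The sketch only looks one instruction back; a real CPU tracks more history and can forward results to avoid some stalls.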

    May offer 20-60% speed improvement.

    Datapath description
    Improved Datapath

  Super-scalar
    Duplication or variation of complex task circuitry, most commonly the ALU. 

    Arithmetic Logic Units (ALUs).
      Copies are often not symmetrical: one ALU may perform addition better
        while the other favors multiplication/division.

    Floating-point units.

    Less competition for the same resources.
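    Super-scalar hardware needs independent work to keep its duplicated ALUs busy. A C sketch, assuming a CPU with at least two integer ALUs: the second function splits one sum into two independent chains of additions. Whether the chains actually execute in parallel depends on the CPU and on how the compiler schedules the loop; the function names are hypothetical.

```c
#include <stddef.h>

/* One accumulator: every addition depends on the previous one,
   so the additions form a single chain that must run in order. */
long sum_one_chain(const int *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) s += a[i];
    return s;
}

/* Two accumulators: even and odd elements form two independent chains,
   giving a super-scalar core two additions it can issue in the same cycle. */
long sum_two_chains(const int *a, size_t n) {
    long s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i];      /* chain 0 */
        s1 += a[i + 1];  /* chain 1, independent of chain 0 */
    }
    if (i < n) s0 += a[i];  /* leftover element when n is odd */
    return s0 + s1;
}
```

    Both functions return the same total; only the dependency structure the CPU sees differs.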

  Vector processing
    A vector processor will apply a single instruction to multiple data units.
      The units are a small set of identical processing circuits.
      A customized version of super-scalar.

    Useful in numeric tasks, less so in word processing.
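    The single-instruction, multiple-data idea can be sketched with the GCC/Clang vector extension (a compiler extension, not standard C), assuming a 16-byte vector of four 32-bit ints. One `+` covers all four lanes, where the scalar version needs four separate additions.

```c
/* A 16-byte vector holding four 32-bit ints (GCC/Clang vector extension). */
typedef int v4si __attribute__((vector_size(16)));

/* One '+' applied to all four lanes; on a CPU with SIMD support the
   compiler can emit a single vector add instruction for it. */
v4si add4(v4si a, v4si b) {
    return a + b;
}

/* Scalar equivalent: four separate additions, one per element. */
void add4_scalar(const int *a, const int *b, int *out) {
    for (int i = 0; i < 4; i++) out[i] = a[i] + b[i];
}
```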

    Intel CPUs support SIMD instructions that operate on 4 integers at once.
      A small number of useful instructions.
      A fetch reads 4 32-bit values from a starting place in memory (an array).
      A single arithmetic instruction is applied to all 4 values.
    - Instructions may not be supported on older CPUs.
    - Requires the compiler to recognize the target CPU's abilities.
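    The 4 x 32-bit case maps onto Intel's SSE2 integer instructions. This sketch uses the SSE2 intrinsics from `<emmintrin.h>` and falls back to a plain loop when the compiler is not targeting SSE2 (GCC and Clang predefine `__SSE2__` when it is, which includes every x86-64 target); `add_arrays` is a hypothetical name. It also illustrates the compiler requirement: the vector path exists only if the compiler knows the target supports it.

```c
#include <stddef.h>
#ifdef __SSE2__
#include <emmintrin.h>  /* SSE2 integer intrinsics */
#endif

/* out[i] = a[i] + b[i] for n ints. With SSE2, four 32-bit values are
   loaded, added by one instruction, and stored per loop step. */
void add_arrays(const int *a, const int *b, int *out, size_t n) {
    size_t i = 0;
#ifdef __SSE2__
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));  /* read 4 ints */
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i vs = _mm_add_epi32(va, vb);  /* 4 additions, 1 instruction */
        _mm_storeu_si128((__m128i *)(out + i), vs);
    }
#endif
    /* Scalar loop: handles leftovers, or everything on non-SSE2 targets. */
    for (; i < n; i++) out[i] = a[i] + b[i];
}
```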

  Hyper-threading
    A single CPU core simulates 2 CPUs.
    Not quite true parallel processing.

    Takes advantage of super-scalar features.
    Duplicates the architectural state:
      control registers (status, interrupt mask, memory management)
      and general-purpose registers.
    Allows two separate processes or threads to co-exist in one CPU core.
    If one thread is stalled waiting for something like a memory access,
      the other thread is allowed full access to the execution resources.

    Requires an OS that is multi-CPU enabled.
    Invisible to the program.

    Up to 30% performance improvement but very application dependent.  
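    Because hyper-threading is invisible to programs, the OS simply reports each hardware thread as another CPU. A sketch that asks the OS how many logical CPUs are online, assuming a Unix-like system where `sysconf(_SC_NPROCESSORS_ONLN)` is available (widely supported, including Linux and macOS, but not required by POSIX); `logical_cpus` is a hypothetical name.

```c
#include <unistd.h>

/* Number of logical CPUs the OS has online. With hyper-threading enabled,
   each core usually appears as two logical CPUs, and this call gives the
   program no way to tell a hyper-thread from a separate core. */
long logical_cpus(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;  /* fall back to 1 if the value is unavailable */
}
```

    On a 4-core CPU with hyper-threading enabled, this typically reports 8.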

  Multi-core
https://www.howtogeek.com/194756/cpu-basics-multiple-cpus-cores-and-hyper-threading-explained/

    Most of the CPU's core circuits are duplicated on the same silicon chip.
      2-, 4-, and 6-core chips (8-core now available).

    Each core has its own level 1 caches.

    Cores may share a single level 2 cache.

    Cores share a single set of address, data,
      and control lines connecting the CPU to the system buses.
      Data lines are now 64 bits (8 bytes) wide.

    For a single application to take advantage of multiple cores, it must be
      coded differently (for example, split into threads).

      Super-scalar attempts to execute different parts of a single program
      in parallel, whereas multi-core can run different programs.
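    A single program has to be restructured to use several cores. A minimal pthreads sketch, assuming 4 threads (a hypothetical choice matching a 4-core CPU): the array is split into slices, each thread sums its own slice, and the partial sums are combined at the end. The OS decides which cores the threads actually run on.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

enum { NUM_THREADS = 4 };  /* assumed: one thread per core on a 4-core CPU */

typedef struct {
    const int *data;
    size_t begin, end;  /* half-open range [begin, end) */
    long partial;       /* result written by the thread */
} Slice;

/* Each thread sums its own slice; no shared writes, so no locking needed. */
static void *sum_slice(void *arg) {
    Slice *s = arg;
    s->partial = 0;
    for (size_t i = s->begin; i < s->end; i++) s->partial += s->data[i];
    return NULL;
}

/* Splits the array into NUM_THREADS slices, sums them in parallel,
   then combines the partial results on the calling thread. */
long parallel_sum(const int *data, size_t n) {
    pthread_t threads[NUM_THREADS];
    Slice slices[NUM_THREADS];
    bool started[NUM_THREADS];
    size_t chunk = (n + NUM_THREADS - 1) / NUM_THREADS;

    for (int t = 0; t < NUM_THREADS; t++) {
        size_t begin = t * chunk;
        size_t end = begin + chunk;
        if (begin > n) begin = n;
        if (end > n) end = n;
        slices[t] = (Slice){data, begin, end, 0};
        started[t] = pthread_create(&threads[t], NULL, sum_slice, &slices[t]) == 0;
        if (!started[t]) sum_slice(&slices[t]);  /* fall back to this thread */
    }
    long total = 0;
    for (int t = 0; t < NUM_THREADS; t++) {
        if (started[t]) pthread_join(threads[t], NULL);
        total += slices[t].partial;
    }
    return total;
}
```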

  Multiple CPUs or Multiprocessing
    Separate CPUs that share a bus and memory.

    Symmetrical
      - many identical CPUs.

      Individual CPUs are often simple.
        SIMD: single instruction, multiple data (vector).
        MIMD: multiple instructions, multiple data (mesh).

      # parallel processing - different CPUs all working on same task 
         (more common).

      # multiprocessing - different CPUs acting on different tasks.
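    In multiprocessing, each CPU runs a different instruction stream. A pthreads sketch of that MIMD style, assuming the OS places the two threads on different CPUs: one thread computes a sum while the other finds the maximum of the same data. `run_two_tasks` and the task functions are hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

typedef struct {
    const int *data;
    size_t n;
    long result;
} Job;

/* Task A: sum of the elements. */
static void *sum_task(void *arg) {
    Job *j = arg;
    j->result = 0;
    for (size_t i = 0; i < j->n; i++) j->result += j->data[i];
    return NULL;
}

/* Task B: largest element (assumes n >= 1). */
static void *max_task(void *arg) {
    Job *j = arg;
    j->result = j->data[0];
    for (size_t i = 1; i < j->n; i++)
        if (j->data[i] > j->result) j->result = j->data[i];
    return NULL;
}

/* Runs two different tasks at the same time, one per thread.
   Returns 0 on success, -1 if a thread could not be created. */
int run_two_tasks(const int *data, size_t n, long *sum, long *max) {
    Job a = {data, n, 0}, b = {data, n, 0};
    pthread_t ta, tb;
    if (pthread_create(&ta, NULL, sum_task, &a) != 0) return -1;
    if (pthread_create(&tb, NULL, max_task, &b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    *sum = a.result;
    *max = b.result;
    return 0;
}
```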
         
    Asymmetrical
      - different CPUs handle different system tasks.

      Examples: memory management unit, math co-processor,

      video processor, sound processor.