General problems with pipelining

Hazards

Competition for same circuits  (Structural hazard)
  such as Memory address register or data buffer register
    Writing data out while attempting to fetch next instruction.

    On a Harvard architecture system, both data transfers would be 
      possible.
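
The contention above can be sketched with a toy model in Python. It assumes
every cycle wants one instruction fetch and that some cycles also need a
data access. The function name and the one-port model are illustrative
assumptions, not a description of any real CPU.

```python
def count_stalls(data_access_cycles, total_cycles, harvard=False):
    """Count fetch stalls in a toy pipeline.

    data_access_cycles: set of cycle numbers in which an earlier
    instruction reads or writes data memory.
    Every cycle also wants to fetch one instruction.
    """
    stalls = 0
    for cycle in range(total_cycles):
        wants_data = cycle in data_access_cycles
        if wants_data and not harvard:
            # Single shared MAR/MDR: the data access wins, the fetch waits.
            stalls += 1
    return stalls


shared = count_stalls({2, 5, 6}, total_cycles=8)
split = count_stalls({2, 5, 6}, total_cycles=8, harvard=True)
print(shared, split)  # 3 0
```

With separate instruction and data paths the fetch never has to wait, which
is the point of the Harvard remark above.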


Data Dependency  (Data hazard)

  Read after write. (raw)
    Multiply A by B and write back to A
    Copy A to C

    Multiply must complete before copy to be implemented correctly.

  Write after Read (war)
    R4 = R1 * R5 
    R5 = R1 + R2
    # In systems with super-scalar (more than one ALU) or concurrent execution,
    # the write to R5 must not complete before the multiply has read R5.

  Write after write (waw)
    R2 <- R4 * R7
    R2 <- R1

    # Suppose R2 is a memory mapped register interface to USB controller, 
    # any writes to R2 are sent to the controller which writes to a device
    # in consecutive sequence, so ordering is important. 
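
The three data hazards above can be told apart by comparing which registers
each instruction reads and writes. A minimal Python sketch, assuming each
instruction is reduced to a (destination, sources) pair; the function name
and this representation are made up for illustration.

```python
def classify_hazards(first, second):
    """Return the data hazards between two instructions in program order.

    Each instruction is (dest, sources): a destination register name
    and a tuple of source register names.
    """
    dest1, srcs1 = first
    dest2, srcs2 = second
    hazards = []
    if dest1 in srcs2:
        hazards.append("RAW")  # second reads what first writes
    if dest2 in srcs1:
        hazards.append("WAR")  # second overwrites what first reads
    if dest1 == dest2:
        hazards.append("WAW")  # both write the same register
    return hazards


# The three example pairs from the notes.
print(classify_hazards(("A", ("A", "B")), ("C", ("A",))))            # ['RAW']
print(classify_hazards(("R4", ("R1", "R5")), ("R5", ("R1", "R2"))))  # ['WAR']
print(classify_hazards(("R2", ("R4", "R7")), ("R2", ("R1",))))       # ['WAW']
```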


Branching (Branch or control hazard)

  Compare A to B
  Branch to "here" if less than
  Set A to zero
  ...
  here: Increment A

  Compare may not be complete before encountering conditional branch.
    Wrong branch of code applied to data.

    Pipeline full of incorrect leg of branch.     
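
A rough Python sketch of the cost of filling the pipeline with the wrong leg.
It assumes a toy in-order pipeline that fetches one instruction per cycle
and only learns the compare result at a given stage. The function name and
the stage numbering are assumptions for illustration.

```python
def wasted_slots(resolve_stage, predicted_taken, actually_taken):
    """Instructions flushed after one branch in a toy in-order pipeline.

    resolve_stage: 1-based stage at which the compare result is known.
    One instruction is fetched per cycle, so resolve_stage - 1 younger
    instructions are already in flight when the branch resolves.
    """
    if predicted_taken == actually_taken:
        return 0
    return resolve_stage - 1


# Pipeline assumed "not taken" and fetched "Set A to zero",
# but the branch to "here" was actually taken.
print(wasted_slots(3, predicted_taken=False, actually_taken=True))  # 2
print(wasted_slots(3, predicted_taken=True, actually_taken=True))   # 0
```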



Possible Solutions : 
  Hardware tweaks, primarily for structural hazards:
    Increase bus width - read more code or data in single memory access.
    + increases free time of MAR and MDR for other stages of instruction.
    - ups cost of CPU and system bus.   

    Cache
    + faster reads/writes - get it over quicker.
    + separate cache and internal buses for program and data. No contention.
      (Modified Harvard arch.)
    - cost and design complexity.

    Increase number of stages of an instruction.
    + less competition for a particular stage.
    + different sub-tasks of different instructions performed at same time.
    - more complex logic (cost and speed).
    - too many stages can actually slow CPU down.
      # Pentium 4 Prescott, 31 stages, ~2004
      # Intel Core (Skylake), about 14 stages, ~2015
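
Why more stages can end up slower, as a back-of-envelope Python model. All
numbers are made up: a fixed amount of logic delay is split across the
stages, each stage adds a fixed latch overhead, and every mispredicted
branch costs a refill of (stages - 1) cycles. This is a sketch of the
trade-off only, not a measurement of any real processor.

```python
def run_time(stages, instructions=1000, logic_delay=10.0,
             latch_overhead=0.5, mispredict_rate=0.05):
    """Estimated run time (arbitrary units) under the assumptions above."""
    # Shorter stages mean a faster clock, but the latch overhead stays.
    cycle_time = logic_delay / stages + latch_overhead
    # Cycles to fill the pipeline before the first result appears.
    fill_cycles = stages - 1
    # Each misprediction throws away the younger instructions in flight.
    flush_cycles = instructions * mispredict_rate * (stages - 1)
    return cycle_time * (instructions + fill_cycles + flush_cycles)


for k in (1, 5, 14, 31):
    print(k, round(run_time(k)))
```

With these assumed numbers, 14 stages comes out faster than both 5 and 31:
past some depth the flush cost grows faster than the clock period shrinks.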

    Provide scratch registers and buffers.
    + similar to cache - acts as buffer between stages.
    - cost and some delay between stages 
      but may make preceding stage available sooner.
    + buffers also allow fetch circuitry and execute circuitry to
      function independently (decoupled) from each other.
      Useful for out of order execution. 
    - branch misprediction requires flushing of pipeline and buffers.
      Also creates security issues because buffers don't have the same 
        memory protection as external memory. 
          See Spectre and Meltdown.
         

    Out of order execution and Re-order buffer
      If operands needed and access to them delayed, 
        set aside opcode in side queue while attempting to complete fetch. 
        Process subsequent instructions in pipeline.

      When an instruction's operands are available,
        Run instruction.

      + faster execution by better use of resources.
      - cost,  support circuitry more complex.
      - needs to guard against data dependency conflicts.
        May require help of advanced software compiler to arrange instructions.
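
The idea above can be sketched as a tiny scheduler in Python. Assumptions:
one instruction issues per cycle, the oldest ready instruction goes first,
and only read-after-write dependencies are tracked (no register renaming,
no re-order buffer for in-order completion). The function name, tuple
layout and latencies are invented for illustration.

```python
def issue_order(program, ready_at):
    """Return [(cycle, name)] in the order instructions are issued.

    program: list of (name, dest, sources, latency) in program order.
    ready_at: dict of register -> cycle at which a delayed operand arrives.
    """
    # For each instruction, find the earlier instructions it depends on.
    last_writer = {}
    producers = []
    for idx, (_, dest, sources, _) in enumerate(program):
        producers.append([last_writer[s] for s in sources if s in last_writer])
        last_writer[dest] = idx

    done_at = {}  # instruction index -> cycle its result is available
    waiting = list(range(len(program)))
    order = []
    cycle = 0
    while waiting:
        for idx in waiting:
            name, _, sources, latency = program[idx]
            producers_done = all(
                p in done_at and done_at[p] <= cycle for p in producers[idx]
            )
            operands_in = all(ready_at.get(s, 0) <= cycle for s in sources)
            if producers_done and operands_in:
                done_at[idx] = cycle + latency
                order.append((cycle, name))
                waiting.remove(idx)
                break  # at most one issue per cycle
        cycle += 1
    return order


program = [
    ("mul", "R3", ("R1", "R2"), 3),  # R1 is still on its way from memory
    ("add", "R4", ("R3", "R5"), 1),  # needs the result of mul
    ("sub", "R6", ("R5", "R7"), 1),  # independent of both
]
print(issue_order(program, {"R1": 4}))
# [(0, 'sub'), (4, 'mul'), (7, 'add')]
```

The independent sub runs while mul waits in the side queue for R1, and add
is held back until the mul result exists.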

    Create copies of work registers,  Register files - type of cache.
      During task switching, point CPU to duplicate work registers 
        rather than writing registers to some form of external memory.
       + fast task swapping.
       - cost and complexity to CPU.
       + on some CPU architectures, register files can be configured to
         provide additional registers. 
           16 sets of 16 registers
            8 sets of 32 registers
            1 set of 256 registers.
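
A minimal Python sketch of the duplicate work registers idea, assuming 256
physical registers split into equal banks, with a bank pointer selecting
the set the running task sees. The class and method names are hypothetical
and do not model any particular CPU.

```python
class BankedRegisterFile:
    """256 physical registers viewed as equal-sized banks."""

    TOTAL = 256

    def __init__(self, regs_per_bank):
        if regs_per_bank <= 0 or self.TOTAL % regs_per_bank != 0:
            raise ValueError("bank size must divide 256")
        self.regs_per_bank = regs_per_bank
        self.banks = self.TOTAL // regs_per_bank
        self.storage = [0] * self.TOTAL
        self.current = 0  # bank pointer for the running task

    def switch_task(self, bank):
        # Task switch: move the pointer, copy nothing out to memory.
        if not 0 <= bank < self.banks:
            raise IndexError("no such bank")
        self.current = bank

    def _index(self, reg):
        if not 0 <= reg < self.regs_per_bank:
            raise IndexError("no such register")
        return self.current * self.regs_per_bank + reg

    def read(self, reg):
        return self.storage[self._index(reg)]

    def write(self, reg, value):
        self.storage[self._index(reg)] = value


rf = BankedRegisterFile(16)  # 16 sets of 16 registers
rf.write(3, 111)             # task 0 uses its R3
rf.switch_task(1)
rf.write(3, 222)             # task 1 has its own R3
rf.switch_task(0)
print(rf.banks, rf.read(3))  # 16 111
```

Constructing it with 32 or 256 registers per bank gives the other two
configurations listed above.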

Software tweaks