Back
Lectures
Software tweaks :
Compile time:
Branch delay slot
Insert NOP (no operations) to provide some additional time for
previous instruction to complete.
Out or order execution. Inserting an unrelated instruction after
branch test. Similar to NOP but gets something done.
Compiler-optimization
Compiler generates out of order instruction sequence when a set of
instructions in a particular sequence are NOT dependent on each other,
the compiler can reorder.
Initial | re-ordered |
Add A to B
Store B
Set C to 7
Set D to 0
|
Add A to B
Set C to 7
Set D to 0
Store B
|
- Out of order compiling time expensive,
so usually reserved for final compile of program.
Branch Prediction
Rewrite test to favor linear (sequential) branch.
Code recompiled so more often executed code immediately follows
branch test (branch not taken).
May require additional information or compiler simulation.
'Static' - compiler assumes branches back to lower address or
previous inline code more likely taken.
Substitute instructions with more efficient instructions.
e.g. Multiply by 2 replaced with shift-left.
- Optimization takes time. If code is in development stage, may not
be tolerable.
- Because branching often conditional on data,
compiler best guess may or may not be.
Run-time :
Branch Predictor
As more executions occur, note the preferred behaviour of branch.
Requires a counter of 1 or more bits.
1-bit notes last evaluation of branch.
2-bit notes last 4 evaluations.
More bits, better observation but more complex logic.
Two level predictor
Counter replaced with or kept in a small table that records the history
of branch taken/not taken.
Table allows predictor circuit to see patterns such as branch taken
every third time.
* One of the most popular branch handing techniques in current use.
Example :
2-bit by 4 entry table.
First time branch encountered, 1st entry adjusted.
2nd time, 2nd entry adjusted.
3nd time, 3nd entry adjusted.
4nd time, 4nd entry adjusted.
We now can recognize patterns in the branch behaviour.
Longer the table, the better the prediction.
Additonal field required to handle different/nested branches
in program.
Branch history table starts to perform as a cache with its
associated overhead and logic.
Branch target prediction table.
Branch addresses are often relative to PC
or may be specified in a pointer (indirect addressing).
Precalculating the target address and storing it in a small cache table
can improve performance.
Branch predicition table - if branch is taken.
Branch target predicition table - where to branch to.
Both of these are often implemented in the same logic module.
Predict which branch more likely taken based on direction of branch.
Branches back (PC - X) are more commonly at the bottom of a loop and
more likely to be taken.
Branches forward (PC + X) are more commonly breaking out of a loop and
less likely to be taken.
Use conditional instruction (predicate)
Useful for short branch conditions.
sub r3,r2,r1 # subtract r1 from r2 r3=r2-r1
brp debit # branch to code marked by debit if positive
xor r3,r3 # other wise zero out r3
debit sub r2,r2,1 # subract 1 from r2
sub r3,r2,r1 # subtract r1 from r2 r3=r2-r1
xorn r3,r3 # zero out r3 if negative flag set
debit sub r2,r2,1 # subract 1 from r2
Same test condition can be checked by multiple instructions
as long as they don't modify condition.
Works well when 'branch' is fairly short.
- requires additional bits in instruction opcode. bigger code.
- requires additional logic.
- may require additional execution time
extra clock cycle or slightly slower clock.
# Intel uses movcc and fcmov to conditionally move data in a
# registers based on 8 (3-bit) recognized conditons.
#
# ARM uses a 4-bit conditional test (16 possible condtions) on
# a number its opcodes.
Out of order execution
Cache next instruction if data dependent.
Out of order execution - requires advanced hardware logic.
More common in CISC CPUs - Intel P6.
Speculative execution
"Speculative execution is an optimization technique where a computer system performs some task that may not be needed. Work is done before it is known whether it is actually needed, so as to prevent a delay that would have to be incurred by doing the work after it is known that it is needed. If it turns out the work was not needed after all, most changes made by the work are reverted and the results are ignored.
From the wikipedia page.
The branch prediction covered above is a form of "predicitve execution"
If 'prediction' wrong, pipeline has to be flushed and reinitalized
with code from correct branch.
Eager execution assumes availability of circuitry to be able to run
both branches as separate sub-tasks and discard results of branch
not taken without affecting the overall state of the pipeline.
Super-scalar ALUs may allow for some cases of eager execution.
IBM 360 supported.
Systems using CUDA - library that allows a CPU to offload computation
on to NVidia graphics co-processors.
Requires compile time library support.
Mainly used to process large arrays of data in parallel (?).
Resource needs grow exponentially as number of unresolved branches
increases.
Spectre
Meltdown (Intel CPUs)
On most CPUs built in the last 20 yrs. using branch prediction,
an abandoned branch is not purged after abandonment.
Unlike main memory where memory for one program is protected from access
by another program, the branch prediction area inside the CPU
is not protected.
A clever program can mine information from this area about what is in
the CPU cache or even in the CPU itself.
Lectures