Caching is a technique by which a small amount of expensive, fast memory is used to hold the information (program code and data) currently most important to the CPU. This information is copied from main (primary) memory, accessed and manipulated, and finally written back out when it is no longer immediately needed. Other information is then copied from main memory and the process is repeated.
Cache memory is often located on the CPU chip (level 1) and/or on adjacent memory chips (level 2) with high-speed bus connections to the CPU.
There are several ways to organize cache memory that balance the issues of effectiveness and cost.
In the simplest organization, called a direct mapped cache, a particular line in each block of main memory is always associated with the same line in the cache. In other words, if a cache is 16K in size, then the lines of main memory beginning at addresses 0, 16K + 0, 32K + 0, etc. will all be associated with line 0 of the cache. When a particular main memory location is accessed, its address is used to determine which line of main memory it lies in and the corresponding line in the cache where that location would be cached. If the cache line holds a copy of the line from the correct block of main memory, called a hit, the cache version of the memory is used.
If the cache line contains a line from a different block of main memory, or no valid contents at all, called a miss, then the line of main memory containing the location of interest must be read into the cache. If the cache line did hold valid memory from another block, that line may have to be written out first.
An important point to remember is that while all memory locations (bytes) on a single cache line are contiguous and come from a single block of main memory, each line in the cache is independent of all the others, and any two adjacent cache lines may hold lines from different main memory blocks.
Tag = int( memory address / cache size )
Cache line = int( ( memory address mod cache size ) / line length )
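As a worked example (the 16K cache size and 64-byte line length are illustrative values, not a requirement of the scheme), the address 41000 maps as follows:

    Tag        = int( 41000 / 16384 )            = 2
    Cache line = int( ( 41000 mod 16384 ) / 64 ) = int( 8232 / 64 ) = 128

If line 128 of the cache is valid and its tag field holds 2, the access is a hit; otherwise it is a miss.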
When a memory address is accessed, the calculations above are performed. The appropriate line in the cache is then checked for valid data and, if valid, the tag (block ID) stored in the tag field is compared to the calculated tag. If the cache line contains valid data and the tags match, a hit has occurred. If the cache line is invalid or contains a line from a different block, a miss has occurred and the system must take additional steps to put the correct information in the cache.
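To make the check concrete, here is a minimal sketch in C of a direct mapped cache lookup. The 16K cache size, 64-byte line length, and the structure and field names are illustrative assumptions, not a description of any particular processor.

    #include <stdbool.h>
    #include <stdint.h>

    #define CACHE_SIZE  16384                     /* bytes in the cache          */
    #define LINE_LENGTH 64                        /* bytes per cache line        */
    #define NUM_LINES   (CACHE_SIZE / LINE_LENGTH)

    struct cache_line {
        bool     valid;                           /* line holds real data        */
        uint32_t tag;                             /* block of main memory it came from */
        uint8_t  data[LINE_LENGTH];               /* the cached bytes            */
    };

    static struct cache_line cache[NUM_LINES];

    /* Return true on a hit, false on a miss. */
    bool cache_lookup(uint32_t address)
    {
        uint32_t tag  = address / CACHE_SIZE;                 /* which block */
        uint32_t line = (address % CACHE_SIZE) / LINE_LENGTH; /* which line  */

        return cache[line].valid && cache[line].tag == tag;
    }

On a miss, the line would be filled from main memory (after writing the old contents out first, if necessary) and its valid and tag fields updated.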
Spatial locality occurs when a sequence of commands or information cells, such as an array, is accessed in order. The physical closeness of the data that constitutes the commands or information makes the use of cache lines effective. Once a line of memory has been loaded to access a particular command or information cell, the next command or information cell accessed is likely to be next to the first one and already in the cache.
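A small hypothetical C fragment illustrates this; the function and parameter names are assumptions made only for the example.

    /* Spatial locality: the array elements are contiguous, so loading the
     * cache line that holds data[0] also brings in data[1], data[2], ...,
     * and most of the following accesses hit in the cache. */
    int sum_array(const int *data, int n)
    {
        int sum = 0;
        for (int i = 0; i < n; i++)
            sum += data[i];              /* sequential accesses, mostly hits */
        return sum;
    }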
Temporal locality occurs when a sequence of commands or information cells is accessed repeatedly over a period of time. With a command sequence, this may occur as a program loop. With information cells, this occurs when several actions are applied to the same block of data.
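Again as a hypothetical C fragment (the names are assumptions for the example), repeated passes over the same block of data exhibit temporal locality, and the loop's own code is re-fetched from the cache on every pass.

    /* Temporal locality: the same small block of data is re-read on every
     * pass, and the loop body's code is re-executed each time, so both
     * tend to stay in the cache between uses. */
    int max_over_passes(const int *block, int n, int passes)
    {
        int max = block[0];
        for (int p = 0; p < passes; p++)      /* repeated passes              */
            for (int i = 0; i < n; i++)       /* same block of data each time */
                if (block[i] > max)
                    max = block[i];
        return max;
    }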
In most computer architectures, once loaded, code is not modified. In a simple cache, when a miss occurs, the contents of the cache line will be written out even if they hold code and are therefore unmodified.
In most modern computer architectures, code and data are kept in separate places in memory. It is possible for code and data to occupy the same line position in different blocks of memory. This can cause constant misses if a loop's code and the data it accesses compete for the same cache line.
Many architectures support multitasking. It is common for the code of each loaded task to be aligned at the beginning of a memory segment, which often corresponds to the block size of the cache. As each task is activated, a miss occurs; the more tasks loaded, the more often misses occur.
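A sketch of the kind of conflict described in the last two paragraphs, using the same illustrative 16K cache and 64-byte lines as before: two addresses that differ by exactly the cache size map to the same cache line, so alternating between them produces a miss on every access.

    #define CACHE_SIZE  16384
    #define LINE_LENGTH 64

    /* The cache line an address maps to, from the formula above. */
    unsigned line_of(unsigned address)
    {
        return (address % CACHE_SIZE) / LINE_LENGTH;
    }

    /* line_of(0x0040) == line_of(0x4040) == 1: if a loop's code sat at
     * 0x0040 while its data sat at 0x4040, each would evict the other,
     * causing a miss on every reference. */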
Caches can be designed to avoid or limit the above problems, at a cost.
To solve the first two problems, the cache is often split into separate caches for data and code. This allows the same line number from the code segment and the data segment to be cached at the same time. It also allows the system to skip the write-back on a miss for the code portion of the cache.
In splitting the cache, the designer must decide whether to double the cache size or halve the number of code and data lines that can be cached at any time.
To solve the third problem, we take the idea of splitting the cache one step further and create additional complete sets of cache lines. Now there is a set of n cache lines available for each line position in memory. A cache of this type is called an n-way set associative cache, where n is the number of duplicate cache lines.
This cache requires additional circuitry to search the complete set for a particular line. Additionally, the size of the expensive cache memory has increased by a factor of n. In practice, a 2-way or 4-way cache is usually the best trade-off between improved efficiency and cost.
With additional lines in a line set, any memory reference has to check each copy of the line for a hit. This takes a little longer, but the chance of a hit increases with n.
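A minimal C sketch of the lookup in such a cache, following the model above in which each way is a complete copy of the set of cache lines; the 2-way geometry and the names used are illustrative assumptions.

    #include <stdbool.h>
    #include <stdint.h>

    #define WAYS        2                       /* n in an n-way cache  */
    #define WAY_SIZE    16384                   /* bytes per way        */
    #define LINE_LENGTH 64
    #define NUM_SETS    (WAY_SIZE / LINE_LENGTH)

    struct cache_line {
        bool     valid;
        uint32_t tag;
        uint8_t  data[LINE_LENGTH];
    };

    static struct cache_line cache[NUM_SETS][WAYS];

    /* Every way of the set must be checked before declaring a miss. */
    bool cache_lookup(uint32_t address)
    {
        uint32_t tag = address / WAY_SIZE;
        uint32_t set = (address % WAY_SIZE) / LINE_LENGTH;

        for (int way = 0; way < WAYS; way++)
            if (cache[set][way].valid && cache[set][way].tag == tag)
                return true;                    /* hit in one of the n lines        */
        return false;                           /* miss: one line must be replaced  */
    }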
When a miss occurs and every line in the set holds valid data, one of the existing lines must be chosen to be discarded. There are three ways of handling this choice.
2. LFU - Least Frequently Used. Discard the line that has been referenced the fewest times since being loaded. This is based on the idea that this line probably won't be used again for a while and there is a good chance that the line being loaded will be referenced often. This requires counter hardware.
3. LRU - Least Recently Used. Discard the line that has not been accessed in the longest time. This is again based on the idea of progression and that the system is probably done with that portion of code or data. This also requires additional circuitry for bookkeeping, as sketched below.
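As a sketch of the bookkeeping involved (the counter-based scheme and the names below are illustrative assumptions, not how any particular processor implements it), LRU can be approximated by recording when each way of a set was last touched and discarding the oldest; LFU would instead keep a reference count per way and discard the smallest.

    #define WAYS 4

    struct way_state {
        unsigned last_used;          /* timestamp of the most recent access */
    };

    /* Choose the way whose line has gone the longest without being used. */
    int choose_victim_lru(const struct way_state set[WAYS])
    {
        int victim = 0;
        for (int way = 1; way < WAYS; way++)
            if (set[way].last_used < set[victim].last_used)
                victim = way;
        return victim;
    }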
The first solution, writing back immediately (often called write-through), is easier to implement and safer, especially in systems that have some sort of parallel processing or data handling. As soon as data is changed in the cache, it is written out to main memory. However, the write to memory takes time and partially defeats the purpose of caching.
The second solution, delaying the write (often called write-back), makes the best use of the cache's speed. A cache line is only written out when a miss forces it to be replaced. However, this is harder to design, especially if parallel data handling is implemented in the system's architecture.
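A minimal C sketch of the difference between the two policies; the structure, field names, and the helper write_line_to_memory() are assumptions made only for the example.

    #include <stdbool.h>
    #include <stdint.h>

    #define LINE_LENGTH 64

    struct cache_line {
        bool     valid;
        bool     dirty;                          /* modified since loaded */
        uint32_t tag;
        uint8_t  data[LINE_LENGTH];
    };

    void write_line_to_memory(const struct cache_line *line);   /* assumed helper */

    /* Write-through: main memory is updated on every store, so it is never
     * stale, but every write pays the cost of a main memory access. */
    void store_write_through(struct cache_line *line, unsigned offset, uint8_t value)
    {
        line->data[offset] = value;
        write_line_to_memory(line);
    }

    /* Write-back (write delay): only the cache is updated and the line is
     * marked dirty; it is written to main memory once, when a later miss
     * forces it to be replaced. */
    void store_write_back(struct cache_line *line, unsigned offset, uint8_t value)
    {
        line->data[offset] = value;
        line->dirty = true;
    }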