Caching is a technique by which a small amount of expensive, fast memory is used to hold the information (program code and data) currently most important to the CPU. This information is copied from main (primary) memory, accessed and manipulated, and finally written back out when it is no longer immediately needed. Other information is then copied from main memory and the process is repeated.
Cache memory is often located on the CPU chip (level 1) and/or on adjacent memory chips (level 2) with high-speed bus connections to the CPU.
There are several ways to organize cache memory that balance the issues of effectiveness and cost.
In the simplest organization, called a direct mapped cache, a particular line in each block of main memory is always associated with the same line in the cache. In other words, if a cache is 16K in size, then the lines of main memory beginning at addresses 0, 16K + 0, 32K + 0, etc. will all be associated with line 0 of the cache. When a particular main memory location is accessed, its address is used to determine which line of main memory it lies in and the corresponding line in the cache where that location would be cached. If the cache line holds a copy of the line from the correct block of main memory, called a hit, the cache version of the memory is used.
If the cache line contains a line from a different block of main memory, or no valid contents at all, called a miss, then the line of main memory containing the location of interest must be read into the cache. If the cache line did hold valid memory from another block, that line may have to be written out first.
An important point to remember is that while all memory locations (bytes) on a single cache line are contiguous and come from a single block of main memory, each line in the cache is independent of all the others, and any two adjacent cache lines may hold lines from different main memory blocks.
Tag = int( memory address / cache size )
Cache line = int( ( memory address mod cache size ) / line length )
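As a worked example (the 16K cache size and 64-byte line length are illustrative values, not a requirement of the scheme), the address 41000 maps as follows:

    Tag        = int( 41000 / 16384 )            = 2
    Cache line = int( ( 41000 mod 16384 ) / 64 ) = int( 8232 / 64 ) = 128

If line 128 of the cache is valid and its tag field holds 2, the access is a hit; otherwise it is a miss.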
When a memory address is accessed, the calculations above are performed. The appropriate line in the cache is then checked for valid data and, if valid, the tag (block ID) stored in the tag field is compared to the calculated tag. If the cache line contains valid data and the tags match, a hit has occurred. If the cache line is invalid or contains a line from a different block, a miss has occurred and the system must take additional steps to put the correct information in the cache.
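To make the check concrete, here is a minimal sketch in C of a direct mapped cache lookup. The 16K cache size, 64-byte line length, and the structure and field names are illustrative assumptions, not a description of any particular processor.

    #include <stdbool.h>
    #include <stdint.h>

    #define CACHE_SIZE  16384                     /* bytes in the cache          */
    #define LINE_LENGTH 64                        /* bytes per cache line        */
    #define NUM_LINES   (CACHE_SIZE / LINE_LENGTH)

    struct cache_line {
        bool     valid;                           /* line holds real data        */
        uint32_t tag;                             /* block of main memory it came from */
        uint8_t  data[LINE_LENGTH];               /* the cached bytes            */
    };

    static struct cache_line cache[NUM_LINES];

    /* Return true on a hit, false on a miss. */
    bool cache_lookup(uint32_t address)
    {
        uint32_t tag  = address / CACHE_SIZE;                 /* which block */
        uint32_t line = (address % CACHE_SIZE) / LINE_LENGTH; /* which line  */

        return cache[line].valid && cache[line].tag == tag;
    }

On a miss, the line would be filled from main memory (after writing the old contents out first, if necessary) and its valid and tag fields updated.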
Spatial locality occurs when a sequence of commands or information cells, such as an array, is accessed in order. The physical closeness of the data that constitutes the commands or information makes the use of cache lines effective. Once a line of memory has been loaded to access a particular command or information cell, the next command or information cell accessed is likely to be next to the first one and already in the cache.
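A small hypothetical C fragment illustrates this; the function and parameter names are assumptions made only for the example.

    /* Spatial locality: the array elements are contiguous, so loading the
     * cache line that holds data[0] also brings in data[1], data[2], ...,
     * and most of the following accesses hit in the cache. */
    int sum_array(const int *data, int n)
    {
        int sum = 0;
        for (int i = 0; i < n; i++)
            sum += data[i];              /* sequential accesses, mostly hits */
        return sum;
    }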
Temporal locality occurs when a sequence of commands or information cells is accessed repeatedly over a period of time. With a command sequence, this may occur as a program loop. With information cells, this occurs when several actions are applied to the same block of data.
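Again as a hypothetical C fragment (the names are assumptions for the example), repeated passes over the same block of data exhibit temporal locality, and the loop's own code is re-fetched from the cache on every pass.

    /* Temporal locality: the same small block of data is re-read on every
     * pass, and the loop body's code is re-executed each time, so both
     * tend to stay in the cache between uses. */
    int max_over_passes(const int *block, int n, int passes)
    {
        int max = block[0];
        for (int p = 0; p < passes; p++)      /* repeated passes              */
            for (int i = 0; i < n; i++)       /* same block of data each time */
                if (block[i] > max)
                    max = block[i];
        return max;
    }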
In most computer architectures, once loaded, code is not modified. In a simple cache, when a miss occurs, the contents of the cache line will be written out even if they hold code and are therefore unmodified.
In most modern computer architectures, code and data are kept in separate places in memory. It is possible for code and data to occupy the same line position in different blocks of memory. This can cause constant misses if a loop's code and the data it accesses compete for the same cache line.
Many architectures support multitasking. It is common for the code of each loaded task to be aligned at the beginning of a memory segment, which often corresponds to the block size of the cache. As each task is activated, a miss occurs; the more tasks loaded, the more often misses occur.
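A sketch of the kind of conflict described in the last two paragraphs, using the same illustrative 16K cache and 64-byte lines as before: two addresses that differ by exactly the cache size map to the same cache line, so alternating between them produces a miss on every access.

    #define CACHE_SIZE  16384
    #define LINE_LENGTH 64

    /* The cache line an address maps to, from the formula above. */
    unsigned line_of(unsigned address)
    {
        return (address % CACHE_SIZE) / LINE_LENGTH;
    }

    /* line_of(0x0040) == line_of(0x4040) == 1: if a loop's code sat at
     * 0x0040 while its data sat at 0x4040, each would evict the other,
     * causing a miss on every reference. */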
Caches can be designed to avoid or limit the above problems, at a cost.
To solve the first two problems, the cache is often split into separate caches for data and code. This allows the same line number from the code segment and the data segment to be cached at the same time. It also allows the system to skip the write-back on a miss for the code portion of the cache.
In splitting the cache, the designer must decide whether to double the cache size or halve the number of code and data lines that can be cached at any time.
To solve the third problem, we take the idea of splitting the cache one step further and create additional complete sets of cache lines. Now there is a set of n cache lines available for each line position in memory. A cache of this type is called an n-way set associative cache, where n is the number of duplicate cache lines.
This cache requires additional circuitry to search the complete set for a particular line. Additionally, the size of the expensive cache memory has increased by a factor of n. In practice, a 2-way or 4-way cache is usually the best trade-off between improved efficiency and cost.
With additional lines in a line set, any memory reference has to check each copy of the line for a hit. This takes a little longer, but the chance of a hit increases with n.
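A minimal C sketch of the lookup in such a cache, following the model above in which each way is a complete copy of the set of cache lines; the 2-way geometry and the names used are illustrative assumptions.

    #include <stdbool.h>
    #include <stdint.h>

    #define WAYS        2                       /* n in an n-way cache  */
    #define WAY_SIZE    16384                   /* bytes per way        */
    #define LINE_LENGTH 64
    #define NUM_SETS    (WAY_SIZE / LINE_LENGTH)

    struct cache_line {
        bool     valid;
        uint32_t tag;
        uint8_t  data[LINE_LENGTH];
    };

    static struct cache_line cache[NUM_SETS][WAYS];

    /* Every way of the set must be checked before declaring a miss. */
    bool cache_lookup(uint32_t address)
    {
        uint32_t tag = address / WAY_SIZE;
        uint32_t set = (address % WAY_SIZE) / LINE_LENGTH;

        for (int way = 0; way < WAYS; way++)
            if (cache[set][way].valid && cache[set][way].tag == tag)
                return true;                    /* hit in one of the n lines        */
        return false;                           /* miss: one line must be replaced  */
    }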
When a miss occurs and every line in the set holds valid data, one of the existing lines must be chosen to be discarded. There are three ways of handling this choice.
2. LFU - Least Frequently Used. Discard the line that has been referenced the fewest times since being loaded. This is based on the idea that this line probably won't be used again for a while and there is a good chance that the line being loaded will be referenced often. This requires counter hardware.
3. LRU - Least Recently Used. Discard the line that has not been accessed in the longest time. This is again based on the idea of progression and that the system is probably done with that portion of code or data. This also requires additional circuitry for bookkeeping, as sketched below.
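As a sketch of the bookkeeping involved (the counter-based scheme and the names below are illustrative assumptions, not how any particular processor implements it), LRU can be approximated by recording when each way of a set was last touched and discarding the oldest; LFU would instead keep a reference count per way and discard the smallest.

    #define WAYS 4

    struct way_state {
        unsigned last_used;          /* timestamp of the most recent access */
    };

    /* Choose the way whose line has gone the longest without being used. */
    int choose_victim_lru(const struct way_state set[WAYS])
    {
        int victim = 0;
        for (int way = 1; way < WAYS; way++)
            if (set[way].last_used < set[victim].last_used)
                victim = way;
        return victim;
    }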
The first solution, writing back immediately (often called write-through), is easier to implement and safer, especially in systems that have some sort of parallel processing or data handling. As soon as data is changed in the cache, it is written out to main memory. However, the write to memory takes time and partially defeats the purpose of caching.
The second solution, delaying the write (often called write-back), makes the best use of the cache's speed. A cache line is only written out when a miss forces it to be replaced. However, this is harder to design, especially if parallel data handling is implemented in the system's architecture.
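A minimal C sketch of the difference between the two policies; the structure, field names, and the helper write_line_to_memory() are assumptions made only for the example.

    #include <stdbool.h>
    #include <stdint.h>

    #define LINE_LENGTH 64

    struct cache_line {
        bool     valid;
        bool     dirty;                          /* modified since loaded */
        uint32_t tag;
        uint8_t  data[LINE_LENGTH];
    };

    void write_line_to_memory(const struct cache_line *line);   /* assumed helper */

    /* Write-through: main memory is updated on every store, so it is never
     * stale, but every write pays the cost of a main memory access. */
    void store_write_through(struct cache_line *line, unsigned offset, uint8_t value)
    {
        line->data[offset] = value;
        write_line_to_memory(line);
    }

    /* Write-back (write delay): only the cache is updated and the line is
     * marked dirty; it is written to main memory once, when a later miss
     * forces it to be replaced. */
    void store_write_back(struct cache_line *line, unsigned offset, uint8_t value)
    {
        line->data[offset] = value;
        line->dirty = true;
    }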