Data Types

One determining factor in the design of an instruction set architecture is the set of data types it will handle. All CPUs are required to recognize a few of the more important data types, such as address, boolean, and unsigned integer data types. Many CPUs recognize additional data types and include instructions specifically designed to manipulate them.

Addresses identify the memory locations available to the system. On some systems, a complete address does not fit in the system's basic data unit and must be built from a combination of registers or from several fetches from memory. Some systems also allow a signed relative offset to be added to the current instruction pointer to determine the address.
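Both ways of forming an address can be sketched in a few lines. This is only an illustration, assuming a 16 bit address space, 8 bit registers, and an 8 bit two's-complement offset; the function names are hypothetical and do not correspond to any particular CPU.

```python
ADDRESS_BITS = 16  # assumed address width for this sketch
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1


def address_from_register_pair(high: int, low: int) -> int:
    """Combine two 8 bit registers into one 16 bit address."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


def relative_target(instruction_pointer: int, offset_byte: int) -> int:
    """Add a signed 8 bit offset to the instruction pointer."""
    # Reinterpret the raw byte as a two's-complement value in -128..127.
    signed_offset = offset_byte - 0x100 if offset_byte & 0x80 else offset_byte
    # Wrap the result so it stays inside the address space.
    return (instruction_pointer + signed_offset) & ADDRESS_MASK


full_address = address_from_register_pair(0x12, 0x34)
forward = relative_target(0x1000, 0x10)   # offset of +16
backward = relative_target(0x1000, 0xF0)  # offset of -16
```

The relative form needs only one offset byte in the instruction, at the cost of reaching only addresses near the current instruction.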

Boolean values represent flags or status states. They are usually examined or manipulated at the bit level. Since the normal unit size of a computer system is the byte or an integer multiple of the byte (word or double word), boolean manipulation is performed either by instructions dedicated to specific bits, such as setting or checking the arithmetic carry bit, or by using a mask as one of the instruction's operands.

Numbers represent numeric values to be mathematically manipulated. The basic operations are add, subtract, binary multiply, and divide. Some systems also include instructions that support multi-byte multiply and divide, floating point math, and trigonometric or logarithmic functions. Other systems require designers to build software libraries on top of the simpler instructions to accomplish this.
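As a rough illustration of building a larger operation from a simpler one, the sketch below adds multi-byte integers using only an 8 bit add with carry. It assumes the operands are equal-length lists of bytes with the least significant byte first; the function names are hypothetical.

```python
def add_8bit(a: int, b: int, carry_in: int = 0) -> tuple[int, int]:
    """Model an 8 bit add-with-carry: return (sum byte, carry out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def add_multibyte(x: list[int], y: list[int]) -> tuple[list[int], int]:
    """Add two equal-length byte lists, least significant byte first."""
    if len(x) != len(y):
        raise ValueError("operands must have the same number of bytes")
    result = []
    carry = 0
    for a, b in zip(x, y):
        # The carry out of each byte feeds the next, more significant byte.
        sum_byte, carry = add_8bit(a, b, carry)
        result.append(sum_byte)
    return result, carry


# 0x12FF + 0x0001, written low byte first.
total_bytes, carry_out = add_multibyte([0xFF, 0x12], [0x01, 0x00])
```

A final carry of 1 means the result did not fit in the given number of bytes.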

Numeric representations can be grouped into several categories.

Integers

Floating point numbers (real) are signed approximate representations. The range and precision are determined by the number of bytes dedicated to the representation. Hardware-supported floating point instructions are much more difficult to scale than integer instructions. If a float is needed with a range or precision different from that defined for the instruction, the arithmetic is usually done entirely in software.
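The effect of the byte count on precision can be seen by storing the same value in 4 bytes and in 8 bytes. This sketch uses Python's standard struct module and assumes the IEEE 754 single and double formats, which that module uses for its "f" and "d" codes; the helper name is hypothetical.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Store a value in the given binary format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


# "<f" is a 4 byte float, "<d" is an 8 byte float.
single = round_trip(0.1, "<f")
double = round_trip(0.1, "<d")

single_size = struct.calcsize("<f")
double_size = struct.calcsize("<d")
single_error = abs(single - 0.1)
```

Python's own float is already an 8 byte value, so the 8 byte round trip returns it unchanged, while the 4 byte round trip returns a slightly different approximation.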

Binary Coded Decimal (BCD) stores each decimal digit, 0-9, in its own nibble (4 bits) or byte, rather than using all 16 values of the nibble or all 256 values of the byte. The remaining bit patterns are unused. Although wasteful of storage space, BCD was used early in the development of computers because it allowed the algorithms people used to solve financial (and other math) problems to be translated into computer programs with a minimum of effort.
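A small sketch of the packed form, with two decimal digits per byte, may make the storage trade-off concrete. The function names are hypothetical, and only non-negative integers are handled.

```python
def to_packed_bcd(number: int) -> bytes:
    """Encode a non-negative integer with one decimal digit per nibble."""
    if number < 0:
        raise ValueError("this sketch handles non-negative integers only")
    digits = str(number)
    if len(digits) % 2:
        digits = "0" + digits  # pad to a whole number of bytes
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )


def from_packed_bcd(data: bytes) -> int:
    """Decode packed BCD, rejecting nibbles outside 0-9."""
    value = 0
    for byte in data:
        high, low = byte >> 4, byte & 0x0F
        if high > 9 or low > 9:
            raise ValueError("nibble is not a decimal digit")
        value = value * 100 + high * 10 + low
    return value


encoded = to_packed_bcd(1234)
decoded = from_packed_bcd(encoded)
```

In hexadecimal the encoded bytes read the same as the decimal digits, which is why BCD values are easy to display. Two bytes hold 0-9999 in packed BCD, compared with 0-65535 in plain binary.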

Characters are data that represent human readable text. In most cases, the instruction set architecture does not attempt to interpret character data. However, because characters tend to be grouped into long sequences whose arrangement is meaningless to the computer, any manipulation of the data must preserve the original order even if doing so is resource intensive for the CPU. Some CPUs provide instructions that implicitly include loop and indexing functions to allow easy moving, searching, or comparing of character sequences of user-defined length.

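Current popular character data types are ASCII, EBCDIC, and Unicode. The first two fit each character in a single byte and were designed primarily for English text. Unicode was originally designed as a 2 byte code and defines character sets for many languages; it has since grown beyond 16 bits and is commonly stored using encodings such as UTF-8 and UTF-16.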

ASCII

EBCDIC is the character set defined by IBM for its mainframe computers. It also includes upper and lower case alphabetic, numeric, and control characters.

In its 2 byte form, Unicode groups characters into blocks by script, so the high byte roughly identifies the script or character set being used and the low byte selects a character within it. Instructions designed not to disturb the order of ASCII or EBCDIC data sequences will also work with Unicode. However, an instruction designed to perform some special manipulation of one particular character set will improperly handle data in one of the other character sets.
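The split into a high byte and a low byte can be seen by encoding single characters as 2 byte values. This sketch uses Python's built-in "utf-16-be" codec (high byte first) and a hypothetical helper name; it only covers characters that fit in 16 bits.

```python
def high_low_bytes(character: str) -> tuple[int, int]:
    """Return the (high byte, low byte) of a 16 bit character code."""
    encoded = character.encode("utf-16-be")
    if len(encoded) != 2:
        raise ValueError("character does not fit in a single 16 bit unit")
    return encoded[0], encoded[1]


latin_a = high_low_bytes("A")       # U+0041, Latin
greek_omega = high_low_bytes("Ω")   # U+03A9, Greek
```

For "A" the low byte is the same value ASCII uses and the high byte is zero, while the Greek letter has a different high byte.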

Data ordering and alignment

An important topic related to data types is the issue of data ordering and alignment.

Data ordering issues result from the difference between the size of a specific data unit and the size of the system's data bus or registers. If an integer is defined as a 16 bit value but the data bus or ALU register is 8 bits wide, then the integer must be fetched in two steps. The question then becomes: is the first byte fetched the high byte of the value or the low byte? Put another way, when the data is stored starting at address X, is the low byte at X and the high byte at X+1, or the opposite?

On a system that requires multiple fetches to read a unit of data, placing the low byte at the lower address is often more convenient, for example because multi-byte arithmetic starts with the low byte. This arrangement is called little-endian (the little end is at the lower address).

On systems that read the whole data unit at once or have a wide data bus (for example 32 bits), placing the high byte at the low, or starting, address and the subsequent bytes sequentially up the address range offers an arrangement that is simple and quick to access. This arrangement is called big-endian because the high (big) end of the value is at the starting address.
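The two arrangements can be compared by storing the same 16 bit value into a small block of simulated memory. This is a sketch using Python's built-in int.to_bytes and int.from_bytes; the store16 and load16 helpers are hypothetical names.

```python
def store16(memory: bytearray, address: int, value: int, byteorder: str) -> None:
    """Write a 16 bit value at address X and X+1 in the given byte order."""
    memory[address:address + 2] = value.to_bytes(2, byteorder)


def load16(memory: bytearray, address: int, byteorder: str) -> int:
    """Read a 16 bit value from address X and X+1 in the given byte order."""
    return int.from_bytes(memory[address:address + 2], byteorder)


little_memory = bytearray(4)
big_memory = bytearray(4)
store16(little_memory, 0, 0x1234, "little")  # low byte 0x34 at address 0
store16(big_memory, 0, 0x1234, "big")        # high byte 0x12 at address 0

# Reading data with the wrong byte order silently swaps the bytes.
misread = load16(little_memory, 0, "big")
```

Both layouts hold the same value; they differ only in which byte sits at the starting address, so code that exchanges data between the two conventions has to agree on one order.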

This choice often becomes a problem on systems that are improved over time. A system may have started with a narrow bus and used the little-endian convention, and later had its bus widened to improve performance. However, if a large number of applications have been written for the original system, the manufacturer may be obligated to maintain the original convention even though switching to big-endian might provide additional performance.