Fall 2018


Bring scratch paper. The exam is a series of multiple-choice and true/false questions covering:

- The various character sets covered in lecture and the linked reading.
- Conversion of a signed integer to binary, and conversion of a binary value back to a signed integer. The size of storage will be given but will not be 8 bits, so you need to know the algorithms.
- Conversion of a real float to a stored binary float, modeled after the IEEE standard, but the actual exponent and significand sizes vary from the standard. Know the conversion algorithms. Note that the fractional portion may be multiple digits long; the conversion method is simple.
- Conversion of a stored binary float (modeled after the IEEE standard) back to an actual float value, given a specific significand and exponent. For this I will limit the fractional portion of the answer to at most 3 digits to the right of the point. The significand and exponent sizes for the conversion are 7, 8, or 9 bits long, but mixed.
- Calculating the maximum, minimum, and denormalized minimum that a float of a given exponent and significand size can represent.

For all of these, you will be given the value, the size of storage, and a list of alternative solutions to pick from. Work out your answers on scratch paper and select the correct answer. Know how zero, overflow (infinity), normal, denormalized, and NaN are flagged in float storage.
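The divide-by-2 and 2's complement algorithms can be sketched in Python (the helper names `to_binary` and `twos_complement` are my own, not from lecture; this assumes the usual invert-and-add-1 method at an arbitrary storage width):

```python
def to_binary(value, width):
    """Repeated divide-by-2; the remainders, read in reverse, are the bits."""
    bits = []
    while value > 0:
        bits.append(value % 2)          # remainder = next bit (LSB first)
        value //= 2
    bits += [0] * (width - len(bits))   # left-pad with zeros to full storage width
    return ''.join(str(b) for b in reversed(bits))

def twos_complement(bitstring):
    """Invert every bit, then add 1 (discarding any carry out of the width)."""
    inverted = ''.join('1' if b == '0' else '0' for b in bitstring)
    total = (int(inverted, 2) + 1) % (1 << len(bitstring))
    return format(total, '0%db' % len(bitstring))

b = to_binary(673, 12)
print(b, '=', format(int(b, 2), 'x'), 'h')    # 001010100001 = 2a1 h
tc = twos_complement(b)
print(tc, '=', format(int(tc, 2), 'x'), 'h')  # 110101011111 = d5f h
print(2**12 - int(tc, 2))                     # check: 4096 - 3423 = 673
```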
Integers

Given a 12-bit integer storage size, convert the value 673 to its binary, hex, and octal values. Use divide by 2; the remainders, read bottom to top, are the bits:

    673 / 2 = 336  r 1
    336 / 2 = 168  r 0
    168 / 2 =  84  r 0
     84 / 2 =  42  r 0
     42 / 2 =  21  r 0
     21 / 2 =  10  r 1
     10 / 2 =   5  r 0
      5 / 2 =   2  r 1
      2 / 2 =   1  r 0
      1 / 2 =   0  r 1

    = 10 1010 0001 b   (regrouped in threes: 1 010 100 001 b)
    = 2 A 1 h
    = 1 2 4 1 o

Give its 2's complement value. (This example is 12-bit storage; make sure you left-pad with zeros to the full width of storage.)

    0010 1010 0001 b   original
    1101 0101 1110 b   all bits inverted
              +  1
    --------------
    1101 0101 1111 b = D5F h

Convert both the original and the 2's complement binary values back to an unsigned integer (double the running total, then add the next bit, working from the MSB):

    0010 1010 0001 b -> 1, 2, 5, 10, 21, 42, 84, 168, 336, 673
    1101 0101 1111 b -> 1, 3, 6, 13, 26, 53, 106, 213, 427, 855, 1711, 3423

Check by subtraction: 2^12 = 4096, and 4096 - 673 = 3423.

Floats

Give the general formula for calculating the bias value for a floating point format:

    bias = 2^(n-1) - 1, where n is the number of exponent bits.

Given floating point storage with a 12-bit mantissa and a 10-bit exponent, and the actual real value 912.730, convert the real number to its floating point storage.

Convert the integer portion (divide by 2) and the fractional portion (multiply by 2; truncate, round toward zero):

    912 / 2 = 456  r 0        .730 * 2 = 1.460 -> 1
    456 / 2 = 228  r 0        .460 * 2 = 0.920 -> 0
    228 / 2 = 114  r 0        .920 * 2 = 1.840 -> 1
    114 / 2 =  57  r 0        .840 * 2 = 1.680 -> 1
     57 / 2 =  28  r 1        .680 * 2 = 1.360 -> 1
     28 / 2 =  14  r 0        .360 * 2 = 0.720 -> 0
     14 / 2 =   7  r 0        .720 * 2 = 1.440 -> 1
      7 / 2 =   3  r 1        .440 * 2 = 0.880 -> 0
      3 / 2 =   1  r 1        .880 * 2 = 1.760 -> 1
      1 / 2 =   0  r 1        .760 * 2 = 1.520 -> 1

    I -> 11 1001 0000 b   (the MSB is virtual, so we store only the lower 9 bits: 1 1001 0000)
    F -> .1011101011 b    (nine stored bits are used for the integer, so only 3 bits are actually needed from the fractional side)

    1110010000.101 b   (no rounding; MSB not hidden yet)

Normalize:

    1.110010000101 b * 2^9   (remember, we don't store the bit to the left of the point)

Calculate the bias: 2^(10-1) - 1 = 511. Biased exponent: 511 + 9 = 520. Convert 520 to binary:

    520 / 2 = 260  r 0
    260 / 2 = 130  r 0
    130 / 2 =  65  r 0
     65 / 2 =  32  r 1
     32 / 2 =  16  r 0
     16 / 2 =   8  r 0
      8 / 2 =   4  r 0
      4 / 2 =   2  r 0
      2 / 2 =   1  r 0
      1 / 2 =   0  r 1

    520 = 10 0000 1000 b

Stored value:

    0 1000001000 110010000101   (significand truncated, not rounded)
    ^ ^--------^ ^----------^
    sign exponent significand

Convert the actual stored value back to a real number. Exponent: 520 - 511 = 9.
Stored mantissa 110010000101, with the virtual 1 restored: 1.110010000101 b. Shifting the point 9 places (un-normalizing) gives back 1110010000.101 b.

Integer portion (double and add, from the MSB):

    1110010000 b -> 1, 3, 7, 14, 28, 57, 114, 228, 456, 912

Fractional portion:

    .101 b:  1 * .5   = .5
             0 * .25  = .0
             1 * .125 = .125
             .5 + .125 = .625

Original: 912.730. Stored: 912.625.

MinMax

7-bit exponent, 8-bit mantissa.

Max:

    0 111 1110 1111 1111
    Bias: 2^(7-1) - 1 = 63
    Biased exponent: 111 1110 b = 126 (all 1s is reserved as a flag)
    Unbiased exponent: 126 - 63 = 63
    1.11111111 b * 2^63 = (10 b - 0.00000001 b) * 2^63
                        = (2^1 - 2^-8) * 2^63
                        = 2^64 - 2^55

Denormalized min:

    0 000 0000 0000 0001

Remember: although the exponent is set to zero, it is functioning as a flag and should be treated as if it holds a 1. And the virtual 1 found at the beginning of all normalized floats does not exist in this situation.

    Unbiased exponent: 1 - 63 (bias) = -62
    0.00000001 b * 2^-62 = 2^-8 * 2^-62 = 2^-70

Generic format of single-precision floats (b = either 0 or 1):

    sign  8-bit exponent  23-bit significand
    b     bbbbbbbb        bbbbbbbbbbbbbbbbbbbbbbb

    normalized:  exponent is any value other than all 0s or all 1s
    b 00000000   significand all zeros        zero
    b 00000000   significand not all zeros    denormalized
    b 11111111   significand all zeros        infinity
    b 11111111   significand not all zeros    NaN

*******************

Review the general features of the various character sets enough to answer true/false questions (Basic Storage and Characters lectures).

ASCII (Wikipedia topic: ASCII)
What does ASCII stand for?
What is the size (bit width) of an ASCII character?
How many characters can it represent?
How many 'printable' characters are defined?
How many control characters are defined?

EBCDIC (Wikipedia topic: EBCDIC)
What does EBCDIC stand for?
What is the size (bit width) of an EBCDIC character?
How many characters can it represent? 256 possible, but not all values are assigned an actual character.

Look at the basic features of UTF-8: number of bytes for a code point; where it is most commonly used.
Look at the basic features of UTF-16: number of bytes for a code point; where it is most commonly used.
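The MinMax arithmetic generalizes to any exponent/significand width; here is a small Python sketch (the function name `float_limits` is my own; it assumes the conventions used in these notes: bias = 2^(n-1) - 1, a hidden leading 1 for normalized values, and the all-ones exponent reserved as a flag):

```python
from fractions import Fraction   # exact arithmetic, no rounding surprises

def float_limits(exp_bits, sig_bits):
    """Max normal, min normal, and denormalized min for a given format."""
    bias = 2**(exp_bits - 1) - 1
    max_exp = (2**exp_bits - 2) - bias   # all-ones exponent is reserved
    min_exp = 1 - bias                   # all-zeros exponent acts as a flag
    two = Fraction(2)
    max_normal = (two - two**-sig_bits) * two**max_exp   # 1.111...1 b * 2^max
    min_normal = two**min_exp                            # 1.000...0 b * 2^min
    denorm_min = two**-sig_bits * two**min_exp           # 0.000...1 b * 2^min
    return max_normal, min_normal, denorm_min

mx, mn, dn = float_limits(7, 8)    # the 7-bit exponent / 8-bit mantissa example
print(mx == 2**64 - 2**55)         # True
print(dn == Fraction(1, 2**70))    # True
```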
Unicode defines 17 planes (major groupings), although many are not in use.

Plane 0 - Basic Multilingual Plane (most modern languages)
Plane 1 - Supplementary Multilingual Plane: math symbols, music symbols, hieroglyphs, board game symbols (dominoes, Mah Jongg, etc.)
Plane 2 - Supplementary Ideographic Plane (CJK): Chinese, Japanese, Korean
Planes 3 to 13 - unassigned
Plane 14 - Supplementary Special-purpose Plane: XML formatting characters; Variation Selectors, used with CJK to identify alternative ideograms
Planes 15, 16 - private use; users can create custom definitions, e.g. Klingon

Code planes are divided into code blocks of 256 code points. A code block usually contains the characters of a particular language, but some languages may take more than one block. A code point is one entry in a code block.

The Unicode protocol defines 3 aspects of a character set. (Wikipedia topic: Unicode)

A grapheme is a minimal meaningful element of a language. It may require more than one code point to be represented. (Wikipedia topic: grapheme)

A glyph is the visual representation of a character in a language. It may consist of more than one grapheme, or multiple glyphs may be combined to represent one code point or grapheme. (Wikipedia topic: glyph)

See the following for a nice description:
https://stackoverflow.com/questions/27331819/whats-the-difference-between-a-character-a-code-point-a-glyph-and-a-grapheme

Rules - rules determining collating order, left-to-right or right-to-left display, and which glyph to use if certain graphemes are next to each other.

Universal Coded Character Set (UCS)

The UCS protocol only assigns a character to a specific code point. Visual representation (glyphs) and rules are NOT covered by the UCS protocol.

UCS-2: code points defined as 2-byte fixed-length storage. Designed to hold all current (living) written languages. Now obsolete, subsumed into UTF-16.

UCS-4 (ISO 10646): code points defined as 4-byte fixed-length storage.
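Each plane spans 0x10000 code points, so in Python the plane of a character is just its code point shifted right by 16 bits (`plane_of` is my own illustrative helper, not part of the notes):

```python
def plane_of(ch):
    """Unicode plane number: the code point divided by 0x10000."""
    return ord(ch) >> 16

print(plane_of('A'))            # 0 - Basic Multilingual Plane
print(plane_of('\U0001D11E'))   # 1 - musical symbol G clef, Supplementary Multilingual
print(plane_of('\U00020000'))   # 2 - CJK ideograph, Supplementary Ideographic
```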
Incorporates the BMP (Plane 0), the CJK (Plane 2), and other code pages. Technically 31-bit characters; code point values 0x7FFFFFFF - 0xFFFFFFFF are no longer valid. Currently incorporated in UTF-32.

Unicode Transformation Format (UTF) - protocol defined by the Unicode Consortium. Covers unique character-to-code-point assignment, visual representation (glyph), and rules for use.

UTF-8
Code points can be 1 to 4 bytes depending on the code page. Used extensively for Web pages. The first block of the first plane (1-byte code points) maps to ASCII with minimal conversion. Has character sets representing current languages, historical languages, and non-language symbols such as math, music, etc. Currently the preferred protocol both for data transmission over the Internet and for use by most modern OSes, such as Windows 7 and 10, and Linux.

UTF-16
Code points are either 2 bytes or 4 bytes in size depending on the code page. Uses a BOM (byte order mark) to identify the most significant byte. Preferred by earlier OSes, such as Windows NT and 2000, because of compatibility with the word size of the system(?). Has character sets representing current languages, historical languages, and non-language symbols such as math, music, etc.

UTF-32 (UCS-4)
Code points are a fixed 4 bytes in size. Usually implemented within a program's storage procedures to provide for predictable calculation of storage arrays.
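The encoded sizes described above can be checked with Python's standard codecs (the sample characters are my own picks):

```python
# character -> (UTF-8 byte count, UTF-16 byte count)
samples = {
    'A': (1, 2),     # U+0041, ASCII range: 1 byte in UTF-8
    'é': (2, 2),     # U+00E9: 2 bytes in UTF-8
    '€': (3, 2),     # U+20AC: 3 bytes in UTF-8
    '😀': (4, 4),    # U+1F600, Plane 1: 4 bytes; a surrogate pair in UTF-16
}
for ch, (u8, u16) in samples.items():
    assert len(ch.encode('utf-8')) == u8       # 1 to 4 bytes per code point
    assert len(ch.encode('utf-16-be')) == u16  # 2 or 4 bytes per code point
    assert len(ch.encode('utf-32-be')) == 4    # fixed 4 bytes per code point

# The plain 'utf-16' codec prepends a BOM (byte order mark) so a reader
# can identify which byte comes first.
bom = 'A'.encode('utf-16')[:2]
print(bom in (b'\xff\xfe', b'\xfe\xff'))   # True
```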