Fall 2018


Bring scratch paper. The exam is a series of multiple-choice and true/false questions covering:

- The various character sets covered in lecture and the linked reading.
- Conversion of a signed integer to binary, and conversion of a binary value back to a signed integer. The size of storage will be given but will not be 8 bits, so you need to know the algorithms.
- Conversion of a real float to a stored binary float, modeled after the IEEE standard, but the actual exponent and significand sizes vary from the standard. Know the conversion algorithms. Note that the fractional portion may be multiple digits long; the conversion method is simple.
- Conversion of a stored binary float (modeled after the IEEE standard) back to an actual float value, given a specific significand and exponent. For this I will limit the fractional portion of the answer to at most 3 digits to the right of the point. The significand and exponent sizes for the conversion are 7, 8, or 9 bits long, but mixed.
- Calculating the maximum, minimum, and denormalized minimum that a float of a given exponent and significand size can represent.

For all of these, you will be given the value, the size of storage, and a list of alternative solutions to pick from. Work out your answers on scratch paper and select the correct answer. Know how zero, overflow (infinity), normal, denormalized, and NaN are flagged in float storage.
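The divide-by-2 and 2's complement algorithms can be sketched in Python (the helper names `to_binary` and `twos_complement` are my own, not from lecture; this assumes the usual invert-and-add-1 method at an arbitrary storage width):

```python
def to_binary(value, width):
    """Repeated divide-by-2; the remainders, read in reverse, are the bits."""
    bits = []
    while value > 0:
        bits.append(value % 2)          # remainder = next bit (LSB first)
        value //= 2
    bits += [0] * (width - len(bits))   # left-pad with zeros to full storage width
    return ''.join(str(b) for b in reversed(bits))

def twos_complement(bitstring):
    """Invert every bit, then add 1 (discarding any carry out of the width)."""
    inverted = ''.join('1' if b == '0' else '0' for b in bitstring)
    total = (int(inverted, 2) + 1) % (1 << len(bitstring))
    return format(total, '0%db' % len(bitstring))

b = to_binary(673, 12)
print(b, '=', format(int(b, 2), 'x'), 'h')    # 001010100001 = 2a1 h
tc = twos_complement(b)
print(tc, '=', format(int(tc, 2), 'x'), 'h')  # 110101011111 = d5f h
print(2**12 - int(tc, 2))                     # check: 4096 - 3423 = 673
```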
Integers

Given a 12-bit integer storage size, convert the value 673 to its binary, hex, and octal values. Use divide by 2; the remainders, read bottom to top, are the bits:

    673 / 2 = 336  r 1
    336 / 2 = 168  r 0
    168 / 2 =  84  r 0
     84 / 2 =  42  r 0
     42 / 2 =  21  r 0
     21 / 2 =  10  r 1
     10 / 2 =   5  r 0
      5 / 2 =   2  r 1
      2 / 2 =   1  r 0
      1 / 2 =   0  r 1

    = 10 1010 0001 b   (regrouped in threes: 1 010 100 001 b)
    = 2 A 1 h
    = 1 2 4 1 o

Give its 2's complement value. (This example is 12-bit storage; make sure you left-pad with zeros to the full width of storage.)

    0010 1010 0001 b   original
    1101 0101 1110 b   all bits inverted
              +  1
    --------------
    1101 0101 1111 b = D5F h

Convert both the original and the 2's complement binary values back to an unsigned integer (double the running total, then add the next bit, working from the MSB):

    0010 1010 0001 b -> 1, 2, 5, 10, 21, 42, 84, 168, 336, 673
    1101 0101 1111 b -> 1, 3, 6, 13, 26, 53, 106, 213, 427, 855, 1711, 3423

Check by subtraction: 2^12 = 4096, and 4096 - 673 = 3423.

Floats

Give the general formula for calculating the bias value for a floating point format:

    bias = 2^(n-1) - 1, where n is the number of exponent bits.

Given floating point storage with a 12-bit mantissa and a 10-bit exponent, and the actual real value 912.730, convert the real number to its floating point storage.

Convert the integer portion (divide by 2) and the fractional portion (multiply by 2; truncate, round toward zero):

    912 / 2 = 456  r 0        .730 * 2 = 1.460 -> 1
    456 / 2 = 228  r 0        .460 * 2 = 0.920 -> 0
    228 / 2 = 114  r 0        .920 * 2 = 1.840 -> 1
    114 / 2 =  57  r 0        .840 * 2 = 1.680 -> 1
     57 / 2 =  28  r 1        .680 * 2 = 1.360 -> 1
     28 / 2 =  14  r 0        .360 * 2 = 0.720 -> 0
     14 / 2 =   7  r 0        .720 * 2 = 1.440 -> 1
      7 / 2 =   3  r 1        .440 * 2 = 0.880 -> 0
      3 / 2 =   1  r 1        .880 * 2 = 1.760 -> 1
      1 / 2 =   0  r 1        .760 * 2 = 1.520 -> 1

    I -> 11 1001 0000 b   (the MSB is virtual, so we store only the lower 9 bits: 1 1001 0000)
    F -> .1011101011 b    (nine stored bits are used for the integer, so only 3 bits are actually needed from the fractional side)

    1110010000.101 b   (no rounding; MSB not hidden yet)

Normalize:

    1.110010000101 b * 2^9   (remember, we don't store the bit to the left of the point)

Calculate the bias: 2^(10-1) - 1 = 511. Biased exponent: 511 + 9 = 520. Convert 520 to binary:

    520 / 2 = 260  r 0
    260 / 2 = 130  r 0
    130 / 2 =  65  r 0
     65 / 2 =  32  r 1
     32 / 2 =  16  r 0
     16 / 2 =   8  r 0
      8 / 2 =   4  r 0
      4 / 2 =   2  r 0
      2 / 2 =   1  r 0
      1 / 2 =   0  r 1

    520 = 10 0000 1000 b

Stored value:

    0 1000001000 110010000101   (significand truncated, not rounded)
    ^ ^--------^ ^----------^
    sign exponent significand

Convert the actual stored value back to a real number. Exponent: 520 - 511 = 9.
Stored mantissa 110010000101, with the virtual 1 restored: 1.110010000101 b. Shifting the point 9 places (un-normalizing) gives back 1110010000.101 b.

Integer portion (double and add, from the MSB):

    1110010000 b -> 1, 3, 7, 14, 28, 57, 114, 228, 456, 912

Fractional portion:

    .101 b:  1 * .5   = .5
             0 * .25  = .0
             1 * .125 = .125
             .5 + .125 = .625

Original: 912.730. Stored: 912.625.

MinMax

7-bit exponent, 8-bit mantissa.

Max:

    0 111 1110 1111 1111
    Bias: 2^(7-1) - 1 = 63
    Biased exponent: 111 1110 b = 126 (all 1s is reserved as a flag)
    Unbiased exponent: 126 - 63 = 63
    1.11111111 b * 2^63 = (10 b - 0.00000001 b) * 2^63
                        = (2^1 - 2^-8) * 2^63
                        = 2^64 - 2^55

Denormalized min:

    0 000 0000 0000 0001

Remember: although the exponent is set to zero, it is functioning as a flag and should be treated as if it holds a 1. And the virtual 1 found at the beginning of all normalized floats does not exist in this situation.

    Unbiased exponent: 1 - 63 (bias) = -62
    0.00000001 b * 2^-62 = 2^-8 * 2^-62 = 2^-70

Generic format of single-precision floats (b = either 0 or 1):

    sign  8-bit exponent  23-bit significand
    b     bbbbbbbb        bbbbbbbbbbbbbbbbbbbbbbb

    normalized:  exponent is any value other than all 0s or all 1s
    b 00000000   significand all zeros        zero
    b 00000000   significand not all zeros    denormalized
    b 11111111   significand all zeros        infinity
    b 11111111   significand not all zeros    NaN

*******************

Review the general features of the various character sets enough to answer true/false questions (Basic Storage and Characters lectures).

ASCII (Wikipedia topic: ASCII)
What does ASCII stand for?
What is the size (bit width) of an ASCII character?
How many characters can it represent?
How many 'printable' characters are defined?
How many control characters are defined?

EBCDIC (Wikipedia topic: EBCDIC)
What does EBCDIC stand for?
What is the size (bit width) of an EBCDIC character?
How many characters can it represent? 256 possible, but not all values are assigned an actual character.

Look at the basic features of UTF-8: number of bytes for a code point; where it is most commonly used.
Look at the basic features of UTF-16: number of bytes for a code point; where it is most commonly used.
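The MinMax arithmetic generalizes to any exponent/significand width; here is a small Python sketch (the function name `float_limits` is my own; it assumes the conventions used in these notes: bias = 2^(n-1) - 1, a hidden leading 1 for normalized values, and the all-ones exponent reserved as a flag):

```python
from fractions import Fraction   # exact arithmetic, no rounding surprises

def float_limits(exp_bits, sig_bits):
    """Max normal, min normal, and denormalized min for a given format."""
    bias = 2**(exp_bits - 1) - 1
    max_exp = (2**exp_bits - 2) - bias   # all-ones exponent is reserved
    min_exp = 1 - bias                   # all-zeros exponent acts as a flag
    two = Fraction(2)
    max_normal = (two - two**-sig_bits) * two**max_exp   # 1.111...1 b * 2^max
    min_normal = two**min_exp                            # 1.000...0 b * 2^min
    denorm_min = two**-sig_bits * two**min_exp           # 0.000...1 b * 2^min
    return max_normal, min_normal, denorm_min

mx, mn, dn = float_limits(7, 8)    # the 7-bit exponent / 8-bit mantissa example
print(mx == 2**64 - 2**55)         # True
print(dn == Fraction(1, 2**70))    # True
```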
Unicode defines 17 planes (major groupings), although many are not in use.

Plane 0 - Basic Multilingual Plane (most modern languages)
Plane 1 - Supplementary Multilingual Plane: math symbols, music symbols, hieroglyphs, board game symbols (dominoes, Mah Jongg, etc.)
Plane 2 - Supplementary Ideographic Plane (CJK): Chinese, Japanese, Korean
Planes 3 to 13 - unassigned
Plane 14 - Supplementary Special-purpose Plane: XML formatting characters; Variation Selectors, used with CJK to identify alternative ideograms
Planes 15, 16 - private use; users can create custom definitions, e.g. Klingon

Code planes are divided into code blocks of 256 code points. A code block usually contains the characters of a particular language, but some languages may take more than one block. A code point is one entry in a code block.

The Unicode protocol defines 3 aspects of a character set. (Wikipedia topic: Unicode)

A grapheme is a minimal meaningful element of a language. It may require more than one code point to be represented. (Wikipedia topic: grapheme)

A glyph is the visual representation of a character in a language. It may consist of more than one grapheme, or multiple glyphs may be combined to represent one code point or grapheme. (Wikipedia topic: glyph)

See the following for a nice description:
https://stackoverflow.com/questions/27331819/whats-the-difference-between-a-character-a-code-point-a-glyph-and-a-grapheme

Rules - rules determining collating order, left-to-right or right-to-left display, and which glyph to use if certain graphemes are next to each other.

Universal Coded Character Set (UCS)

The UCS protocol only assigns a character to a specific code point. Visual representation (glyphs) and rules are NOT covered by the UCS protocol.

UCS-2: code points defined as 2-byte fixed-length storage. Designed to hold all current (living) written languages. Now obsolete, subsumed into UTF-16.

UCS-4 (ISO 10646): code points defined as 4-byte fixed-length storage.
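Each plane spans 0x10000 code points, so in Python the plane of a character is just its code point shifted right by 16 bits (`plane_of` is my own illustrative helper, not part of the notes):

```python
def plane_of(ch):
    """Unicode plane number: the code point divided by 0x10000."""
    return ord(ch) >> 16

print(plane_of('A'))            # 0 - Basic Multilingual Plane
print(plane_of('\U0001D11E'))   # 1 - musical symbol G clef, Supplementary Multilingual
print(plane_of('\U00020000'))   # 2 - CJK ideograph, Supplementary Ideographic
```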
Incorporates the BMP (Plane 0), the CJK (Plane 2), and other code pages. Technically 31-bit characters; code point values 0x7FFFFFFF - 0xFFFFFFFF are no longer valid. Currently incorporated in UTF-32.

Unicode Transformation Format (UTF) - protocol defined by the Unicode Consortium. Covers unique character-to-code-point assignment, visual representation (glyph), and rules for use.

UTF-8
Code points can be 1 to 4 bytes depending on the code page. Used extensively for Web pages. The first block of the first plane (1-byte code points) maps to ASCII with minimal conversion. Has character sets representing current languages, historical languages, and non-language symbols such as math, music, etc. Currently the preferred protocol both for data transmission over the Internet and for use by most modern OSes, such as Windows 7 and 10, and Linux.

UTF-16
Code points are either 2 bytes or 4 bytes in size depending on the code page. Uses a BOM (byte order mark) to identify the most significant byte. Preferred by earlier OSes, such as Windows NT and 2000, because of compatibility with the word size of the system(?). Has character sets representing current languages, historical languages, and non-language symbols such as math, music, etc.

UTF-32 (UCS-4)
Code points are a fixed 4 bytes in size. Usually implemented within a program's storage procedures to provide for predictable calculation of storage arrays.
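The encoded sizes described above can be checked with Python's standard codecs (the sample characters are my own picks):

```python
# character -> (UTF-8 byte count, UTF-16 byte count)
samples = {
    'A': (1, 2),     # U+0041, ASCII range: 1 byte in UTF-8
    'é': (2, 2),     # U+00E9: 2 bytes in UTF-8
    '€': (3, 2),     # U+20AC: 3 bytes in UTF-8
    '😀': (4, 4),    # U+1F600, Plane 1: 4 bytes; a surrogate pair in UTF-16
}
for ch, (u8, u16) in samples.items():
    assert len(ch.encode('utf-8')) == u8       # 1 to 4 bytes per code point
    assert len(ch.encode('utf-16-be')) == u16  # 2 or 4 bytes per code point
    assert len(ch.encode('utf-32-be')) == 4    # fixed 4 bytes per code point

# The plain 'utf-16' codec prepends a BOM (byte order mark) so a reader
# can identify which byte comes first.
bom = 'A'.encode('utf-16')[:2]
print(bom in (b'\xff\xfe', b'\xfe\xff'))   # True
```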