Hashing
Up to this point, you have been building tables sequentially and performing a linear search when a record was to be processed. This process is alright for smaller tables, but it becomes a waste of time when a table is large. So, a new random search, known as hashing, will be introduced.
Hashing can be used to both build and search a table. The basic idea behind hashing is to take a key and convert it through some fixed process to a hash key in the range of 0 to n-1, where n is the maximum number of slots in the table. The hash key will then be used to either store or find an item in the table.
There is the possibility that two keys could resolve to the same hash key. This situation is known as a COLLISION. When this occurs, the item will be stored in the next available slot in the table, assuming that the table is not already full.
The process of finding the next available slot when a collision occurs is known as a LINEAR PROBE. The linear probe is implemented via a linear search from the point of collision. If the physical end of table is reached during the linear search, the search will wrap around to the top of the table and continue from there.
If an empty slot is not found before reaching the point of collision, the table is full.
The fixed process to convert a key to a hash key is known as a hash function. This function will be used whenever access to the table is needed.
The most common method of determining a hash key is the division method of hashing. The formula that will be used is:
hash key = key % number of entries in the table
The above formula uses the remainder of the division of the key by the number of entries in the table as the hash key
For example, if we assume a table that can hold 8 items:
hash key = key % number of entries in the table 4 = 36 % 8 2 = 18 % 8 0 = 72 % 8 3 = 43 % 8 6 = 6 % 8 2 = 10 % 8
Assembler does not allow subscripting/indexing so a displacement into the table must be calculated using the hash key. The formula for calculating a displacement into the table is:
displacement = hash key * length of a table entry
The displacement can then be used to calculate the table entry address.
table entry address = displacement + beginning address of the table
Hash routine pseudocode
searchKey is the key field of the entry to insert or the key that is being searched for Calculate the table entry address Save the table entry address Set the return code to -1 Do while (the return code is -1) If an empty table slot is found Set the return code to 4 Return the address of this table slot Else If table entry's key is equal to searchKey Set the return code to 0 Return the address of this table slot Else Increment the table pointer to the next entry If the physical end of table is reached Wrap around to the beginning of the table Endif If the table pointer is equal to saved table entry address Set the return code to 8 Endif Endif Endif Enddo
The return codes that are set in the pseudocode have a different meaning depending on whether an item is being inserted into the table or if the table is being searched for an item.
Return code meanings when INSERTING into the table:
Return Code Meaning 0 A duplicate key has been found. DO NOT insert 4 An empty slot has been found. OK to insert 8 Table is full. DO NOT insert
Return code meanings when SEARCHING the table for a key:
Return Code Meaning 0 Successful Search. Key found 4 Unsuccessful Search. Empty slot found 8 Unsuccessful Search. Entire table has been searched