Hashed Files
l Hashing for disk files is called External
Hashing
l The file blocks are divided into M
equal-sized buckets, numbered bucket0, bucket1,
...,
bucket M-1. Typically, a bucket
corresponds to one (or a fixed number of) disk block.
l One of the file fields is designated
to be the hash key of the file.
l The record with hash key value K is
stored in bucket i, where i=h(K), and h is the hashing function.
l Search is very efficient on the hash
key.
l Collisions occur when a new record
hashes to a bucket that is already full. An overflow file is kept for storing
such records. Overflow records that hash to each bucket can be linked together.
There are
numerous methods for collision resolution, including the following:
l Open addressing: Proceeding from the occupied
position specified by the hash address, the program checks the subsequent
positions in order until an unused (empty) position is found.
l Chaining: For this method, various overflow
locations are kept, usually by extending the array with a number of overflow
positions. In addition, a pointer field is added to each record location. A
collision is resolved by placing the new record in an unused overflow location
and setting the pointer of the occupied hash address location to the address of
that overflow location.
l Multiple hashing: The program applies a second hash
function if the first results in a collision. If another collision results, the
program uses open addressing or applies a third hash function and then uses
open addressing if necessary.
Figure 1 Hashed Files
l To reduce overflow records, a hash
file is typically kept 70-80% full.
l The hash function h should
distribute the records uniformly among the buckets; otherwise, search time will
be increased because many overflow records will exist.
l Main disadvantages of static
external hashing:
- Fixed number of buckets M is a
problem if the number of records in the file grows or
shrinks.
- Ordered access on the hash key
is quite inefficient (requires sorting
the records).
Figure
2 Hashed Files - Overflow handling
Dynamic And Extendible Hashed Files
Dynamic and Extendible Hashing Techniques
l Hashing techniques are adapted to
allow the dynamic growth and shrinking of the number of file records.
l These techniques include the
following: dynamic hashing , extendible hashing , and linear
hashing .
l Both dynamic and extendible hashing
use the binary representation of
the hash value h(K) in order to access a directory. In dynamic hashing
the directory is a binary tree. In extendible hashing the directory is an array
of size 2d where d is called the global depth.
l The directories can be stored on
disk, and they expand or shrink dynamically. Directory entries point to the
disk blocks that contain the stored records.
l An insertion in a disk block that is
full causes the block to split into two blocks and the records are
redistributed among the two blocks. The directory is updated appropriately.
l Dynamic and extendible hashing do
not require an overflow area.
l Linear hashing does require an
overflow area but does not use a directory. Blocks are split in linear
order as the file expands.
Figure 3 Extendible Hashing
No comments:
Post a Comment