Friday, May 22, 2020

What is cardinality or a binary relationship? Explain in detail with examples and diagrams.


Cardinality/Binary relationship

It defines the numerical attributes of the relationship between two entities or entity sets.
One-to-one relationship:
When a single instance of an entity is associated with a single instance of another entity, it is called a one-to-one relationship.
For example: a person has only one passport, and a passport is issued to only one person.









One-to-many relationship:
When a single instance of an entity is associated with more than one instance of another entity, it is called a one-to-many relationship.
For example:
One customer can place many orders, but an order cannot be placed by many customers.
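A minimal sketch of how a one-to-many relationship is usually realized in a relational schema: the "many" side (orders) carries a foreign key to the "one" side (customer). The table and column names (customer, orders, cust_id) are illustrative and not from the original post.

```python
import sqlite3

# Hypothetical customer/orders schema: one customer can have many orders,
# but each order references exactly one customer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        cust_id   INTEGER PRIMARY KEY,
        cust_name TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        cust_id  INTEGER NOT NULL REFERENCES customer(cust_id)  -- each order belongs to one customer
    );
""")
conn.execute("INSERT INTO customer VALUES (1, 'Asha')")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(101, 1), (102, 1)])  # one customer, two orders
print(conn.execute("SELECT COUNT(*) FROM orders WHERE cust_id = 1").fetchone()[0])  # -> 2
```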










Many-to-one relationship:
When more than one instance of an entity is associated with a single instance of another entity, it is called a many-to-one relationship.
For example:
Many students can study in a single college, but a student cannot study in many colleges at the same time.







Many-to-many relationship:
When more than one instance of an entity is associated with more than one instance of another entity, it is called a many-to-many relationship.
For example:
One employee can be assigned to many projects, and one project can be assigned to many employees.
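A minimal sketch of how a many-to-many relationship is commonly implemented: a junction (associative) table holds one row per employee/project pair. The names (employee, project, works_on) are illustrative assumptions, not from the original post.

```python
import sqlite3

# Hypothetical employee/project schema: the junction table 'works_on' lets
# one employee appear on many projects and one project have many employees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id  INTEGER PRIMARY KEY, emp_name  TEXT);
    CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, proj_name TEXT);
    CREATE TABLE works_on (
        emp_id  INTEGER REFERENCES employee(emp_id),
        proj_id INTEGER REFERENCES project(proj_id),
        PRIMARY KEY (emp_id, proj_id)            -- one row per employee/project pair
    );
""")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, 'Ravi'), (2, 'Meena')])
conn.executemany("INSERT INTO project  VALUES (?, ?)", [(10, 'Payroll'), (20, 'Billing')])
conn.executemany("INSERT INTO works_on VALUES (?, ?)",
                 [(1, 10), (1, 20), (2, 10)])    # Ravi on two projects; Payroll has two employees
```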






Monday, May 18, 2020

What are files of records in DBMS? Explain in detail. Also explain operations on files.


FILES OF RECORDS

- A file is a sequence of records, where each record is a collection of data values (or data items).

- A file descriptor (or file header) includes information that describes the file, such as the field names and their data types, and the addresses of the file blocks on disk.

- Records are stored on disk blocks. The blocking factor bfr for a file is the (average) number of file records stored in a disk block.

- A file can have fixed-length records or variable-length records.

Operations on Files


Typical file operations include:

- OPEN: Readies the file for access, and associates a pointer that will refer to a current file record at each point in time.

- FIND: Searches for the first file record that satisfies a certain condition, and makes it the current file record.

- FINDNEXT: Searches for the next file record (from the current record) that satisfies a certain condition, and makes it the current file record.

- READ: Reads the current file record into a program variable.

- INSERT: Inserts a new record into the file, and makes it the current file record.

- DELETE: Removes the current file record from the file, usually by marking the record to indicate that it is no longer valid.

- MODIFY: Changes the values of some fields of the current file record.

- CLOSE: Terminates access to the file.

- REORGANIZE: Reorganizes the file records. For example, records marked deleted are physically removed from the file, or a new organization of the file records is created.

- READ_ORDERED: Reads the file blocks in order of a specific field of the file.
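A minimal in-memory sketch of the operations listed above, assuming records are represented as Python dicts and the block structure of a real file is abstracted away. The class and method names (RecordFile, find_next, etc.) are illustrative, not a real DBMS API.

```python
class RecordFile:
    def __init__(self):
        self.records = []      # the sequence of records (blocks are abstracted away)
        self.current = None    # index of the current record, set by FIND/FINDNEXT/INSERT

    def open(self):            # OPEN: reset the current-record pointer
        self.current = None

    def find(self, cond):      # FIND: first record satisfying cond becomes current
        return self._scan(cond, start=0)

    def find_next(self, cond): # FINDNEXT: next satisfying record after the current one
        start = 0 if self.current is None else self.current + 1
        return self._scan(cond, start)

    def _scan(self, cond, start):
        for i in range(start, len(self.records)):
            r = self.records[i]
            if not r.get("_deleted") and cond(r):
                self.current = i
                return True
        return False

    def read(self):            # READ: return the current record
        return self.records[self.current]

    def insert(self, rec):     # INSERT: append the record and make it current
        self.records.append(dict(rec))
        self.current = len(self.records) - 1

    def delete(self):          # DELETE: mark the current record as no longer valid
        self.records[self.current]["_deleted"] = True

    def modify(self, **fields):  # MODIFY: change fields of the current record
        self.records[self.current].update(fields)

    def close(self):           # CLOSE: terminate access by dropping the pointer
        self.current = None


f = RecordFile()
f.open()
f.insert({"EmpNum": 123, "Name": "Asha"})
f.insert({"EmpNum": 233, "Name": "Ravi"})
if f.find(lambda r: r["EmpNum"] == 233):
    print(f.read()["Name"])   # -> Ravi
```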

Unordered Files
- Also called a heap or a pile file.

- New records are inserted at the end of the file.

- To search for a record, a linear search through the file records is necessary. This requires reading and searching half the file blocks on average, and is hence quite expensive.

- Record insertion is quite efficient.

- Reading the records in order of a particular field requires sorting the file records.

Ordered Files
- Also called a sequential file.

- File records are kept sorted by the values of an ordering field.

- Insertion is expensive: records must be inserted in the correct order. It is common to keep a separate unordered overflow (or transaction) file for new records to improve insertion efficiency; this is periodically merged with the main ordered file.

- A binary search can be used to search for a record on its ordering field value. This requires reading and searching about log2(b) of the b file blocks on average, an improvement over linear search (see the sketch after Figure 1 below).

- Reading the records in order of the ordering field is quite efficient.
                                                             
Figure 1: Ordered Files
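A minimal sketch of searching an ordered (sequential) file on its ordering field. Disk blocks are abstracted to a Python list kept sorted on the key; the names (records, search_ordered) are illustrative assumptions.

```python
import bisect

# The list is kept sorted on the ordering field; bisect performs the ~log2(n)
# comparisons that a block-level binary search would perform.
records = sorted([(233, "Ravi"), (123, "Asha"), (976, "Meena")])  # (ordering field, record)
keys = [k for k, _ in records]                                    # the ordering field values

def search_ordered(key):
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return records[i][1]
    return None                                                   # not found

print(search_ordered(976))   # -> 'Meena'
```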

Hashed files: Hashing for disk files is called external hashing. To continue reading about hashing or hashed files, visit https://myrdbmsnotes.blogspot.com/2020/05/what-is-hashed-files-in-dbms-explain.html


What are hashed files in DBMS? Explain with a proper diagram.


Hashed Files
- Hashing for disk files is called external hashing.

- The file blocks are divided into M equal-sized buckets, numbered bucket 0, bucket 1, ..., bucket M-1. Typically, a bucket corresponds to one (or a fixed number of) disk blocks.

- One of the file fields is designated to be the hash key of the file.

- The record with hash key value K is stored in bucket i, where i = h(K) and h is the hashing function.

- Search is very efficient on the hash key.

- Collisions occur when a new record hashes to a bucket that is already full. An overflow file is kept for storing such records. Overflow records that hash to each bucket can be linked together.
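A minimal sketch of static external hashing as described above: M buckets, a blocking factor of BFR records per bucket, and a per-bucket overflow chain for records that hash to a full bucket. The hash function h(K) = K mod M and the bucket sizes are illustrative choices.

```python
M, BFR = 4, 2                      # number of buckets, records per bucket (blocking factor)
buckets  = [[] for _ in range(M)]  # main buckets
overflow = [[] for _ in range(M)]  # overflow chain per bucket

def h(key):
    return key % M                 # illustrative hash function

def insert(key, record):
    i = h(key)
    target = buckets[i] if len(buckets[i]) < BFR else overflow[i]  # collision -> overflow chain
    target.append((key, record))

def search(key):
    i = h(key)
    for k, rec in buckets[i] + overflow[i]:   # only one bucket (plus its chain) is examined
        if k == key:
            return rec
    return None

for emp in (123, 233, 976, 431, 119):
    insert(emp, {"EmpNum": emp})
print(search(976))                 # -> {'EmpNum': 976}
```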

There are numerous methods for collision resolution, including the following:

- Open addressing: Proceeding from the occupied position specified by the hash address, the program checks the subsequent positions in order until an unused (empty) position is found.

- Chaining: For this method, various overflow locations are kept, usually by extending the array with a number of overflow positions. In addition, a pointer field is added to each record location. A collision is resolved by placing the new record in an unused overflow location and setting the pointer of the occupied hash address location to the address of that overflow location.

- Multiple hashing: The program applies a second hash function if the first results in a collision. If another collision results, the program uses open addressing or applies a third hash function and then uses open addressing if necessary.
Figure 1: Hashed Files

- To reduce the number of overflow records, a hash file is typically kept 70-80% full.

- The hash function h should distribute the records uniformly among the buckets; otherwise, search time will increase because many overflow records will exist.

- Main disadvantages of static external hashing:
  - A fixed number of buckets M is a problem if the number of records in the file grows or shrinks.
  - Ordered access on the hash key is quite inefficient (it requires sorting the records).

Figure 2: Hashed Files - Overflow Handling

Dynamic And Extendible Hashed Files

   Dynamic and Extendible Hashing Techniques

- Hashing techniques can be adapted to allow dynamic growth and shrinking of the number of file records.

- These techniques include the following: dynamic hashing, extendible hashing, and linear hashing.

- Both dynamic and extendible hashing use the binary representation of the hash value h(K) in order to access a directory. In dynamic hashing the directory is a binary tree. In extendible hashing the directory is an array of size 2^d, where d is called the global depth (see the sketch after Figure 3 below).

- The directories can be stored on disk, and they expand or shrink dynamically. Directory entries point to the disk blocks that contain the stored records.

- An insertion into a disk block that is full causes the block to split into two blocks, and the records are redistributed among the two blocks. The directory is updated appropriately.

- Dynamic and extendible hashing do not require an overflow area.

- Linear hashing does require an overflow area but does not use a directory. Blocks are split in linear order as the file expands.

Figure 3: Extendible Hashing
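A simplified sketch of extendible hashing as described in the bullets above: the directory has 2^global_depth entries indexed by the low-order bits of h(K), each bucket keeps a local depth, and a full bucket splits (doubling the directory only when local depth equals global depth). Bucket capacity, names, and the use of Python's built-in hash are illustrative assumptions.

```python
BUCKET_CAP = 2

class Bucket:
    def __init__(self, depth):
        self.local_depth = depth
        self.items = {}                               # key -> record

class ExtendibleHash:
    def __init__(self):
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]       # 2**1 directory entries

    def _index(self, key):
        return hash(key) & ((1 << self.global_depth) - 1)   # low-order global_depth bits of h(K)

    def insert(self, key, record):
        b = self.directory[self._index(key)]
        if key in b.items or len(b.items) < BUCKET_CAP:
            b.items[key] = record
            return
        self._split(b)                                # full bucket: split (maybe double directory)
        self.insert(key, record)                      # retry after the split

    def _split(self, b):
        if b.local_depth == self.global_depth:        # directory must double first
            self.directory += self.directory
            self.global_depth += 1
        b.local_depth += 1
        new_b = Bucket(b.local_depth)
        old_items, b.items = b.items, {}
        # repoint the directory entries whose distinguishing bit is 1 to the new bucket
        for i, entry in enumerate(self.directory):
            if entry is b and (i >> (b.local_depth - 1)) & 1:
                self.directory[i] = new_b
        for k, v in old_items.items():                # redistribute the split bucket's records
            self.directory[self._index(k)].items[k] = v

    def search(self, key):
        return self.directory[self._index(key)].items.get(key)

eh = ExtendibleHash()
for emp in (123, 233, 976, 431, 119, 250):
    eh.insert(emp, {"EmpNum": emp})
print(eh.search(976))    # -> {'EmpNum': 976}
```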

Thursday, May 14, 2020

In DBMS, what is the timestamp-based concurrency control algorithm? Explain the multiversion technique using timestamp ordering.


Timestamp-based concurrency control algorithm

Timestamp: A monotonically increasing variable (integer) indicating the age of an operation or a transaction. A larger timestamp value indicates a more recent event or operation.
A timestamp-based algorithm uses timestamps to serialize the execution of concurrent transactions.

Basic Timestamp Ordering
1. Transaction T issues a write_item(X) operation:
   a. If read_TS(X) > TS(T) or write_TS(X) > TS(T), then a younger transaction has already read or written the data item, so abort and roll back T and reject the operation.
   b. If the condition in part (a) does not hold, then execute write_item(X) of T and set write_TS(X) to TS(T).
2. Transaction T issues a read_item(X) operation:
   a. If write_TS(X) > TS(T), then a younger transaction has already written to the data item, so abort and roll back T and reject the operation.
   b. If write_TS(X) ≤ TS(T), then execute read_item(X) of T and set read_TS(X) to the larger of TS(T) and the current read_TS(X).
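A minimal sketch of the basic timestamp-ordering checks above, assuming each data item carries a read_TS and a write_TS and each transaction T has a fixed timestamp TS(T). An abort is signalled by returning False; the names and data structures are illustrative.

```python
items = {}   # item name -> {"value": ..., "read_TS": 0, "write_TS": 0}

def get(name):
    return items.setdefault(name, {"value": None, "read_TS": 0, "write_TS": 0})

def write_item(ts_T, name, value):
    x = get(name)
    if x["read_TS"] > ts_T or x["write_TS"] > ts_T:
        return False                       # a younger transaction already read/wrote X: abort T
    x["value"], x["write_TS"] = value, ts_T
    return True

def read_item(ts_T, name):
    x = get(name)
    if x["write_TS"] > ts_T:
        return False, None                 # a younger transaction already wrote X: abort T
    x["read_TS"] = max(x["read_TS"], ts_T)
    return True, x["value"]

print(write_item(5, "X", 10))   # True
print(read_item(7, "X"))        # (True, 10); read_TS(X) becomes 7
print(write_item(6, "X", 99))   # False: read_TS(X) = 7 > 6, so T with TS 6 is rolled back
```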

Strict Timestamp Ordering
1. Transaction T issues a write_item(X) operation:
   a. If TS(T) > read_TS(X), then delay T until the transaction T’ that wrote or read X has terminated (committed or aborted).
2. Transaction T issues a read_item(X) operation:
   a. If TS(T) > write_TS(X), then delay T until the transaction T’ that wrote or read X has terminated (committed or aborted).

Thomas’s Write Rule
1. If read_TS(X) > TS(T), then abort and roll back T and reject the operation.

2. If write_TS(X) > TS(T), then just ignore the write operation and continue execution. This is because only the most recent write counts in the case of two consecutive writes.

3. If the conditions in 1 and 2 above do not occur, then execute write_item(X) of T and set write_TS(X) to TS(T).
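A minimal sketch of Thomas's write rule applied to the same item structure as in the previous sketch: an outdated write (write_TS(X) > TS(T)) is silently ignored instead of causing an abort. The function name and return values are illustrative.

```python
def thomas_write(ts_T, x, value):
    if x["read_TS"] > ts_T:
        return "abort"                       # rule 1: a younger transaction already read X
    if x["write_TS"] > ts_T:
        return "ignored"                     # rule 2: obsolete write, skip it and continue
    x["value"], x["write_TS"] = value, ts_T  # rule 3: normal case
    return "written"

x = {"value": 10, "read_TS": 3, "write_TS": 8}
print(thomas_write(5, x, 42))                # 'ignored' (write_TS 8 > 5; basic TO would abort)
```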

Multiversion concurrency control techniques

Concept

This approach maintains a number of versions of a data item and allocates the right version to a read operation of a transaction. Thus, unlike in other mechanisms, a read operation in this mechanism is never rejected.

Side effect: significantly more storage (RAM and disk) is required to maintain multiple versions. To check unlimited growth of versions, garbage collection is run when some criterion is satisfied.

Multiversion technique based on timestamp ordering

Explanation

Assume X1, X2, ..., Xn are the versions of a data item X created by write operations of transactions. With each version Xi, a read_TS (read timestamp) and a write_TS (write timestamp) are associated.

read_TS(Xi): The read timestamp of Xi is the largest of all the timestamps of transactions that have successfully read version Xi.

write_TS(Xi): The write timestamp of Xi is the timestamp of the transaction that wrote the value of version Xi.

A new version of X is created only by a write operation.

To ensure serializability, the following two rules are used.

1. If transaction T issues write_item(X), and version Xi of X has the highest write_TS(Xi) of all versions of X that is also less than or equal to TS(T), and read_TS(Xi) > TS(T), then abort and roll back T; otherwise, create a new version Xj of X and set read_TS(Xj) = write_TS(Xj) = TS(T).

2. If transaction T issues read_item(X), find the version Xi of X that has the highest write_TS(Xi) of all versions of X that is also less than or equal to TS(T); then return the value of Xi to T, and set the value of read_TS(Xi) to the larger of TS(T) and the current read_TS(Xi).
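A minimal sketch of the two multiversion timestamp-ordering rules above: each item keeps a list of versions (value, read_TS, write_TS), reads are never rejected, and a write either aborts or creates a new version. The names and structures are illustrative assumptions.

```python
versions = {"X": [{"value": 0, "read_TS": 0, "write_TS": 0}]}   # initial version of X

def version_for(name, ts_T):
    # the version with the highest write_TS that is <= TS(T)
    eligible = [v for v in versions[name] if v["write_TS"] <= ts_T]
    return max(eligible, key=lambda v: v["write_TS"])

def mv_read(ts_T, name):
    v = version_for(name, ts_T)
    v["read_TS"] = max(v["read_TS"], ts_T)    # rule 2: reads are never rejected
    return v["value"]

def mv_write(ts_T, name, value):
    v = version_for(name, ts_T)
    if v["read_TS"] > ts_T:
        return False                           # rule 1: a younger transaction read this version, abort T
    versions[name].append({"value": value, "read_TS": ts_T, "write_TS": ts_T})
    return True

print(mv_write(5, "X", 10))   # True: creates a new version with write_TS = 5
print(mv_read(3, "X"))        # 0: the older version (write_TS = 0) is returned to the older reader
print(mv_write(4, "X", 7))    # True here: the version with write_TS = 0 has read_TS = 3, not > 4
```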


Monday, May 11, 2020

What is normalization in DBMS and its forms? Explain in detail with examples.


                         Normalization

Normalization is a process that "improves" a database design by generating relations that are of higher normal forms.
The objective of normalization: to create relations where every dependency is "on the key, the whole key, and nothing but the key".
We discuss four normal forms: first, second, third, and Boyce-Codd normal forms:
1NF, 2NF, 3NF, and BCNF.

There is a sequence to normal forms:
1NF is considered the weakest,
2NF is stronger than 1NF,
3NF is stronger than 2NF, and
BCNF is considered the strongest

Also,
any relation that is in BCNF is in 3NF;
any relation in 3NF is in 2NF; and
any relation in 2NF is in 1NF.
We consider a relation in BCNF to be fully normalized.

The benefit of higher normal forms is that update semantics for the affected data are simplified.

This means that applications required to maintain the database are simpler.

A design that has a lower normal form than another design has more redundancy. Uncontrolled redundancy can lead to data integrity problems.

First, we introduce the concept of functional dependency.

Functional Dependencies
We say an attribute B has a functional dependency on another attribute A if, for any two records which have the same value for A, the values for B in these two records must be the same. We illustrate this as:
A → B
Example: Suppose we keep track of employee email addresses, and we only track one email address for each employee. Suppose each employee is identified by their unique employee number. We say there is a functional dependency of email address on employee number:

employee number → email address

If EmpNum is the PK, then the FDs:
         EmpNum → EmpEmail
         EmpNum → EmpFname
         EmpNum → EmpLname
must exist.
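A minimal sketch of checking whether a functional dependency A → B holds over a set of rows: any two rows that agree on A must agree on B. The helper name fd_holds and the sample rows are illustrative; the column names follow the example above.

```python
def fd_holds(rows, lhs, rhs):
    seen = {}
    for row in rows:
        key = row[lhs]
        if key in seen and seen[key] != row[rhs]:
            return False            # same A value but different B values: FD violated
        seen[key] = row[rhs]
    return True

employees = [
    {"EmpNum": 123, "EmpEmail": "a@x.com", "EmpFname": "Asha"},
    {"EmpNum": 233, "EmpEmail": "r@x.com", "EmpFname": "Ravi"},
]
print(fd_holds(employees, "EmpNum", "EmpEmail"))   # True: EmpNum -> EmpEmail holds in this data
```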
First Normal Form

We say a relation is in 1NF if all values stored in the relation are single-valued and atomic.

1NF places restrictions on the structure of relations. Values must be simple.

The following relation is not in 1NF:


EmpNum    EmpPhone    EmpDegrees
123       250-7853
233       250-1325    BA, BSc, PhD
976       250-2311    BSc, MSc

EmpDegrees is a multi-valued field:
employee 976 has two degrees: BSc and MSc
employee 233 has three degrees: BA, BSc, PhD
To obtain 1NF relations we must, without loss of information, replace the above with the two relations given below:

Employee
EmpNum    EmpPhone
123       250-7853
233       250-1325
976       250-2311

EmployeeDegree
EmpNum    EmpDegree
233       BA
233       BSc
233       PhD
976       BSc
976       MSc
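A minimal sketch of the flattening step just described: the multi-valued EmpDegrees column is split into single-valued rows, one per degree. The variable names are illustrative.

```python
employees = [
    {"EmpNum": 123, "EmpPhone": "250-7853", "EmpDegrees": ""},
    {"EmpNum": 233, "EmpPhone": "250-1325", "EmpDegrees": "BA,BSc,PhD"},
    {"EmpNum": 976, "EmpPhone": "250-2311", "EmpDegrees": "BSc,MSc"},
]

employee        = [{"EmpNum": e["EmpNum"], "EmpPhone": e["EmpPhone"]} for e in employees]
employee_degree = [{"EmpNum": e["EmpNum"], "EmpDegree": d}
                   for e in employees
                   for d in e["EmpDegrees"].split(",") if d]   # one row per (employee, degree) pair

print(employee_degree)   # 5 rows: three for employee 233, two for employee 976
```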

Boyce-Codd Normal Form
BCNF is defined very simply:
a relation is in BCNF if it is in 1NF and if every determinant is a candidate key.
If our database will be used for OLTP (online transaction processing), then BCNF is our target. Usually, we meet this objective. However, we might denormalize (to 3NF, 2NF, or 1NF) for performance reasons.
Second Normal Form
A relation is in 2NF if it is in 1NF and every non-key attribute is fully dependent on each candidate key. (That is, we don’t have any partial functional dependency.)
- 2NF (and 3NF) both involve the concepts of key and non-key attributes.
- A key attribute is any attribute that is part of a key; any attribute that is not a key attribute is a non-key attribute.
- Relations that are not in BCNF have data redundancies.
- A relation in 2NF will not have any partial dependencies.
Consider this InvLine table (in 1NF):
InvLine (InvNum, LineNum, ProdNum, Qty, InvDate)

There are two candidate keys: {InvNum, LineNum} and {InvNum, ProdNum}. Qty and InvDate are the non-key attributes; Qty is fully dependent on each candidate key, but InvDate depends only on InvNum:

InvNum → InvDate

Since there is a determinant that is not a candidate key, InvLine is not in BCNF.
InvLine is not in 2NF, since there is a partial dependency of InvDate on InvNum.
InvLine (InvNum, LineNum, ProdNum, Qty, InvDate)

The above relation has redundancies: the invoice date is repeated on each invoice line.
We can improve the database by decomposing the relation into two relations:

Invoice (InvNum, InvDate)
InvLine (InvNum, LineNum, ProdNum, Qty)

Third Normal Form

- A relation is in 3NF if the relation is in 1NF and all determinants of non-key attributes are candidate keys. That is, for any functional dependency X → Y, where Y is a non-key attribute (or a set of non-key attributes), X is a candidate key.
- This definition of 3NF differs from BCNF only in the specification of non-key attributes; 3NF is weaker than BCNF. (BCNF requires all determinants to be candidate keys.)
- A relation in 3NF will not have any transitive dependencies of a non-key attribute on a candidate key through another non-key attribute.

Consider this Employee relation:

Employee (EmpNum, EmpName, DeptNum, DeptName)

EmpName, DeptNum, and DeptName are non-key attributes.
DeptNum determines DeptName, a non-key attribute, and DeptNum is not a candidate key.
Is the relation in 3NF? … no
Is the relation in 2NF? … yes
Is the relation in BCNF? … no


We correct the situation by decomposing the original relation into two 3NF relations. Note the decomposition is lossless:

Employee (EmpNum, EmpName, DeptNum)
Department (DeptNum, DeptName)

Verify these two relations are in 3NF.