Normalization
Normalization is a process
that “improves” a database design by generating relations that are of higher
normal forms.
The objective of normalization: “to create relations where every dependency is on
the key, the whole key, and nothing but the key”.
We discuss
four normal forms: first, second, third, and Boyce-Codd normal forms
1NF, 2NF,
3NF, and BCNF
There is a sequence
to normal forms:
1NF is
considered the weakest,
2NF is
stronger than 1NF,
3NF is
stronger than 2NF, and
BCNF is
considered the strongest
Also,
any relation
that is in BCNF, is in 3NF;
any relation
in 3NF is in 2NF; and
any relation
in 2NF is in 1NF.
a relation in BCNF, is also in 3NF
a relation in 3NF is also in 2NF
a relation in 2NF is also in 1NF
We consider
a relation in BCNF to be fully normalized.
The benefit
of higher normal forms is that update semantics for the affected data are
simplified.
This means
that applications required to maintain the database are simpler.
A design
that has a lower normal form than another design has more redundancy.
Uncontrolled redundancy can lead to data integrity problems.
First we
introduce the concept of functional dependency
Functional Dependencies
We say an attribute, B, has a functional dependency
on another attribute, A, if for any two records, which have
the same value for A, then the values for B in these
two records must be the same. We illustrate this as:
A Ã B
Example:
Suppose we keep track of employee email addresses, and we only track one email
address for each employee. Suppose each employee is identified by their unique
employee number. We say there is a functional dependency of email address on
employee number:
employee
number à email address
If EmpNum is the PK then the FDs:
EmpNum à EmpEmail
EmpNum à EmpFname
EmpNum à EmpLname
must exist.
EmpNum à EmpEmail
EmpNum à EmpFname
EmpNum à EmpLname
First Normal Form
We say a relation is in 1NF if all
values stored in the relation are single-valued and atomic.
1NF places restrictions on the
structure of relations. Values must be simple.
The
following in not in 1NF
EmpNum
|
Empphone
|
EmpDegrees
|
123
|
250-7853
|
|
233
|
250-1325
|
BA,BSC,PhD
|
976
|
250-2311
|
BSc,MSc
|
EmpDegrees is a multi-valued field:
employee 976has two degrees: BSc and MSc
employee 233 has three degrees: BA, BSc, PhD
EmpNum
|
EmpPhone
|
EmpDegrees
|
123
|
250-7853
|
|
233
|
250-1325
|
BA,BSC,PhD
|
976
|
250-2311
|
BSc,MSc
|
To obtain 1NF relations we must, without loss of
information, replace the above with two relations are given below
Boyce-Codd Normal Form
BCNF is
defined very simply:
a relation
is in BCNF if it is in 1NF and if every determinant is a candidate key.
If our database will be used for OLTP (on line
transaction processing), then BCNF is our target. Usually, we meet this
objective. However, we might denormalize (3NF, 2NF, or 1NF) for performance
reasons.
Second Normal Form
A relation is in 2NF if it is in 1NF, and every
non-key attribute is fully dependent on each candidate key. (That is, we don’t
have any partial functional dependency.)
•
2NF (and 3NF) both involve
the concepts of key and non-key attributes.
•
A key attribute is any attribute that
is part of a key; any attribute that is
not a key attribute,
is a non-key attribute.
•
Relations that are not in BCNF have data
redundancies
•
A relation in 2NF will not have any partial
dependencies
Consider this InvLine table
(in 1NF):
InvNum
|
LineNum
|
ProdNum
|
Qty
|
InvDate
|
There are
two
candidate
keys.
Qty is the only non-key
attribute, and it is
dependent on InvNum
|
InvNum
→ InvDate
Since
there is a determinant that is not a candidate key, InvLine is not BCNF
InvLine
is not 2NF since there is a partial dependency of InvDate on InvNum
InvLine
InvNum
|
LineNum
|
ProdNum
|
Qty
|
InvDate
|
The
above relation has redundancies: the invoice date is repeated on each invoice
line.
We
can improve the database by decomposing the relation into two relations:
Third Normal Form
•
A relation
is in 3NF if the relation is in 1NF and all determinants of non-key
attributes are candidate keys That is, for any functional dependency: X ® Y, where Y is a non-key attribute (or a set of
non-key attributes), X is a candidate key.
•
This
definition of 3NF differs from BCNF only in the specification of non-key
attributes - 3NF is weaker than BCNF. (BCNF requires all determinants to be
candidate keys.)
•
A relation
in 3NF will not have any transitive dependencies of non-key attribute on a candidate key through
another non-key attribute.
Consider
this Employee relation
EmpName,
DeptNum, and DeptName are non-key attributes.
DeptNum
determines DeptName, a non-key attribute, and DeptNum is not a candidate key.
Is
the relation in 3NF? … no
Is
the relation in 2NF? … yes
Is
the relation in BCNF? … no
We correct the situation by
decomposing the original relation into two 3NF relations. Note the
decomposition is lossless.
Verify these two relations are in 3NF.