140-250 (IT2) / 976-250 (E-Biz2) / 977-250 (SE2)
Database Systems
Semester 1/2015
2
Contents
• Relational Database Design
– Informal Design Guidelines for
Relation Schemas
– Functional Dependencies
– Normal Forms Based on Primary
Keys
– General Definition of Second and
Third Normal Forms
3
Informal Design Guidelines for
Relation Schemas
• Making sure that the semantics of
the attributes is clear in the schema
• Reducing the redundant
information in tuples
• Reducing the NULL values in
tuples
• Disallowing the possibility of
generating spurious tuples
4
Relational Schema: OK
5
Sample Database: OK
6
Relational Schema: Not OK
7
Sample Database: Not OK
8
Relational Schema &
Sample Database: Not OK
9
Informal Design Guidelines for
Relation Schemas
• Guideline 1
– Design a relation schema so that it is easy to
explain its meaning. Do not combine attributes
from multiple entity types and relationship types
into a single relation.
– Intuitively, if a relation schema corresponds to
one entity type or one relationship type, it is
straightforward to interpret and to explain its
meaning. Otherwise, if the relation corresponds
to a mixture of multiple entities and
relationships, semantic ambiguities will result
and the relation cannot be easily explained.
10
Informal Design Guidelines for
Relation Schemas
• Guideline 2
– Design the base relation schemas so
that no insertion, deletion, or
modification anomalies are present in
the relations.
– If any anomalies are present, note
them clearly and make sure that the
programs that update the database
will operate correctly.
11
Informal Design Guidelines for
Relation Schemas
• Guideline 2
– Insertion anomalies
• Cannot insert a project unless an employee is assigned to it.
• Cannot insert an employee unless a he/she is assigned to a
project.
– Deletion anomalies
• When a project is deleted, it will result in deleting all the
employees who work on that project.
• Alternately, if an employee is the sole employee on a project,
deleting that employee would result in deleting the
corresponding project.
– Modification anomalies
• Changing the name of project number P1 from “Billing” to
“Customer-Accounting” may cause this update to be made for
all 100 employees working on project P1.
12
Anomalies
• Insertion anomalies
– We tried to insert data in a record that does not exist at
all.
• Deletion anomalies
– We tried to delete a record, but parts of it was left
undeleted because of unawareness, the data is also
saved somewhere else.
• Modification anomalies
– If data items are scattered and are not linked to each
other properly, then it could lead to strange situations.
• For example, when we try to update one data item having its
copies scattered over several places, a few instances get
updated properly while a few others are left with old values.
Such instances leave the database in an inconsistent state.
13
Informal Design Guidelines for
Relation Schemas
• Guideline 3
– As far as possible, avoid placing attributes in a base
relation whose values may frequently be NULL.
– If NULLs are unavoidable, make sure that they apply
in exceptional cases only and do not apply to a
majority of tuples in the relation.
• The attribute does not apply to this tuple.
– For example, Visa_status may not apply to U.S. students.
• The attribute value for this tuple is unknown.
– For example, the Date_of_birth may be unknown for an
employee.
• The value is known but absent; that is, it has not been
recorded yet.
– For example, the Home_Phone_Number for an employee may
exist, but may not be available and recorded yet.
14
Informal Design Guidelines for
Relation Schemas
• Guideline 4
– Design relation schemas so that they can
be joined with equality conditions on
attributes that are appropriately related
(primary key, foreign key) pairs in a way
that guarantees that no spurious tuples are
generated.
– Avoid relations that contain matching
attributes that are not (foreign key, primary
key) combinations because joining on such
attributes may produce spurious tuples.
15
Spurious Tuples
16
Summary and Discussion of
Design Guidelines
• The problems which can be detected without
additional tools of analysis, are as follows:
– Anomalies that cause redundant work to be done
during insertion into and modification of a relation,
and that may cause accidental loss of information
during a deletion from a relation
– Waste of storage space due to NULLs and the
difficulty of performing selections, aggregation
operations, and joins due to NULL values
– Generation of invalid and spurious data during
joins on base relations with matched attributes that
may not represent a proper (foreign key, primary
key) relationship
17
Functional Dependencies
• A functional dependency is a constraint between two sets of
attributes from the database.
• This means that the values of the Y component of a tuple in r
depend on, or are determined by, the values of the X component;
• The values of the X component of a tuple uniquely (or
functionally) determine the values of the Y component.
• There is a functional dependency from X to Y, or that Y is
functionally dependent on X.
– The abbreviation for functional dependency is FD or f.d.
– The set of attributes X is called the left-hand side of the FD, and Y is
called the right-hand side.
18
Functional Dependencies
• Determinant --> Dependency
– 1:1
• Std_ID --> Std_Name
– 1:N
• Std_ID --> FirstName, LastName, BirthDate
– 2Way
• Std_ID --> Std_Pin
• Std_Pin --> Std_ID
• Std_ID Std_Pin
– N:1
• Std_ID, Course_ID --> Room
19
Normal Forms Based on
Primary Keys
• Normalization of data
– A process of analyzing the given relation schemas based on
their FDs and primary keys to achieve the desirable properties
of
• (1) minimizing redundancy and
• (2) minimizing the insertion, deletion, and update anomalies
– The normalization procedure provides database designers
with the following:
• A formal framework for analyzing relation schemas based on their
keys and on the functional dependencies among their attributes
• A series of normal form tests that can be carried out on individual
relation schemas so that the relational database can be normalized
to any desired degree
• Denormalization
– The process of storing the join of higher normal form relations
as a base relation, which is in a lower normal form.
20
Normalization
• The process of efficiently organizing data in a
database. There are two goals of the normalization
process:
– eliminating redundant data (for example, storing the
same data in more than one table) and
– ensuring data dependencies make sense (only storing
related data in a table).
• Both of these are worthy goals as they reduce the
amount of space a database consumes and ensure
that data is logically stored.
• Normalization is a method to remove all these
anomalies and bring the database to a consistent
state.
21
Normal Forms Based on
Primary Keys
• Key
– {Ssn} is a key for EMPLOYEE
• Superkeys
– {Ssn}, {Ssn, Ename}, {Ssn, Ename, Bdate}, and any set of attributes that includes
Ssn are all superkeys.
• Candidate key
– If a relation schema has more than one key, each is called a candidate key.
– Primary key
• One of the candidate keys is arbitrarily designated to be the primary key,
– Secondary key
• and the others are called secondary keys.
• Prime attribute
– An attribute of relation schema R is called a prime attribute of R if it is a member of
some candidate key of R.
• Both Ssn and Pnumber are prime attributes of WORKS_ON.
• Nonprime attribute
– An attribute is called nonprime if it is not a prime attribute—that is, if it is not a
member of any candidate key
• Other attributes of WORKS_ON are nonprime.
22
Functional Dependency
• Trivial Functional Dependency
• Full Functional Dependency
• Transitive Dependency
• Multivalued Dependency
23
Functional Dependency
• Saying that there is a dependency between
attributes in a table is the same as saying that there
is a functional dependency between those attributes.
• If there is a dependency in a database such that
attribute B is dependent upon attribute A, you would
write this as “A -> B”.
– For example, In a table listing employee characteristics
including Social Security Number (SSN) and name, it can
be said that name is dependent upon SSN (or SSN ->
name) because an employee's name can be uniquely
determined from their SSN.
– However, the reverse statement (name -> SSN) is not
true because more than one employee can have the
same name but different SSNs.
24
Trivial Functional Dependencies
• occurs when you describe a functional
dependency of an attribute on a collection of
attributes that includes the original attribute.
For example, “{A, B} -> B” is a trivial functional
dependency, as is “{name, SSN} -> SSN”.
• This type of functional dependency is called
trivial because it can be derived from common
sense. It is obvious that if you already know
the value of B, then the value of B can be
uniquely determined by that knowledge.
25
Full Functional Dependency
• occurs when you already meet the
requirements for a functional dependency
and the set of attributes on the left side of
the functional dependency statement
cannot be reduced any farther.
– For example, “{SSN, age} -> name” is a
functional dependency, but it is not a full
functional dependency because you can
remove age from the left side of the
statement without impacting the dependency
relationship.
26
Transitive Dependency
• occur when there is an indirect
relationship that causes a
functional dependency.
– For example,
”A -> C” is a transitive
dependency when it is true only
because both “A -> B” and “B -> C”
are true.
27
Multivalued Dependency
• occur when the presence of one or more
rows in a table implies the presence of one
or more other rows in that same table.
– For example, imagine a car company that
manufactures many models of car, but always
makes