This video talks about data
modeling and UML, the Unified Modeling Language.
The area of data modeling
consists of how we represent
the data for an application.
We've talked at great length about the relational data model.
It's widely used and we
have good design principles for coming up with relational schemas.
We also talked about XML as
a data model. XML is quite
a bit newer and there are
no design principles that are
analogous to the ones for the relational model.
But frequently when people are
designing a database, they'll actually
use a higher level model
that's specifically for database design.
These models aren't implemented by
the database system, rather they're
translated into the model of the database system.
So let's draw a picture of that.
Let's suppose that we have
a relational database management system
which is often abbreviated RDBMS, and
I'll draw that as a disk just out of tradition.
So, if we create a database
in a relational system the
database is going to consist of relations.
But instead of designing relations
directly, the database designer,
we'll draw that up here, will
use instead a higher-level design model.
That model will then go
through a translator, and this
can often be an automatic
process that will translate the
higher level model into the
relations that are implemented by the database system.
So what are these higher-level models?
Historically, for decades in
fact, the entity relationship
model, also known as the
ER model, was a very popular one.
But more recently the Unified
Modeling Language has become popular
for higher-level database design.
The Unified Modeling Language is
actually a very large language,
not just for database design, but also for designing programs.
So what we're going to look
at is the data modeling subset of UML.
Both of these design models are
fundamentally graphical, so in
designing a database, the user
will draw boxes and arrows, perhaps other shapes.
And also both of them
can be translated, generally automatically, into relations.
Sometimes there may be a little human
intervention in the translation process, but often that's not necessary.
So in the data modeling subset of
UML, there are five basic concepts:
classes, associations, association classes, subclasses, and composition and aggregation.
We're just going to go
through each one of those
concepts in turn with examples.
So the class concept in UML
is not specific to data-modeling.
It's also used for designing programs.
The class consists of a
name for the class, attributes of
the class, and methods in the
class, and that's probably familiar to you again from programming.
For data modeling specifically, we
add to the attributes the
concept of a primary key,
and we drop the methods
that are associated since we're focusing,
really, on the data modeling at this point.
So we'll be drawing our examples,
as usual, from an imaginary
college admissions database with
students and colleges and students applying to colleges and so forth.
So one of our classes, not
surprisingly, will be the student class.
And in UML we'll draw a
class as a box
like this, and at the
top we put the name
of the class and then we
put the attributes of the class,
so let's suppose that we'll just keep it simple.
We'll have a student ID, a
student name, and for
now, the student's GPA and
down here in UML would
be the specification of the methods.
Again we're not going to
be focusing on methods since we
are looking at data modeling, and not the operations on the data.
And so one difference is that we'll have no methods.
Another is that we specify
a primary key if we
wish and that's specified
using the terminology PK.
So we'll say that the student ID in this case is the primary key.
And just as in keys in
the relational model, that means
that when we have a set
of objects for the student
class, each object will have a unique student ID.
There will be no student IDs repeated across objects.
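
As a side sketch, not something drawn in the video, here's roughly how the student class and its primary key might look in Python. The names Student, StudentSet, and add, and the sample students, are made up for illustration; the only point is that a second object with the same student ID gets rejected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    # Attributes from the UML box; student_id plays the role of the PK.
    student_id: int
    name: str
    gpa: float


class StudentSet:
    """Hypothetical container that rejects repeated primary-key values."""

    def __init__(self):
        self._by_key = {}

    def add(self, student):
        if student.student_id in self._by_key:
            raise ValueError(f"duplicate primary key: {student.student_id}")
        self._by_key[student.student_id] = student

    def __len__(self):
        return len(self._by_key)


# Made-up sample objects.
students = StudentSet()
students.add(Student(123, "Amy", 3.9))
students.add(Student(234, "Bob", 3.6))
try:
    students.add(Student(123, "Craig", 3.5))  # same student_id as Amy
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```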
In our college application database, we're
also likely to have a
class for colleges, so we'll have a class that we call college.
And for now, we'll make
the attributes of that
class, just the college name and the state.
And again in full UML, there might be some methods down here.
And we'll make the college
name in this case be the primary key.
So we're assuming now that college names themselves are unique.
So that's it for classes.
Pretty straightforward, they look a
lot like relations and of
course, they will translate directly to relations.
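
To make that translation a bit more concrete, here's a minimal sketch of a class-to-relation translator, assuming Python's built-in sqlite3 module as the relational system. The CLASSES dictionary, the attribute spellings, the function name class_to_create_table, and the sample rows are all assumptions for illustration; a real design tool would do considerably more.

```python
import sqlite3

# Hypothetical description of the two classes from the lecture:
# class name -> (attribute names, primary-key attribute).
CLASSES = {
    "Student": (["student_id", "name", "gpa"], "student_id"),
    "College": (["college_name", "state"], "college_name"),
}


def class_to_create_table(name, attributes, primary_key):
    """Turn one class into a CREATE TABLE statement (column types omitted)."""
    columns = [
        f"{attr} PRIMARY KEY" if attr == primary_key else attr
        for attr in attributes
    ]
    return f"CREATE TABLE {name} ({', '.join(columns)})"


conn = sqlite3.connect(":memory:")
for name, (attributes, primary_key) in CLASSES.items():
    conn.execute(class_to_create_table(name, attributes, primary_key))

# Made-up rows: the second insert repeats the key and should be rejected.
conn.execute("INSERT INTO College VALUES ('Stanford', 'CA')")
try:
    conn.execute("INSERT INTO College VALUES ('Stanford', 'NY')")
    key_enforced = False
except sqlite3.IntegrityError:
    key_enforced = True
```

Each class becomes one relation with the same attributes, and the PK annotation becomes the relation's key.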
Next let's talk about associations.
Associations capture relationships between objects of two different classes.
So let's suppose again that
we have our student class and
I won't write the attributes now,
I'll just write it like that
and we have our college class
in our UML design.
If we want to have a
relationship that students apply
to colleges, we write that
just as a line between
the student and the college classes
and then we give it a name.
So we'll call it applied
and that says that we have
objects in the student class and
objects that are in the college class
that are associated with each
other through the applied association.
If we want to introduce a
directionality to the relationship,
so to say that students are
applying to colleges, we can
put in an arrow there,
that's part of the UML language
although we'll see that it doesn't
really make much difference when we
end up translating UML designs to relations.
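
One way to picture an association, as a rough sketch only, is as a set of pairs of objects. The student IDs, college names, and helper functions below are made up for illustration. Notice that the same set of pairs can be read in either direction, which is one reason the arrow matters so little once we get to relations.

```python
# Hypothetical objects, identified by their primary keys.
student_ids = {123, 234}
college_names = {"Stanford", "MIT"}

# The applied association: each pair links one student to one college.
applied = {
    (123, "Stanford"),
    (123, "MIT"),
    (234, "MIT"),
}


def colleges_for(student_id):
    """Read the association from the student side."""
    return {c for (s, c) in applied if s == student_id}


def students_for(college_name):
    """Read the same association from the college side."""
    return {s for (s, c) in applied if c == college_name}
```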
When we have associations between classes,
we can specify what we call
the multiplicity of those and
that talks about how many objects
of one class can be related
to an object of another class.
So we'll see that we
can capture concepts like one-one
and many-one and so forth.
So let's look specifically at
how we specify those in
a UML diagram, and for
now I'll just use two generic classes.
So let's say I have a
class C1 and I
have a class C2, and let's
say that I have an association
between those two classes, so that would be a line.
And I could give that a name,
let's call it A. I'm just going to draw the
objects of class C1 as dots here below the class specification.
Let's say that I
want to specify that each one
of those is going to
be related to at least
M but at most
N objects in class
C2, so here are class C2 objects.
I'm going to have this kind of fan out in my relationship.
To specify that in the
UML diagram I write that as M..N
on the right side
of the association line, and
again that says each object
in C1 will be related
to between M and N objects of C2.
Now there are some special cases in this notation.
I can write M dot dot
star, and star means
any number of objects, so
what that would say is
that each object in "C1"
is related to at least "M"
and as many as it wants elements of "C2".
I can also write 0..N,
and that will
say that each object in C1
is related to possibly none
and up to N elements of C2.
For example, here we have one
that I haven't drawn any relationships for.
I can also write zero dot
dot star, and that's basically
no restrictions on the multiplicity.
And just to mention,
the default, actually, is one dot dot one.
So if we don't write anything
on our association we're
assuming that each object is
related to exactly one object
of the other class and that's in
both directions by the way,
so I can put an X..Y
here, and now we'll
restrict how many objects of
C1 each element of C2 is related to.
Incidentally, UML allows some abbreviations: 1..1
can be abbreviated as just
plain old 1, and 0..*
can be abbreviated with just star.
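
The multiplicity notation above can be sketched as a small parser. This is only an illustration: the function names parse_multiplicity and allows are made up, None stands in for star, and reading a bare number n as n..n is standard UML even though the lecture mentions only the case 1.

```python
def parse_multiplicity(spec="1..1"):
    """Return (lower, upper) for a spec like '2..5'; upper is None for star.

    The default argument mirrors the default multiplicity, 1..1.
    """
    spec = spec.strip()
    if spec == "*":           # abbreviation for 0..*
        return (0, None)
    if ".." not in spec:      # a bare number n is read as n..n
        n = int(spec)
        return (n, n)
    low, high = spec.split("..")
    return (int(low), None if high == "*" else int(high))


def allows(spec, count):
    """True if an object may be related to `count` objects under `spec`."""
    low, high = parse_multiplicity(spec)
    return count >= low and (high is None or count <= high)
```

For instance, allows("2..*", 7) holds, while allows("0..3", 4) does not.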
So let's take a look at
our student and college example and
what the multiplicity of the association
of students applying to colleges might be.
So let's suppose that we
insist that students must apply
somewhere, so they apply to at
least one college but they're
not allowed to apply to more
than 5, and furthermore
let's say that no college will
take more than 20,000 applications, so
this example is contrived to
allow me to put multiplicity specifications on both sides.
So again, we'll have our
student class and we'll
have our college class
and we'll have our association
between the student and the
college class, and I'll just write the name underneath here
now: applied.
So let's think about how
to specify our multiplicities for this.
So to specify that a student
must apply somewhere but cannot
apply to more than 5
colleges, we put a one
dot dot five on this side.
It really takes some thinking sometimes
to remember which side to put the specification on.
But that's what gives us the
fan out from the objects
on the left to the objects on the right.
So it says each student can
apply to up to five
colleges and must apply
to at least one, so we
won't have any who haven't applied anywhere.
On the other side, we want
to talk about how many students
can have applied to a particular
college, and we said it can be no more than 20,000.
We didn't put a lower
restriction on that, so we
would specify that as 0 to 20,000.
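
Here's a rough sketch of what those two multiplicities mean when we look at actual objects. The function name multiplicity_violations and the sample students and colleges are made up for illustration; the limits 1..5 and 0..20,000 come from the contrived example above.

```python
from collections import Counter

MIN_PER_STUDENT = 1        # each student applies to 1..5 colleges
MAX_PER_STUDENT = 5
MAX_PER_COLLEGE = 20000    # each college has 0..20,000 applicants


def multiplicity_violations(student_ids, college_names, applied):
    """List every object that breaks the 1..5 or 0..20,000 restriction."""
    per_student = Counter(s for s, _ in applied)
    per_college = Counter(c for _, c in applied)
    problems = []
    for s in sorted(student_ids):
        n = per_student[s]  # Counter gives 0 for students with no pairs
        if not MIN_PER_STUDENT <= n <= MAX_PER_STUDENT:
            problems.append(f"student {s} applied to {n} colleges")
    for c in sorted(college_names):
        n = per_college[c]
        if n > MAX_PER_COLLEGE:
            problems.append(f"college {c} has {n} applicants")
    return problems


# Made-up data: student 234 has not applied anywhere.
students = {123, 234}
colleges = {"Stanford", "MIT"}
applied = {(123, "Stanford"), (123, "MIT")}
problems = multiplicity_violations(students, colleges, applied)
```

With this data, the only violation reported is the student who hasn't applied anywhere, since the lower bound on the student side is one.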
So I mentioned earlier that multiplicity
of associations captures some of
these types of relationships you might
have learned about somewhere else called
one to one, many to one, and so on.
So, let me show the relationship
between association multiplicity and this terminology.
So if we have a one-to-one relationship
between "C1" and "C2," technically one-to-one
doesn't mean everything has to be involved.
What it really means is that
each object on each side
is related to at most one on the other side.
So to say it's a one-to-one relationship
we would put a "zero, dot,
dot, one" on both sides.
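
As a quick sketch of that definition, assuming we again picture the association as a set of pairs, a one-to-one check might look like this. The function name is_one_to_one is made up for illustration.

```python
from collections import Counter


def is_one_to_one(pairs):
    """True if every object on either side appears in at most one pair."""
    pairs = set(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return (all(n <= 1 for n in left.values())
            and all(n <= 1 for n in right.values()))
```

Objects that appear in no pair at all are fine, which matches the zero lower bound on both sides.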
Let's see if I can use some colors here.
So what about many-to-one?
Many-to-one says that we can have
many elements of "C1" related
to an element of "C2," but
each element of "C1" will
be related to, at most, one element of "C2."
So in that case we still
have a "zero, dot, dot, one"
on the right side indicating that
each "C1" object is related
to at most one object of
"C2" but we have
