# BCNF & 3NF

Contents: |

1. Relations |

2. Functional dependencies |

3. Keys |

4. BCNF |

5. Decomposing |

6. 3NF |

7. Why decomposing into 3NF doesn't work |

8. FD basis |

9. Synthesis |

## 1. Relations

Relational databases are so called because they are based on
the mathematical notion of a *relation*, which is simply a set
of tuples. One of the more prominent occurrences of relations
is in the *equivalence relation*.

There are two terminologies used for relational databases.

ProgrammingMathematicaltable relation metadata schema row tuple column attribute

Mathematically, we describe relations using a
**schema**. For example, you can imagine a relation
named ` orders` described as follows.

(orders,customer_name,shipping_addr,product_name,quantity,time_ordered)time_shipped

We'll soon need to make some assumptions about how this relation works.

- Each customer has a distinct name, and each product has a distinct name.
- An order may consist of multiple products;
each product within the order is listed
as an additional tuple within the relation.
An order does not list the same product multiple times —
instead, the
attribute is used to represent this.`quantity` - No customer places two orders at the same time.
- All items in an order are to be shipped to the same address.

## 2. Functional dependencies

Definition:
`A`_{1} `A`_{2} … `A`_{n} → `B`_{1} `B`_{2} … `B`_{m}
is an **FD** (a **functional
dependency**) for a relation `R`
if whenever two tuples in `R` agree on
`A`_{1}, `A`_{2}, … `A`_{n},
they must necessarily agree on
`B`_{1}, `B`_{2}, … `B`_{m}.

Example: For the ` orders` relation with assumptions as
listed above, two FDs are:

`customer_name``→`

`time_ordered`

`shipping_addr`

`customer_name`

`product_name``→`

`time_ordered`

`quantity`

`time_shipped`Since the left side is allowed more information than is necessary, another FD is

`customer_name`

`time_ordered``→`

`quantity`

`shipping_addr``R` is said to **satisfy** an FD, and
`A`_{1}, `A`_{2}, … `A`_{n}
are said to **determine**
`B`_{1}, `B`_{2}, … `B`_{m}.

An FD is **trivial** if
{ `B`_{1}, `B`_{2}, … `B`_{m} }
is a subset of
{ `A`_{1}, `A`_{2}, … `A`_{n} }
For example,
`product_name`` quantity` →

`is a trivial FD.`

`quantity`## 3. Keys

The set { `A`_{1}, `A`_{2}, … `A`_{n} } is a
**superkey** for a relation `R` if it functionally
determines all attributes for a relation. In other words, any
two tuples of `R` must necessarily disagree on
on at least one of
`A`_{1}, `A`_{2}, … `A`_{n}.

{ `A`_{1}, `A`_{2}, … `A`_{n} } is a
**key** for a relation `R` if it is a
“minimal superkey”: That is, the set is a superkey but no
proper subset is a superkey.

Example: For the ` orders` relation,
{

`,`

`customer_name``,`

`product_name``} is a key.`

`time_ordered`## 4. BCNF

A relation `R` is in **Boyce-Codd Normal
Form** (**BCNF**) if for every nontrivial FD
`A`_{1} `A`_{2} … `A`_{n} → `B`_{1} `B`_{2} … `B`_{m}
satisfied by `R`, the set
{ `A`_{1}, `A`_{2}, … `A`_{n} }
is a superkey for
`R`.

Our definition of ` orders` is not in BCNF,
since its FD

`customer_name``→`

`time_ordered``is nontrivial and the left side {`

`shipping_addr``,`

`customer_name``} is not a superkey for`

`time_ordered``.`

`orders`## 5. Decomposing

It's desirable for each database to have all relations in
BCNF. So, given a relation not in BCNF, we would normally attempt to
**decompose** it into two or more BCNF
relations.

This is simple: We find a violating FD (i.e., a nontrivial
FD for
which the left side is not a superkey), expand the right side
to include as many attributes as possible (called
computing the **closure** of the left side),
and then split the relation into:

- one relation containing all attributes of this expanded FD, and
- another relation containing all the attributes of the FD's left side and all attributes that were missing from the FD.

If any of the resulting relations aren't in BCNF, we recurse on it (them).

Take ` orders` as an example.
We already decided that

`customer_name``→`

`time_ordered``is an FD that causes`

`shipping_addr``to violate BCNF. We observe that the right side can be expanded by adding`

`orders``and`

`customer_name``to it; but the remaining attributes (`

`time_ordered``,`

`product_name``,`

`quantity``) don't follow — assuming that an order can contain multiple products, and that these products might be shipped out at varying times. Thus, we'll split into two relations:`

`time_shipped`(`orders`,`customer_name`,`time_ordered`)`shipping_addr`(`order_contents`,`customer_name`,`time_ordered`,`product_name`,`quantity`)`time_shipped`

Both are in BCNF given our presumptions, so there's no more work to do; sometimes, though, one or both will need to be further decomposed.

## 6. Third normal form

Decomposing leads to a potential problem: In some
situations, we can “lose” FDs through splitting them
apart. Let us suppose our ` orders` relation also had a
restriction that no product can be shipped out to two
locations simultaneously; this would lead to the new FD

`product_name``→`

`time_shipped`

`shipping_addr`The FD
`customer_name`` time_ordered` →

`still holds, so we might attempt to decompose the relation as we did. But after splitting the relation as in our example, we would have no way to express our new FD in the resulting relation. Splitting closely related information apart is problematic; this situation leads to a slightly looser normal form called`

`shipping_addr`*third normal form*.

To define third normal form,
we first define an attribute as **prime**
if it is part of some key for its
relation. Given this definition, a
relation is in **3NF** (**third normal
form**) if every
FD satisfies one of these three conditions:

- It is trivial;
- The left side is a superkey;
*or* - Every attribute on the right side either appears on that FD's left side or is prime.

Note that BCNF has stricter restrictions on what FDs it allows, so any relation that is in BCNF is also in 3NF. In practice, well-designed relations are almost always in BCNF; but occasionally a non-BCNF relation is still well-designed (and is in 3NF).

## 7. Why decomposing into 3NF doesn't work

Decomposing a relation into 3NF leads to potential problems. Suppose the following schema.

(students,first_name,hall)sex

Suppose that all first names are either distinctively male or female, and that our campus had only single-sex halls. This leads to two FDs:

→first_namesex

→hallsex

Based on this, we'd conclude that the only key for this
relation is { ` first_name`,

`}. Note that both FDs violate 3NF.`

`hall`Decomposition would propose that we would divide this
relation into two relations based on one of the violating FDs.
Suppose we chose to decompose based on
` hall` →

`.`

`sex`(halls,hall)sex

(students,first_name)hall

Notice that we have again split up an existing FD
(` first_name` →

`). This is exactly what was wrong with BCNF. What have we gained?`

`sex`## 8. FD Basis

There is another algorithm for breaking a non-3NF relation
into 3NF relations. We'll get to that, but first we need to
cover a new topic: the *basis* of a set of FDs.
(For any who've studied Linear Algebra, this usage of the word
*basis* is very similar to its usage in that field.)

A **basis** for a set `F` of FDs is a set `G`
of FDs in which every FD of `F` is implied by the FDs
in `G`. That is, for each `F` FD
`A` → `B`, if we take
the closure of `A` using `G`, we'd get a set
of attributes that includes all attributes of `B`.

For example: In our ` students` example above, suppose we
have the following FDs.

a.→first_namesex

b.→hallsex

c.→first_namehallsex

(FD **c.** is a bit odd: It suggests that people of the
same name must necessarily live in the same hall. When I was an
undergrad there was a rumor that Housing found it fun to assign
students in that way.)

Then the following constitutes a basis.

b.→hallsex

c.→first_namehallsex

We don't need **a.** in our basis, since it's implied by
**c.**.

Given a set `F` of FDs, a **minimal basis**
is a set `G` of FDs for which:

- Every FD in
`G`has just one attribute on its right side. `G`is a basis for`F`.- Removing any FD from
`G`results in a set of FDs that is not a basis for`F`. - For any FD from
`G`, removing an attribute from its left side results in a set of FDs that is not a basis for`F`.

Our basis example from above is not minimal, since one of the FDs has multiple attributes on its right side. We can split it into two pieces.

b.→hallsex

d.→first_namesex

e.→first_namehall

However, this is still not minimal because we can remove
**d.** and still have a basis for our original set of FDs.
The set
{ ` hall` →

`,`

`sex``→`

`first_name``} constitutes a minimal basis.`

`hall`## 9. Synthesis

Here is our algorithm for splitting a non-3NF relation `R` into
3NF relations while not losing any FDs.

- Find a minimal basis for the set of
`R`'s FDs. - For each FD in the minimal basis, create a relation consisting of the attributes in that FD.
- If none of the relations have a set of attributes that
forms a superkey for
`R`, select a key for`R`, and create a relation with the attributes from that key. - If any relation includes only a subset of attributes found in another relation, delete that smaller relation.

For our ` students` example with FDs

**a.**,

**b.**, and

**c.**, we'd get two relations from Step 2:

(halls,hall)sex

(names,first_name)hall

Step 3 would delete no relations. Step 4 would
add no relations, since { ` first_name`,

`} is a superkey of the original relation, and those attributes appear in the`

`hall``relation.`

`names`What would happen if ` students` had

`→`

`first_name``and`

`sex``→`

`hall``as its FD basis? We'd get three relations.`

`sex`(halls,hall)sex

(names,first_name)sex

(students,first_name)hall

I leave it for you to figure out how this came out of the synthesis algorithm.