Database Study Five: Normal Forms

📢 This article was translated by gemini-2.5-flash

SQL Series
Database Study One: Database Introduction: https://blog.yexca.net/en/archives/86
Database Study Two: Relational Model: https://blog.yexca.net/en/archives/87
Database Study Three: SQL Language: https://blog.yexca.net/en/archives/88
Database Study Four: Relational Database Theory: https://blog.yexca.net/en/archives/89
Database Study Five: Normal Forms: This Article
Database Study Six: Database Design: https://blog.yexca.net/en/archives/91
Database Study Seven: Database Control Functions: https://blog.yexca.net/en/archives/92

Relations in a relational database must meet certain normalization requirements. Normal forms are used to measure different degrees of normalization. Currently, there are 6 main normal forms: 1NF, 2NF, 3NF, BCNF, 4NF, and 5NF.

The minimum requirement is 1NF. Meeting additional requirements beyond 1NF leads to 2NF, and so on. Among these, 3NF and BCNF are the most crucial and are the primary goals for normalization.

A relational schema in a lower normal form can be converted into a set of relational schemas in a higher normal form through schema decomposition. This process is called normalization.

First Normal Form (1NF)

Definition: Let R be a relational schema. R is in First Normal Form (1NF) if and only if the domain of every attribute A in R contains only atomic (indivisible) values.

1NF cannot eliminate issues like data redundancy, update (insertion, deletion) anomalies, or complex modifications, as partial functional dependencies might still exist.

Second Normal Form (2NF)

Definition: Let R be a relational schema. R is in Second Normal Form (2NF) if and only if it is in 1NF, and every non-key attribute is fully functionally dependent on a candidate key.

Relational schemas in 2NF might still suffer from data redundancy and update anomalies due to the possible existence of transitive functional dependencies.

Third Normal Form (3NF)

Definition: Let R be a relational schema. R is in Third Normal Form (3NF) if and only if it is in 2NF, and every non-key attribute is non-transitively dependent on a candidate key.

Relational schemas in 3NF might still have partial and transitive dependencies involving prime attributes (attributes that are part of any candidate key).

Boyce-Codd Normal Form (BCNF)

Definition: Let R be a relational schema, and F be its set of dependencies. R is in BCNF if and only if R is in 3NF, and for every non-trivial functional dependency X → Y in F, X is a superkey of R.

From the definition of BCNF, we can conclude that a relational schema satisfying BCNF has the following properties:

All non-prime attributes are fully functionally dependent on every key.
All prime attributes are also fully functionally dependent on every key that does not contain them.
No attribute is fully functionally dependent on any set of attributes that is not a key.

A relational schema R satisfying BCNF has eliminated insertion and deletion anomalies.

A relational schema with only two attributes is always in BCNF.

Fourth Normal Form (4NF)

Definition: A relational schema R<U, F> is in 1NF. It is in Fourth Normal Form (4NF) if, for every non-trivial multi-valued dependency X →→ Y (Y ⊈ X) in R, X is a superkey of R.

Multi-valued Dependency

Given a relational schema R<U, F>, where X, Y, Z are subsets of U, and Z = U - X - Y. A multi-valued dependency X →→ Y holds in R if and only if, for any relation r of R, given a pair of (x, z) values, there exists a set of Y values that depends solely on the x value and is independent of the z value.

Trivial Multi-valued Dependency

If X →→ Y holds in the above definition of multi-valued dependency, and Z is an empty set, then X →→ Y is called a trivial multi-valued dependency.

Normal Form Summary

1NF (First Normal Form)

All field values are atomic (indivisible).

2NF (Second Normal Form)

No partial functional dependency of non-prime attributes on a key. A table should only store one type of data, not multiple types in the same table.

3NF (Third Normal Form)

No transitive functional dependency of non-prime attributes on a key. Ensures every column in the table directly relates to the primary key, not indirectly.

BCNF (Boyce-Codd Normal Form)

Eliminates transitive dependencies of any attribute on a candidate key. BCNF is a refined 3NF.

1NF → 2NF: Eliminates partial functional dependencies of non-prime attributes on a key.

2NF → 3NF: Eliminates transitive functional dependencies of non-prime attributes on a key.

3NF → BCNF: Eliminates partial and transitive functional dependencies of prime attributes on a key.

BCNF → 4NF: Eliminates non-trivial, non-functional multi-valued dependencies.

A relation where all candidate keys are single attributes is inherently in 2NF.

Relational Schema Decomposition

For relational schemas with data redundancy, insertion anomalies, or deletion anomalies, decomposition into multiple relational schemas is the solution. Accordingly, data originally stored in one 2D table will be distributed across multiple tables. For this decomposition to be meaningful, a minimum requirement is that the latter must not lose information from the former.

New issues arise during decomposition. To ensure the decomposed schema retains the properties of the original schema, the fundamental requirements are that the schema decomposition must have the lossless join property and dependency preservation.

Lossless Join

Lossless join means that when a relational schema is decomposed, any valid instance of the original relational schema should be recoverable through natural join operations after decomposition.

In other words, the decomposed relations can be reconstructed into the original relation via natural join.

Dependency Preservation

This refers to whether F’ (the union of functional dependencies in the sub-schemas after decomposition) is equivalent to F (the original set of functional dependencies). If F’ contains all functional dependencies in F, then dependency preservation is achieved (a sufficient condition).

In other words, dependencies remain unchanged after decomposition.

Reference: Schema Decomposition Dependency Preservation Judgment - Database Exam Review_guoyp2126’s blog-CSDN blog

Given R (ABCDEGH) and F={A->D, E->D, D->B, BC->D, DC->A}, find a dependency-preserving 3NF decomposition.

Find the minimal functional dependency set.
Identify attributes not in F and group them separately.
Group dependencies with the same left-hand side. If there’s no match, convert A->D to {AD}. If there are matches, like {A->B, A->C}, then convert to {ABC}.
If a candidate key is not present in the separated groups, make any one candidate key a separate group.

Minimal functional dependency set Fmin={A->D, E->D, D->B, BC->D, DC->A}
GH is not in F, group {GH} separately.
Candidate key: CE
AE

{AD}{ED}{DB}{BCD}{DCA}{CE}{GH}