Relational Database Theory: Database Learning Part 4

SQL Series
Database Learning 1: Introduction to Databases: https://blog.yexca.net/en/archives/86
Database Learning 2: Relational Model: https://blog.yexca.net/en/archives/87
Database Learning 3: SQL Language: https://blog.yexca.net/en/archives/88
Database Learning 4: Relational Database Theory: This Article
Database Learning 5: Normalization: https://blog.yexca.net/en/archives/90
Database Learning 6: Database Design: https://blog.yexca.net/en/archives/91
Database Learning 7: Database Control Functions: https://blog.yexca.net/en/archives/92

📢 This article was translated by gemini-3-flash-preview

Relational Schema

A relational schema is formally defined as a 5-tuple (including the relation name):

R<U, D, dom, F>

R: Relation name (the symbolic semantic of the tuple).
U: A set of attributes.
D: The domains from which the attributes in U originate.
dom: The mapping from attributes to domains.
F: A set of data dependencies (functional dependencies) on the attribute set U.

Since points 3 and 4 have little impact on schema design, we usually simplify the relational schema to a triple: R<U, F>. A relation $r$ on $U$ is a relation of schema R<U, F> if and only if $r$ satisfies $F$.

Example: R is a grade table, U = (StudentID, Name, CourseID, Grade), F = {StudentID → Name, CourseID → CourseName, (StudentID, CourseID) → Grade}.

Functional Dependency (FD)

Functional dependency is the most fundamental type of data dependency.

Let R(U) be a relational schema on attribute set U, and X, Y be subsets of U. If for every possible relation $r$ of R(U), it is impossible for two tuples to have the same values for attributes in X but different values for attributes in Y, then X functionally determines Y (or Y is functionally dependent on X). This is denoted as X → Y.

Non-trivial Functional Dependency If X → Y but Y ⊈ X, it is a non-trivial FD. Usually, we only care about non-trivial FDs.
Trivial Functional Dependency If X → Y and Y ⊆ X, it is a trivial FD.
Full Functional Dependency In R(U), if X → Y and for any proper subset X’ of X, X’ does not determine Y, then Y is fully functionally dependent on X. Denoted as X -F-> Y.
Partial Functional Dependency If X → Y but Y is not fully functionally dependent on X, it is a partial FD. Denoted as X -P-> Y.
Transitive Dependency In R(U, F), if X → Y, Y ⊈ X, and Y → Z, then Z is transitively dependent on X.

Reference: Untitled Document (pop0726.github.io)

Keys

Let K be a combination of attributes in R(U, F). If K → U and for any proper subset K’ of K, K’ does not determine U, then K is a Candidate Key. If there are multiple candidate keys, pick one to be the Primary Key.

Primary Attribute vs. Non-primary Attribute Attributes contained in any candidate key are primary attributes. Others are non-primary attributes.
Foreign Key If an attribute (or group) X in R(U) is not a key for R but is a key for another relation, X is a foreign key.
Superkey Any set of attributes that can determine all attributes in the relation. A candidate key is a minimal superkey.
All-key A case where all attributes combined form the primary key.

Armstrong’s Axioms

For a relational schema R(U, F), the following inference rules apply:

Reflexivity: If Y ⊆ X ⊆ U, then X → Y is implied by F.
Augmentation: If X → Y is implied by F and Z ⊆ U, then XZ → YZ is implied by F.
Transitivity: If X → Y and Y → Z are implied by F, then X → Z is implied by F.

Derived rules:

Union: If X → Y and X → Z, then X → YZ.
Pseudo-transitivity: If X → Y and WY → Z, then XW → Z.
Decomposition: If X → Y and Z ⊆ Y, then X → Z.

Attribute Closure Calculation

Closure calculation is used to find candidate keys. Here is the logic:

Attributes appearing only on the left side of FDs must be part of the candidate key.
Attributes appearing only on the right side cannot be part of a candidate key.
Attributes appearing on both sides or neither side need to be checked.
Attributes appearing on neither side must be part of the candidate key.
Calculate the closure of the “must-be” attributes. If the closure includes all attributes, they form the candidate key. Otherwise, add other potential attributes one by one and re-calculate the closure.

Example: R<U, F>, U(A, B, C, D, E, G) F = {AB → C, CD → E, E → A, A → G}. Find candidate keys.

Only on left: B, D (Must be in key)
Only on right: G (Cannot be in key)
Both sides: A, C, E
Calculate closure: (BD)⁺: Nothing extra. We need to add attributes from {A, C, E}. (BDA)⁺: → C, E, A, G. Results in ABCDEG. (Candidate Key) (BDC)⁺: → E, A, G. Results in ABCDEG. (Candidate Key) (BDE)⁺: → A, G, C. Results in ABCDEG. (Candidate Key)

Candidate keys: {(BDA), (BDC), (BDE)}

Finding the Minimal Functional Dependency Set

How to find the minimal set?

Decompose the right side (e.g., A → BC becomes A → B and A → C).
Remove redundant dependencies: For each FD, check if the right side can still be determined by the left side using the other remaining FDs.
Minimalize the left side: For FDs with multiple attributes on the left (e.g., BCD → A), check if a subset of the left side (like CD) can determine the right side.

Example: R<U, F> U{A, B, C, D, E, F, G} F = {BCD → A, BC → E, A → F, F → G, C → D, A → G}. Find the minimal set.

Solution:

// (1) Decompose RHS (Already mostly done, but A → G is redundant if we look ahead)

(BCD)⁺ = BCDED…

(BC)⁺ = BCD

(A)⁺ = AG

(F)⁺ = F

(C)⁺ = C

(A)⁺ = AFG (Wait, check if A → G can be derived from A → F and F → G. Yes. So remove A → G).

// (2) Minimalize LHS
In BCD → A, we have C → D. So BCD → A can be simplified to BC → A.

Final Minimal Set:
BC → A
BC → E
A → F
F → G
C → D