Database Learning 6: Database Design

📢 This article was translated by gemini-3-flash-preview

SQL Series
Database Learning 1: Introduction to Databases: https://blog.yexca.net/en/archives/86
Database Learning 2: Relational Model: https://blog.yexca.net/en/archives/87
Database Learning 3: SQL Language: https://blog.yexca.net/en/archives/88
Database Learning 4: Relational Database Theory: https://blog.yexca.net/en/archives/89
Database Learning 5: Normalization: https://blog.yexca.net/en/archives/90
Database Learning 6: Database Design: This article
Database Learning 7: Database Control Functions: https://blog.yexca.net/en/archives/92

Database Design Strategies

Top-down and bottom-up; usually a combination of both is used.

Database Design Steps

The “New Orleans Method” is currently the most recognized and authoritative method for database design. It divides the process into four main phases:

User Requirement Analysis: Collect requirements and determine system boundaries.
Conceptual Design: Describe the conceptual model, typically using the E-R method.
Logical Design: Convert E-R diagrams into relational schemas and normalize them.
Physical Design: Determine the specific implementation of the logical model on a computer.

This is followed by the database implementation phase and the operation/maintenance phase.

Requirement Analysis Phase

Requirement analysis happens after the project is confirmed. Users and designers organize and describe the content (data) and functions (behavior) the database system will involve. It views the system from the user’s perspective.

This process is the foundation for everything that follows. Logical design, physical design, and application design are all based on this phase.

Tasks in the Requirement Analysis Phase:

Investigate the organization → Familiarize with business activities → Clarify user requirements → Determine system boundaries → Analyze system functions → Analyze system data → Write the analysis report.

Documents in the Requirement Analysis Phase:

Data collected during investigation might be fragmented. Analysts and designers must further analyze and express user needs by creating Requirement Specifications, Data Dictionaries, and Data Flow Diagrams (DFD).

E-R Model

The E-R (Entity-Relationship) model is a conceptual model used to describe the real world.

Entity

Represented by rectangles. The entity name is written inside the rectangle.

Relationship

Represented by diamonds. The relationship name is written inside the diamond, connected to related entities with undirected edges. The relationship type (1:1, 1:n, m:n) is marked next to the edges.

<1> Relationships between two different entities:

One-to-One (1:1): An entity in entity set E₁ is associated with at most one entity in entity set E₂.
One-to-Many (1:n): An entity in entity set E₁ can be associated with multiple entities in entity set E₂.
Many-to-Many (m:n): Multiple entities in entity set E₁ can be associated with multiple entities in entity set E₂.

<2> Relationships between more than two entity sets:

There can be 1:1:1, 1:1:n, 1:m:n, and r:n:m relationships between three or more different entity sets.

<3> Binary relationships within the same entity set:

1:1, 1:n, and m:n relationships can also exist between entities within the same entity set.

Attribute

Attributes are characteristics of an entity, represented by ovals. The attribute name is written inside (primary attributes are underlined) and connected to the entity with an undirected edge.

Each attribute has a value range (domain). In the same entity set, the attributes and their domains are the same for each entity, but they may take different values.

Attributes in the E-R model are categorized as follows:

Simple and Composite Attributes

Simple attributes are atomic and cannot be subdivided. Composite attributes can be broken down into smaller parts (i.e., other attributes). Unless specified, “attribute” usually refers to a simple one.

Single-valued and Multi-valued Attributes

A single-valued attribute has only one value for a specific entity. A multi-valued attribute can have multiple values for a single entity.

NULL Attributes

When an entity has no value for an attribute or the value is unknown, a NULL value is used (meaning “not applicable” or “unknown”).

Derived Attributes

Derived attributes are calculated from other attributes. For example, if an employee has Hire Date and Years of Service, the Years of Service can be derived from the difference between the current date and the Hire Date.

Weak Entity

In the real world, some relationships represent total dependency. For example, the relationship between an employee and their dependents—a dependent always belongs to a specific employee. These entities depend heavily on another entity; their existence depends on the existence of another. These are called weak entities.

In E-R diagrams, weak entities are represented by double-lined rectangles.

Superclass and Subclass Entities

For example, an “Employee” entity could be a superclass (parent class) for “Pilot,” “Mechanic,” and “Administrator” entities. There is an inheritance relationship between the superclass and subclasses.

Subclass rectangles have vertical lines on the sides, and the line connecting the superclass and subclass contains a circle.

Conceptual Structure Design Phase

Establishing a conceptual model using the E-R method.

This involves classifying, aggregating, and summarizing data from the requirement analysis phase to determine entities, attributes, and relationships. The specific steps include selecting local applications, designing individual local E-R diagrams, and merging them.

Requirement Analysis → Abstract Data → Design Local Views → Merge & Resolve Conflicts → Refine/Remove Redundancy → Logical Structure Design Phase.

Select Local Applications

Choose appropriate levels of the Data Flow Diagram (DFD) so each part corresponds to a local application implementing a specific function.

Design Individual Local E-R Diagrams

After defining local applications, design a local E-R diagram for each one.

Merge E-R Diagrams

Once local E-R diagrams are ready, merge them into a single global E-R diagram. The goal is to resolve conflicts between local diagrams.

There are three main types of conflicts between local E-R diagrams:

Attribute Conflicts

The same attribute might exist in different local diagrams. Different designers might use different data types, value ranges, or units. These must be unified for computer storage.

Naming Conflicts

The same concept might have different names (synonyms), or the same name might represent different concepts (homonyms) in different diagrams. These must be unified.

Structural Conflicts

An entity might have different attributes in different diagrams, or an object might be abstracted as an entity in one diagram and an attribute in another. These need to be consistent.

Logical Structure Design Phase

Logical structure design creates a data model based on the conceptual design. This could be hierarchical, network, or relational. This section covers mapping a global E-R diagram to a relational model. Key steps include determining the data model, converting E-R diagrams, defining integrity constraints (entity, referential, and user-defined), and determining user views.

Conceptual Design → Convert to Data Model → Normalize Relations → Schema Optimization → Design User Subschemas → Physical Design Phase.

Converting E-R Diagrams to Relational Schemas

The global conceptual model describes the information world and isn’t directly suited for computer processing. To work with a RDBMS, the E-R diagram must be converted. While E-R diagrams have entities, attributes, and relationships, relational models only have one structure: the relation (table).

1. Converting Entities to Relations

Each entity becomes a relation. The entity name is the relation name, entity attributes become relation attributes, and the entity identifier becomes the primary key.

2. Converting Relationships to Relations

One-to-One (1:1) Relationship

One way is to merge the relationship into either of the two entities. Add the primary key of the other entity and any attributes of the relationship to the chosen entity’s attributes. The primary key of the merged entity remains the same.

Another way (rarely used) is to create a separate relation. Its name is the relationship name, attributes include the primary keys of both entities plus relationship attributes. The primary key can be the primary key of either entity.

One-to-Many (1:n) Relationship

One way is to merge the relationship into the “many” side. Add the primary key of the “one” side and any relationship attributes to the “many” side entity. The primary key of the “many” side remains the same.

Another way (rarely used) is to create a separate relation. Attributes include the primary keys of both entities plus relationship attributes. The primary key is the primary key of the “many” side entity.

Many-to-Many (m:n) Relationship

Must be converted into an independent relation. The name is the relationship name. Attributes include the primary keys of both entities plus relationship attributes. The primary key is the combination of the primary keys of both entities.

Many-to-Many-to-Many (n-ary) Relationship

Essentially the same as m:n.

Superclass and Subclass Conversion

Both superclass and subclass entities can be converted into relations. The primary key of the superclass is added to the subclass relation.

Normalizing Relational Schemas

Determine data dependencies based on semantics.
Determine the normal form of the schemas based on those dependencies.
If a schema doesn’t meet requirements, decompose it using decomposition algorithms to reach 3NF, BCNF, or 4NF.
Evaluate and refine the schemas.

Physical Design Phase

Logical Design → Determine Physical Database Schema → Evaluate Physical Schema → Implementation Phase.

Database Implementation and Maintenance

Physical Design → Define Database → Load Data & Write Applications → Trial Run → Operation and Maintenance Phase.