Database Study Seven: Database Control Functions

📢 This article was translated by gemini-2.5-flash

SQL Series

Database Study One: Introduction to Databases: https://blog.yexca.net/en/archives/86
Database Study Two: Relational Model: https://blog.yexca.net/en/archives/87
Database Study Three: SQL Language: https://blog.yexca.net/en/archives/88
Database Study Four: Relational Database Theory: https://blog.yexca.net/en/archives/89
Database Study Five: Normal Forms: https://blog.yexca.net/en/archives/90
Database Study Six: Database Design: https://blog.yexca.net/en/archives/91
Database Study Seven: Database Control Functions: This Article

Transaction Management

A transaction is a sequence of operations that are “all or nothing.”

Transactions and programs are distinct concepts. Typically, a program can include multiple transactions.

In SQL, these three statements define transactions:

-- Start transaction
BEGIN TRANSACTION
-- Commit transaction
COMMIT
-- Rollback transaction
ROLLBACK
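
These three statements can be exercised from application code. Below is a minimal sketch using Python's built-in sqlite3 module; the `account` table and the names and amounts are illustrative, not from the article:

```python
import sqlite3

# In-memory database; isolation_level=None gives manual transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 100)")

# A committed transfer: both updates succeed together.
conn.execute("BEGIN TRANSACTION")
conn.execute("UPDATE account SET balance = balance - 30 WHERE name = 'alice'")
conn.execute("UPDATE account SET balance = balance + 30 WHERE name = 'bob'")
conn.execute("COMMIT")

# A rolled-back transaction leaves no trace.
conn.execute("BEGIN TRANSACTION")
conn.execute("UPDATE account SET balance = 0 WHERE name = 'alice'")
conn.execute("ROLLBACK")

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'alice': 70, 'bob': 130}
```

The committed transfer persists, while the rolled-back update vanishes completely.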

Transactions have ACID properties: Atomicity, Consistency, Isolation, and Durability.

  • 1 Atomicity

Transactions are atomic; they’re all or nothing.

  • 2 Consistency

A transaction’s execution must ensure the database transitions from one consistent state to another.

Thus, a database is consistent when it only contains results from successfully committed transactions.

  • 3 Isolation

Transactions are isolated from each other.

When multiple transactions run concurrently, one transaction’s updates are invisible to others until it successfully commits.

  • 4 Durability

Once a transaction successfully commits, its updates to the database are permanent, even if the database crashes.
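
Atomicity in particular can be demonstrated with a toy transfer in which an error strikes mid-transaction: the partial update must not survive. A sketch with Python's sqlite3 (table and values are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 100)")
conn.commit()

try:
    with conn:  # the context manager commits on success, rolls back on error
        conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'alice'")
        raise RuntimeError("crash before the matching credit")
except RuntimeError:
    pass

# Atomicity: the debit was rolled back together with the failed transaction.
row = conn.execute("SELECT balance FROM account WHERE name = 'alice'").fetchone()
print(row[0])  # 100
```

Because the transaction never committed, the database returns to its state as if the debit had never happened.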

Database Backup and Recovery

During database operation, hardware and software failures are inevitable. These can affect data correctness, even corrupting the database and causing partial or complete data loss. Hence, a key database technique is creating redundant data, i.e., backups. Database recovery technology focuses on promptly restoring the database to its correct pre-failure state after a system fault.

Failure Types

Databases experience four types of failures:

  • 1 Internal Transaction Failure

Recovery relies on REDO and UNDO techniques.

  • 2 System Failure (DBMS)

System abruptly halts, requiring a restart.

  • 3 Media Failure

Hardware damage.

  • 4 Computer Virus

Backup Methods

The basic principle of recovery is “establishing data redundancy” (duplicate storage). Methods for creating redundant data include data dumping and logging. Dumps are classified as static or dynamic and as full or incremental; logging is maintained in log files.

  • 1 Static and Dynamic Dumps

Static dumps mean no access or modification operations are allowed on the database during the dump.

Dynamic dumps allow access and modification of the database during the dump, so the dump and user transactions can execute concurrently.

  • 2 Full and Incremental Dumps

Full dumps involve dumping all data each time.

Incremental dumps only dump data updated since the last dump.
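
The distinction can be sketched in a few lines; the record names and timestamps below are invented for illustration:

```python
# Toy model: each record id maps to the time of its last update.
records = {"r1": 10, "r2": 25, "r3": 40}

def full_dump(db):
    # A full dump copies every record, regardless of age.
    return dict(db)

def incremental_dump(db, last_dump_time):
    # An incremental dump copies only records updated since the previous dump.
    return {k: t for k, t in db.items() if t > last_dump_time}

backup = full_dump(records)            # copies everything
delta = incremental_dump(records, 25)  # copies only r3
print(sorted(delta))  # ['r3']
```

Incremental dumps are cheaper to take but require the previous dumps (or a full dump plus the chain of increments) to restore.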

  • 3 Log Files

During transaction processing, the DBMS (Database Management System) writes every operation (transaction start, end, inserts, deletes, and modifications) to a log file. If a failure occurs, the DBMS recovery subsystem uses the log file to undo a transaction’s changes and restore the state before it began. Thus, the DBMS uses log files to recover from transaction and system failures, and, together with backup copies, from media failures.

Recovery

  • Transaction recovery involves 3 steps:
  1. Scan the log file in reverse (from the end toward the beginning) to find the transaction’s update operations.
  2. Apply the inverse operation of each update, writing the before-image back to the database.
  3. Continue the reverse scan, undoing the transaction’s updates in the same way, until its start marker is reached.

That is, if a transaction terminates abnormally, all of its completed operations are undone.
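
The three steps above can be sketched as a toy undo pass; the log format and values here are illustrative assumptions, not a real DBMS log:

```python
# Each UPDATE record stores the before-image of the data item it changed.
log = [
    ("T1", "BEGIN", None, None),
    ("T1", "UPDATE", "A", 5),   # before-image: A was 5
    ("T1", "UPDATE", "B", 7),   # before-image: B was 7
]
db = {"A": 9, "B": 12}  # state after T1's (uncommitted) updates

# Scan the log in reverse, restoring before-images until T1's start marker.
for tid, action, item, before in reversed(log):
    if tid != "T1":
        continue
    if action == "BEGIN":
        break
    db[item] = before  # inverse operation: write the before-image back

print(db)  # {'A': 5, 'B': 7}
```

After the pass, the database looks exactly as it did before T1 started.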

  • System Failure Recovery

UNDO transactions that were incomplete at the time of failure; REDO transactions that had committed but whose updates may not have reached disk.

  • Media Failure Recovery

Reload the database from the most recent backup, then REDO the transactions that committed after that backup.

Concurrency Control

Issues with Concurrent Operations

  • 1 Lost Updates

Two transactions read the same value and both write it back; the later write silently overwrites the earlier one, so my change doesn’t stick. For example, I change the website logo, but a concurrent update reverts it to the old one.
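
The effect can be reproduced with a toy read-modify-write interleaving (the balance and amounts are illustrative):

```python
# Both transactions read balance=100 before either writes.
balance = 100

t1_read = balance        # T1 reads 100
t2_read = balance        # T2 reads 100, before T1 writes

balance = t1_read + 50   # T1 writes 150
balance = t2_read - 30   # T2 writes 70, overwriting T1's update

print(balance)  # 70 -- T1's +50 is lost; either serial order would give 120
```

The interleaved result (70) matches no serial execution, which is exactly what locking is meant to prevent.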

  • 2 Non-Repeatable Reads

Within a single transaction, I read different data twice in a row. Say I initially read a bank balance of $10. Then, my company deposits $100. The second read becomes $110.

  • 3 Dirty Reads

I want to read fields A and B while another transaction is updating them. If I read halfway through its uncommitted update (A already updated, B still being updated), the values I read are ‘dirty data’.

Concurrency Control Techniques

The primary technique for concurrency control is locking. Basic lock types include exclusive locks (X-locks or write locks) and shared locks (S-locks or read locks).

  • 1 Exclusive Lock (X-Lock)

If transaction T acquires an X-lock on data object A, only T can read and modify A. No other transaction can place any type of lock on A until T releases its lock.

  • 2 Shared Lock (S-Lock)

If transaction T acquires an S-lock on data object A, T can read A but not modify it. Until T releases its S-lock, other transactions can place only S-locks on A, so they can also read A but none can modify it.
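
A toy lock table can illustrate these compatibility rules; the `can_grant` and `lock` helpers are hypothetical sketches, not a real DBMS API:

```python
# item -> list of (transaction id, lock mode) currently held
locks = {}

def can_grant(item, tid, mode):
    held = [(t, m) for t, m in locks.get(item, []) if t != tid]
    if not held:
        return True
    # S is compatible only with other S locks; X is compatible with nothing.
    return mode == "S" and all(m == "S" for _, m in held)

def lock(item, tid, mode):
    if not can_grant(item, tid, mode):
        return False  # in a real DBMS the requester would wait
    locks.setdefault(item, []).append((tid, mode))
    return True

assert lock("A", "T1", "S")      # T1 reads A
assert lock("A", "T2", "S")      # a second S lock is compatible
assert not lock("A", "T3", "X")  # X is blocked while any S lock is held
```

Once T1 and T2 release their S-locks, T3's X-lock request could be granted.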

Locking Protocols

  • Level 1 Locking Protocol (Solves Lost Updates)

Before modifying data, a transaction must acquire an X-lock and hold it until the transaction ends.

  • Level 2 Locking Protocol (Solves Dirty Reads)

In addition to Level 1, before reading data a transaction must acquire an S-lock, which it may release immediately after the read.

  • Level 3 Locking Protocol (Solves Non-Repeatable Reads)

In addition to Level 1, before reading data a transaction must acquire an S-lock and hold it until the transaction ends.

Serializability

Serializability: if the result of a concurrent execution of multiple transactions matches the result of some serial execution of those transactions, the schedule is serializable.

Transaction T1: read B; A = B + 1; write A
Transaction T2: read A; B = A + 1; write B

Assume A = B = 2 initially. If T1 runs first, then T2, the result is A=3, B=4; if T2 runs first, then T1, the result is A=4, B=3. A concurrent execution whose result matches either of these serial results is serializable.
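
With initial values A = B = 2, the two serial orders can be computed directly:

```python
# T1: read B, A = B + 1, write A.   T2: read A, B = A + 1, write B.
def t1(state):
    state["A"] = state["B"] + 1

def t2(state):
    state["B"] = state["A"] + 1

s12 = {"A": 2, "B": 2}; t1(s12); t2(s12)  # serial order T1 -> T2
s21 = {"A": 2, "B": 2}; t2(s21); t1(s21)  # serial order T2 -> T1

print(s12, s21)  # {'A': 3, 'B': 4} {'A': 4, 'B': 3}
```

A concurrent schedule is serializable exactly when its final state equals one of these two.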

Distributed Databases

Distributed databases provide several forms of transparency:

  • Fragmentation Transparency

Users or applications don’t need to know how a logically accessed table is physically fragmented and stored.

  • Replication Transparency

With replication, users don’t need to know which nodes data is replicated to, or how.

  • Location Transparency

Users don’t need to know the physical location where data is stored.

  • Logical Transparency

Users or applications don’t need to know which data model is used at a local site.


Distributed databases also have the following characteristics:

  • Data Sharing

Data stored across different nodes is shared.

  • Autonomy

Each node can manage its local data independently.

  • Availability

If one site fails, the system can use replicas from other sites, preventing a total system shutdown.

  • Distribution

Data is stored across different sites.

This post is licensed under CC BY-NC-SA 4.0 by the author.