Database normalization is a systematic approach to organizing data in a database to reduce redundancy and improve data integrity. The process involves decomposing large tables into smaller, related tables and defining relationships between them. Normalization typically progresses through several forms, each addressing specific types of anomalies and dependencies.
1NF requires that each column in a table contains atomic (indivisible) values and that each column contains only one type of data.
Example: A table with multiple phone numbers in one column is not in 1NF. To convert it, split the phone numbers into separate rows:
| CustomerID | Name | PhoneNumber |
|---|---|---|
| 1 | John Doe | 123-4567 |
| 1 | John Doe | 234-5678 |
| 2 | Jane Smith | 345-6789 |
2NF requires the table to be in 1NF and that all non-key columns are fully functionally dependent on the primary key. This eliminates partial dependencies.
Example: A table that includes customer information within an orders table is not in 2NF. Separate the customer information:
Customers Table:
| CustomerID | Name |
|---|---|
| 1 | John Doe |
| 2 | Jane Smith |
Orders Table:
| OrderID | CustomerID | OrderDate |
|---|---|---|
| 1 | 1 | 2024-05-01 |
| 2 | 2 | 2024-05-02 |
3NF requires the table to be in 2NF and that all the columns are non-transitively dependent on the primary key, eliminating transitive dependencies.
Example: A table where the salesperson’s name depends on the order is not in 3NF. Separate the salesperson information:
Salespersons Table:
| SalespersonID | Name |
|---|---|
| 1 | Alice |
| 2 | Bob |
Orders Table:
| OrderID | CustomerID | OrderDate | SalespersonID |
|---|---|---|---|
| 1 | 1 | 2024-05-01 | 1 |
| 2 | 2 | 2024-05-02 | 2 |
BCNF is a stricter version of 3NF. It requires that for any functional dependency ( X \rightarrow Y ), ( X ) must be a superkey.
Example: A table where rooms depend on courses and instructors violates BCNF. Normalize by creating separate tables for courses and instructors, and another for instructors and rooms.
4NF requires that the table is in 3NF and that it removes multi-valued dependencies, meaning no column should have multiple values independently of another column.
Example: Separate a table with courses, instructors, and students into two tables: one for courses and instructors, and one for courses and students.
5NF requires that the table is in 4NF and eliminates join dependencies, ensuring that data cannot be reconstructed from multiple tables in a way that introduces redundancy.
Example: A table with project parts and suppliers should be split into three tables: one for projects and parts, one for parts and suppliers, and one for projects and suppliers.
Functional dependencies are constraints that describe the relationship between attributes in a table. If attribute ( A ) determines attribute ( B ), then ( B ) is functionally dependent on ( A ).
These occur when one attribute in a table determines another attribute, independent of all other attributes. Resolving these involves separating the data into multiple tables.
Denormalization is the intentional introduction of redundancy into a database to improve read performance, at the expense of increased storage and potential update anomalies.
Example: Combining customer and order details into one table can reduce the need for complex joins during queries, improving read performance.