Decomposition: The Good, Bad, and Ugly

Section outline

Introduces three desirable properties for a database decomposition: Elimination of Anomalies, Recoverability of Information, and Preservation of Dependencies.

#1

read: Section 3.4: Decomposition: The Good, Bad, and Ugly (section)

model: pro
rules: bullet definition math presentation short


Present the following.  Use definition formal definition when it applies.

- Introduces three desirable properties for a database decomposition: Elimination of Anomalies, Recoverability of Information, and Preservation of Dependencies.

Decomposition of a relation schema, while beneficial for eliminating anomalies, may introduce undesirable consequences. A proper decomposition should be evaluated based on three distinct properties.

Elimination of Anomalies

This is the primary motivation for decomposition, as discussed in Section 3.3. The objective is to restructure a relation schema into a set of smaller schemas, typically in Boyce-Codd Normal Form (BCNF), such that redundancy is minimized and insertion, deletion, and update anomalies are resolved. This property represents the “good” aspect of decomposition.
Recoverability of Information

A decomposition must be reversible. We must be able to recover the original relation instance precisely from the decomposed relations. This is achieved through a natural join. This property is also known as the lossless-join property. A decomposition of a relation $R$ into relations with schemas $R_1, R_2, \ldots, R_k$ is lossless if, for any valid instance $r$ of $R$, the following holds: \[ r = \pi_{R_1}(r) \bowtie \pi_{R_2}(r) \bowtie \dots \bowtie \pi_{R_k}(r) \] A decomposition that is not lossless may result in the generation of spurious tuples or the loss of original tuples upon rejoining. This represents a potentially “bad” outcome of decomposition.
Preservation of Dependencies

It is desirable that the functional dependencies (FDs) specified on the original relation can be enforced by checking them on the smaller, decomposed relations. If checking an FD requires joining two or more of the decomposed relations, that dependency is not preserved by the decomposition. The failure to preserve dependencies complicates integrity checking and is considered an “ugly” consequence of decomposition, as it may necessitate a trade-off between achieving BCNF and efficiently enforcing all business rules.

gemini-2.5-pro

Version: 1
Model: gemini-2.5-pro
Created: 2025-09-09 10:14:37
Duration: 20.392406940460205
Cummulative cost: $1.96e-02

Recovering Information from a Decomposition

Section outline

Defines a lossless join, where the original relation can be perfectly reconstructed from the decomposed relations.
Explains that decompositions based on a BCNF-violating functional dependency (FD) are guaranteed to be lossless.
Provides the general condition for a lossless binary decomposition: if a relation $R$ on attributes $X \cup Y \cup Z$ is decomposed into projections on $X \cup Y$ and $Y \cup Z$, the join is lossless if the FD $Y \rightarrow Z$ (or $Y \rightarrow X$) holds in $R$.
Demonstrates how a decomposition without a supporting FD can lead to a lossy join, producing ‘bogus tuples’ not present in the original relation.

#2

read: Section 3.4.1: Recovering Information from a Decomposition (section)

model: pro
rules: bullet definition math presentation short


Present the following.  Use definition formal definition when it applies.

- Defines a lossless join, where the original relation can be perfectly reconstructed from the decomposed relations.
- Explains that decompositions based on a BCNF-violating functional dependency (FD) are guaranteed to be lossless.
- Provides the general condition for a lossless binary decomposition: if a relation $R$ on attributes $X \cup Y \cup Z$ is decomposed into projections on $X \cup Y$ and $Y \cup Z$, the join is lossless if the FD $Y \rightarrow Z$ (or $Y \rightarrow X$) holds in $R$.
- Demonstrates how a decomposition without a supporting FD can lead to a lossy join, producing 'bogus tuples' not present in the original relation.

A decomposition of a relation must allow for the complete and exact reconstruction of the original relation instance from its decomposed parts. This property is known as a lossless join.

Definition: Lossless Join

A decomposition of a relation $R$ into a set of relations with attribute sets $\{S_1, S_2, \dots, S_k\}$ is said to have a lossless join if for every instance of $R$ that satisfies the given functional dependencies, the original relation can be recovered by the natural join of its projections. \[R = \pi_{S_1}(R) \bowtie \pi_{S_2}(R) \bowtie \dots \bowtie \pi_{S_k}(R)\] If this equality does not hold, the decomposition is termed lossy.

Decompositions performed according to the BCNF decomposition algorithm are guaranteed to produce a lossless join.

The algorithm decomposes a relation $R$ based on a BCNF-violating functional dependency (FD) $X \rightarrow Y$. The resulting schemas are $R_1(X \cup Y)$ and $R_2(R - Y)$. This specific method of decomposition preserves the original information, ensuring that $R = R_1 \bowtie R_2$.
The condition for a lossless binary decomposition is formally established by Heath’s theorem.

If a relation schema $R(U)$ is decomposed into two schemas $R_1(S_1)$ and $R_2(S_2)$, where $U = S_1 \cup S_2$, the join is lossless if and only if the functional dependency $(S_1 \cap S_2) \rightarrow (S_1 - S_2)$ or $(S_1 \cap S_2) \rightarrow (S_2 - S_1)$ holds for $R$. For a relation $R(A, B, C)$ decomposed into $R_1(A, B)$ and $R_2(B, C)$, the join is lossless if $B \rightarrow A$ or $B \rightarrow C$ holds.
A decomposition that does not satisfy the lossless join condition can result in the generation of spurious tuples.

When the natural join is performed on projections of a relation for which the lossless join condition does not hold, the resulting relation may contain “bogus” tuples—tuples that were not in the original relation instance. This represents a corruption of the original data. For example, if $R(A,B,C)$ is decomposed into $\pi_{A,B}(R)$ and $\pi_{B,C}(R)$ but neither $B \rightarrow A$ nor $B \rightarrow C$ holds, joining the projections will produce a superset of the original tuples in $R$.

gemini-2.5-pro

Version: 1
Model: gemini-2.5-pro
Created: 2025-09-09 10:14:55
Duration: 17.818452835083008
Cummulative cost: $2.09e-02

The Chase Test for Lossless Join

Section outline

Introduces the chase test, an algorithm to determine if any decomposition is lossless given a set of FDs.
Describes how to construct an initial tableau representing the decomposition, with unsubscripted variables for the target tuple and subscripted variables for unknown values.
Explains the process of ‘chasing’ the tableau by applying FDs to equate symbols when rows agree on the left-hand side of an FD.
States the condition for a lossless join: the chase process must result in at least one row becoming all unsubscripted variables.

#3

read: Section 3.4.2: The Chase Test for Lossless Join (section)

model: pro
rules: bullet definition math presentation short


Present the following.  Use definition formal definition when it applies.

- Introduces the chase test, an algorithm to determine if any decomposition is lossless given a set of FDs.
- Describes how to construct an initial tableau representing the decomposition, with unsubscripted variables for the target tuple and subscripted variables for unknown values.
- Explains the process of 'chasing' the tableau by applying FDs to equate symbols when rows agree on the left-hand side of an FD.
- States the condition for a lossless join: the chase process must result in at least one row becoming all unsubscripted variables.

The Chase Test for Lossless Join

The chase test is an algorithm to determine whether a decomposition of a relation schema has a lossless join with respect to a given set of functional dependencies (FDs). It works for any number of decomposed relations.

Definition: The Chase Test

Let a relation schema $R$ with attributes $U$ be decomposed into schemas $S_1, S_2, \ldots, S_k$. Let $F$ be a set of FDs that hold on $R$. The chase test determines if this decomposition is lossless.

Initialization: Create a tableau with one column for each attribute in $U$ and one row for each schema $S_i$. For row $i$ and attribute $A_j \in U$:

If $A_j \in S_i$, the entry is an unsubscripted variable $a_j$. For example, we will have $a, b, c, d, \dots$

If $A_j \notin S_i$, the entry is a unique subscripted variable $b_{ij}$.

Chase Step: Repeatedly apply the FDs in $F$. For each FD $X \to Y$ in $F$, if two rows in the tableau agree on all attributes in $X$, then force them to agree on all attributes in $Y$.

If equating an unsubscripted symbol $a_j$ with a subscripted symbol $b_{ij}$, change all occurrences of $b_{ij}$ to $a_j$.

If equating two subscripted symbols, change all occurrences of one to the other.

Termination: The process ends when no more changes can be made. The decomposition is lossless if and only if at least one row in the final tableau consists entirely of unsubscripted variables.

Constructing the Tableau

The initial tableau represents the state of a tuple $(a, b, c, \ldots)$ after being projected onto the decomposed relations and before being joined. Each row corresponds to a tuple from the original relation that, when projected, yields the corresponding tuple in the decomposed relation. The subscripted variables represent unknown values that were lost during projection.
Applying Functional Dependencies

The chase process uses the FDs to infer equalities among the variables in the tableau. If two rows agree on the left-hand side of an FD, they must also agree on the right-hand side. This step models how FDs constrain the possible tuples in the original relation. By equating symbols, we deduce information about the unknown (subscripted) values.
The Condition for a Lossless Join

If the chase results in a row with only unsubscripted variables, it proves that any tuple formed by the natural join of the projections must have existed in the original relation. Specifically, it demonstrates that one of the original tuples that contributed to the join must be identical to the joined tuple itself. If no such row can be formed, a counterexample relation can be constructed from the final tableau, showing that the join can produce spurious tuples.

gemini-2.5-pro

Version: 1
Model: gemini-2.5-pro
Created: 2025-09-09 10:15:23
Duration: 28.11644196510315
Cummulative cost: $2.15e-02

An example

The chase test is an algorithm to determine whether a decomposition is lossless. The procedure involves constructing a matrix, called a tableau, and applying the given functional dependencies to equate symbols within it. If the process results in a row of all unsubscripted symbols, the join is lossless.

We shall illustrate with an example. Consider a relation $R(A, B, C, D)$ with the set of functional dependencies $F = \{A \rightarrow B, B \rightarrow C, CD \rightarrow A\}$. Suppose $R$ is decomposed into three relations with schemas $S_1 = \{A, D\}$, $S_2 = \{A, C\}$, and $S_3 = \{B, C, D\}$.

Step 1: Construct the initial tableau.

For each relation schema $S_i$ in the decomposition, create a row. The columns of the tableau correspond to the attributes of the original relation $R$. For row $i$, place an unsubscripted variable in the columns for attributes in $S_i$ and a unique, subscripted variable in all other columns.

A B C D

a $b_1$ $c_1$ d

a $b_2$ c $d_2$

$a_3$ b c d
Step 2: Apply the functional dependencies to equate symbols.
We “chase” the dependencies. If two rows agree on the left side of an FD, we force them to agree on the right side.
1. Apply $A \rightarrow B$: Rows 1 and 2 both have ‘a’ in the $A$ column. We must equate their $B$ values. We set $b_1 = b_2$ and replace $b_2$ with $b_1$.
  
  A B C D
  
  a $b_1$ $c_1$ d
  
  a $b_1$ c $d_2$
  
  $a_3$ b c d
2. Apply $B \rightarrow C$: Now, rows 1 and 2 agree on the $B$ column (both have $b_1$). We equate their $C$ values, setting $c_1 = c$. Since ‘c’ is unsubscripted, we replace $c_1$ with ‘c’.
  
  A B C D
  
  a $b_1$ c d
  
  a $b_1$ c $d_2$
  
  $a_3$ b c d
3. Apply $CD \rightarrow A$: After the prior step, rows 1 and 3 agree on columns $C$ and $D$. We equate their $A$ values, setting $a_3 = a$. We replace $a_3$ with ‘a’.
  
  A B C D
  
  a $b_1$ c d
  
  a $b_1$ c $d_2$
  
  a b c d
Step 3: Determine the outcome.

The chase terminates, as no further changes can be made. We inspect the final tableau. The third row consists entirely of unsubscripted symbols (a, b, c, d).

The presence of such a row confirms that the decomposition has a lossless join.

A	B	C	D
a	\(b_1\)	\(c_1\)	d
a	\(b_2\)	c	\(d_2\)
\(a_3\)	b	c	d

A	B	C	D
a	\(b_1\)	\(c_1\)	d
a	\(b_1\)	c	\(d_2\)
\(a_3\)	b	c	d

A	B	C	D
a	\(b_1\)	c	d
a	\(b_1\)	c	\(d_2\)
\(a_3\)	b	c	d

A	B	C	D
a	\(b_1\)	c	d
a	\(b_1\)	c	\(d_2\)
a	b	c	d

gemini-2.5-pro

Version: 1
Model: gemini-2.5-pro
Created: 2025-09-09 11:03:13
Duration: 25.395267963409424
Cummulative cost: $2.97e-02

Why the Chase Works

Section outline

Justifies the validity of the chase test.
Explains that a successful chase proves that any tuple produced by the join must have been a projection from one of the original tuples in relation R.
Explains that a failed chase, where no row becomes fully unsubscripted, means the final tableau itself can serve as a counterexample relation that satisfies the FDs but whose join produces a bogus tuple.

#4

read: Section 3.4.3: Why the Chase Works (section)

model: pro
rules: bullet definition math presentation short


Present the following.  Use definition formal definition when it applies.

- Justifies the validity of the chase test.
- Explains that a successful chase proves that any tuple produced by the join must have been a projection from one of the original tuples in relation R.
- Explains that a failed chase, where no row becomes fully unsubscripted, means the final tableau itself can serve as a counterexample relation that satisfies the FDs but whose join produces a bogus tuple.

# ex1

model: pro
rules: short presentation bullet

Write a step-by-step example of a chase test for lossless join.
Use markdown table.

Justification of the Chase Test

Justification: Validity of the Chase Test

The chase algorithm provides a definitive test for whether a decomposition has a lossless join. Its validity rests on its ability to either prove that any tuple generated by the join must have existed in the original relation, or to construct a counterexample.

Successful Chase: A Lossless Join

When the chase process concludes by producing a row with only unsubscripted variables, it has formally proven that the decomposition is lossless. The chase acts as a proof by deduction.

The process begins with a tableau representing tuples $t_1, t_2, \dots, t_k$ from a relation instance $R$. These are the tuples whose projections would join to form a new tuple $t$. By applying the given functional dependencies, the chase forces variables to be equal. If a row, say row $i$, becomes fully unsubscripted, it demonstrates that the original tuple $t_i \in R$ must be identical to the generated tuple $t$. Since any tuple $t$ formed by the join must be in $R$, the join is lossless.
Failed Chase: A Lossy Join

If the chase algorithm terminates and no row consists entirely of unsubscripted variables, the join is not lossless. The final tableau itself serves as a counterexample.

Consider the final tableau as a relation instance. This relation satisfies all the given functional dependencies, because if it did not, a dependency could have been applied to equate more variables, and the chase would have continued.

Yet, if we project this tableau onto the decomposed schemas and then perform a natural join, the result will contain the tuple of all unsubscripted variables. This tuple, however, does not exist in the tableau we constructed. This demonstrates a specific instance of a relation that satisfies the dependencies but whose join produces a tuple not present in the original instance. Therefore, the join is lossy.

gemini-2.5-pro

Version: 1
Model: gemini-2.5-pro
Created: 2025-09-09 10:15:47
Duration: 23.292654991149902
Cummulative cost: $2.19e-02

Dependency Preservation

Section outline

Introduces the concept of dependency preservation, where all original FDs can be checked by examining the decomposed relations individually.
Highlights the potential tradeoff between achieving BCNF and preserving all dependencies, noting that it is not always possible to satisfy both properties simultaneously.

#5

read: Section 3.4.4: Dependency Preservation (section)

model: pro
rules: bullet definition math presentation short


Present the following.  Use definition formal definition when it applies.

- Introduces the concept of dependency preservation, where all original FDs can be checked by examining the decomposed relations individually.
- Highlights the potential tradeoff between achieving BCNF and preserving all dependencies, noting that it is not always possible to satisfy both properties simultaneously.

Dependency Preservation

When we decompose a relation schema, we desire that the resulting collection of schemas preserves the original functional dependencies. This property, known as dependency preservation, ensures that we can enforce the integrity constraints of the original relation by checking the functional dependencies on each of the individual decomposed relations. This avoids the need to perform a costly natural join of the decomposed relations simply to validate an update.
Formal Definition

Definition: Dependency-Preserving Decomposition

Let $R$ be a relation schema with a set of functional dependencies $F$. A decomposition of $R$ into a set of schemas $\{R_1, R_2, \dots, R_k\}$ is dependency-preserving if the closure of the union of the projections of $F$ onto each $R_i$ is equivalent to the closure of $F$.

Let $F_i = \pi_{R_i}(F)$ be the set of FDs in $F^+$ that can be tested in relation $R_i$. The decomposition preserves dependencies if: \[ (\bigcup_{i=1}^{k} F_i)^+ = F^+ \]
The Tradeoff with BCNF
A decomposition into Boyce-Codd Normal Form (BCNF) guarantees a lossless join and the elimination of certain anomalies. However, it is not always possible to achieve a BCNF decomposition that also preserves all dependencies. In such cases, a tradeoff must be made.
Consider a relation Bookings(title, theater, city) with the FDs:
- theater -> city
- title city -> theater
The FD theater -> city is a BCNF violation because theater is not a superkey. Decomposing based on this violation yields two relations:
- $R_1(\text{theater, city})$
- $R_2(\text{title, theater})$
This decomposition is in BCNF and has a lossless join. However, the FD title city -> theater is lost. It cannot be checked on either $R_1$ or $R_2$ individually, as neither relation contains all three attributes title, city, and theater. To enforce this dependency, one would have to join the relations, defeating one of the main purposes of the decomposition.

gemini-2.5-pro

Version: 1
Model: gemini-2.5-pro
Created: 2025-09-09 10:16:04
Duration: 17.00215721130371
Cummulative cost: $1.71e-02