Constraints on Relations

Relational Algebra as a Constraint Language

Section outline

Introduces two equivalent ways to express constraints using relational algebra:
1. Empty Set Constraint: An expression’s result must be empty, denoted as $R = \emptyset$. This asserts that no tuples satisfy a certain condition.
1. Set-Containment Constraint: Every tuple in one expression’s result must also be in another’s, denoted as $R \subseteq S$.
Explains that the two forms are equivalent, as $R \subseteq S$ is the same as stating that the set difference $R - S$ is empty ($R - S = \emptyset$).

# 1
read: Section 2.5: Constraints on Relations (section)
model: pro
rules: short presentation

Summarize into a short paragraph.

Relational algebra provides a formal mechanism for specifying constraints on relations. A constraint is typically expressed in one of two equivalent forms.

An algebraic expression $E$ can be asserted to be empty, written as $E = \emptyset$. This states that no tuples satisfying the conditions of $E$ may exist in the database.
Alternatively, a set-containment constraint $E_1 \subseteq E_2$ can be asserted, which requires every tuple in the result of $E_1$ to also be in the result of $E_2$. The two forms are logically equivalent, since $E_1 \subseteq E_2$ can be rewritten as $E_1 - E_2 = \emptyset$.

This framework is sufficiently powerful to express many common types of database integrity constraints.

Referential Integrity: Ensures that a value appearing in one attribute of a relation also appears in an attribute of another relation, commonly expressed as $\pi_A(R) \subseteq \pi_B(S)$.
Key Constraints: Stipulates that a set of attributes uniquely identifies a tuple. This is formulated by constructing an expression that finds any two distinct tuples agreeing on the key attributes; this expression must evaluate to the empty set.
Domain and Semantic Constraints: Enforce business rules or domain restrictions, such as limiting the possible values for an attribute or relating values across different attributes and relations.

gemini-2.5-pro

Version: 1
Model: gemini-2.5-pro
Created: 2025-09-02 16:22:30
Duration: 19.371565103530884
Cummulative cost: $2.15e-02

Referential Integrity Constraints

Section outline

Defines a referential integrity constraint, which asserts that a value appearing in one context (e.g., as a foreign key) must also appear in another, related context (e.g., as a primary key).
Shows how to express this constraint in relational algebra using projection and set containment, for example: $\pi_A(R) \subseteq \pi_B(S)$, which is equivalent to $\pi_A(R) - \pi_B(S) = \emptyset$.

# 2
read: Section 2.5.1: Relational Algebra as a Constraint Language (section)
model: pro
rules: presentation bullet definition math

First, define what a reference is (aka foreign key).  Use formal mathematical expressions.
Then, give an example for referential constraint in practice.
Then, discuss how relational algebra is used to define foreign key constraint.

Definition: Referential Integrity Constraint

A referential integrity constraint, often called a foreign key, dictates that the values in a column or a set of columns in one relation must also appear in a column or set of columns in another (or the same) relation. This ensures that a relationship between two tables remains synchronized.

Formally, let $R$ and $S$ be relations. Let $L_R = (A_1, A_2, \ldots, A_n)$ be a list of attributes from the schema of $R$, and $L_S = (B_1, B_2, \ldots, B_n)$ be a corresponding list of attributes from the schema of $S$. The referential integrity constraint from $L_R$ of $R$ to $L_S$ of $S$ is the requirement that:

\[ \forall t_R\in R,\ \exists t_S\in S,\text{ such that } t_R[L_R] = t_S[L_S] \]

This asserts that for every tuple in $R$, its projection onto the attributes $L_R$ must also exist in $S$ as a projection onto the attributes $L_S$.

Example of a Referential Integrity Constraint

Consider the relations for a movie database: Movies(title, year, length, genre, studioName, producerC#) MovieExec(name, address, cert#, netWorth)

A reasonable constraint is that the producer of any movie must be a known movie executive. This means every value for producerC# in the Movies relation must also be a value for cert# in the MovieExec relation.
Expressing Constraints in Relational Algebra

As stated in Section 2.5.1, constraints in relational algebra can be expressed in two primary, equivalent forms. We apply these forms to the referential integrity constraint from the preceding example.
The Set-Containment Form

This form directly expresses the subset relationship from the definition of the constraint. The set of all producer certificate numbers from Movies must be a subset of all executive certificate numbers from MovieExec.

\[ \pi_{\text{producerC\#}}(\text{Movies}) \subseteq \pi_{\text{cert\#}}(\text{MovieExec}) \]
The Empty-Set Form

This form asserts that an expression evaluating to the set of “violating” tuples must be empty. A constraint of the form $E_1 \subseteq E_2$ is equivalent to the constraint $E_1 - E_2 = \emptyset$. This form is useful because it expresses the constraint as a condition that must not be met.

\[ \pi_{\text{producerC\#}}(\text{Movies}) - \pi_{\text{cert\#}}(\text{MovieExec}) = \emptyset \]

This expression states that if we take all producer certificate numbers and remove those that are known executive certificate numbers, the resulting set must be empty. In other words, there are no “dangling references.”

gemini-2.5-pro

Version: 1
Model: gemini-2.5-pro
Created: 2025-09-02 16:28:40
Duration: 18.90741801261902
Cummulative cost: $2.40e-02

Key Constraints

Section outline

Describes how to express a key constraint algebraically.
The fundamental idea is that if two tuples in a relation agree on the values of the key attribute(s), they must be the same tuple and therefore agree on all other attributes as well.
This is demonstrated by constructing an expression that finds pairs of tuples with the same key but different non-key attributes; the constraint then asserts that the result of this expression must be empty.

# 4
read: Section 2.5.3: Key Constraints (section)
model: pro
rules: presentation short math definition

Define what a key is using mathematical logic using first order logic without relational algebra.

The describes how to express a key constraint algebraically.
    - The fundamental idea is that if two tuples in a relation agree on the values of the key attribute(s), they must be the same tuple and therefore agree on all other attributes as well.
    - This is demonstrated by constructing an expression that finds pairs of tuples with the same key but different non-key attributes; the constraint then asserts that the result of this expression must be empty.

Finally, give an example.

A key is a set of attributes whose values uniquely identify a tuple in a relation. The formal definition of a key relies on the principle of unique identification.

Definition: Key

Let R be a relation over a set of attributes $\mathcal{A}$. A subset of attributes $K \subseteq \mathcal{A}$ is a key for R if for any two tuples $t_1, t_2 \in R$, if the tuples agree on all attributes in $K$, then they must be the same tuple. Formally, this is expressed as:

\[ \forall t_1 \in R, \forall t_2 \in R, (t_1[K] = t_2[K] \implies t_1 = t_2) \]

This is logically equivalent to stating that no two distinct tuples in R can have the same values for all attributes in the key $K$.

To express a key constraint in relational algebra, we formulate an expression that identifies any tuples violating the constraint. The constraint then asserts that the result of this expression must be the empty set, $\emptyset$. The general strategy is as follows:

Construct all pairs of tuples from the relation by taking the Cartesian product of the relation with itself. To distinguish the attributes of the two tuples in each pair, at least one copy of the relation must be renamed. Let us consider two copies, $R_1$ and $R_2$, of the relation $R$.
From the product $R_1 \times R_2$, select those pairs of tuples that agree on the key attributes but disagree on at least one non-key attribute. Such a pair represents a violation of the key constraint.
The key constraint is enforced by asserting that this set of violating pairs must be empty.

For example, consider the relation MovieStar(name, address, gender, birthdate), where name is the key. The key constraint implies that if two tuples have the same name, they must also have the same address. To express this, we rename MovieStar to MS1 and MS2 and construct a product. We then select pairs that agree on name but disagree on address. The constraint requires this set to be empty.

\[ \sigma_{MS1.name = MS2.name \land MS1.address \neq MS2.address} (\rho_{MS1}(MovieStar) \times \rho_{MS2}(MovieStar)) = \emptyset \]

Here, $\rho_{MS1}(MovieStar)$ and $\rho_{MS2}(MovieStar)$ are shorthand for renaming the MovieStar relation to MS1 and MS2 respectively, which allows us to reference attributes from each copy unambiguously.

gemini-2.5-pro

Version: 1
Model: gemini-2.5-pro
Created: 2025-09-02 16:39:53
Duration: 21.97769069671631
Cummulative cost: $2.92e-02

Additional Constraint Examples

Section outline

Discusses other types of constraints that can be expressed in relational algebra, such as domain constraints that restrict the permitted values for an attribute.
Provides examples for both a simple domain constraint (enumerated values) and a more complex restriction involving a join between two relations.

# 5
read: Section 2.5.4: Additional Constraint Examples (section)
model: pro
rules: presentation short definition math bullet

Describe the list of additional constraints.

For each, use relational algebra to express the constraint.

Additional Constraint Examples

Relational algebra can express a wide variety of constraints beyond simple key or referential integrity constraints.

Domain Constraints

These constraints restrict the values that may appear in an attribute to a specific, enumerated set. For instance, if the course attribute of the Teaches relation must be either ‘CSCI3030U’ or ‘CSCI4050U’, we can express this by stating that the set of tuples with any other value for course must be empty.

\[ \sigma_{\text{course} \ne \text{'CSCI3030U' AND course} \ne \text{'CSCI4050U'}}(\text{Teaches}) = \emptyset \]
Complex Integrity Constraints

More elaborate constraints can involve conditions across multiple relations. For example, consider a requirement that a president of a movie studio must have a net worth of at least $10,000,000. This involves the Studio and MovieExec relations. The constraint asserts that the set of studio presidents with a net worth less than this amount must be empty.

\[ \sigma_{\text{netWorth} < 10,000,000}(\text{Studio} \bowtie_{\text{presC\#}=\text{cert\#}} \text{MovieExec}) = \emptyset \]

An equivalent formulation using set containment states that the set of certificate numbers of studio presidents must be a subset of the certificate numbers of executives with a net worth of at least $10,000,000.

\[ \pi_{\text{presC\#}}(\text{Studio}) \subseteq \pi_{\text{cert\#}}(\sigma_{\text{netWorth} \ge 10,000,000}(\text{MovieExec})) \]

gemini-2.5-pro

Version: 1
Model: gemini-2.5-pro
Created: 2025-09-02 16:44:59
Duration: 14.509811878204346
Cummulative cost: $1.75e-02

Relational Algebra as a Constraint Language

Referential Integrity Constraints

Example of a Referential Integrity Constraint

Expressing Constraints in Relational Algebra

The Set-Containment Form

The Empty-Set Form

Key Constraints

Additional Constraint Examples

Additional Constraint Examples