Subqueries

Section outline

A subquery is a query that is part of another query, used to help in its evaluation.
Subqueries can return a single constant (scalar value) for comparison in a WHERE clause.
Subqueries can return relations that are used in WHERE clauses with operators like EXISTS, IN, ALL, and ANY.
Subqueries can appear in FROM clauses, acting as derived relations.

#1

read: Section 6.3: Subqueries (section)

model: pro
rules: bullet definition math presentation short


Present the following.  Use definition formal definition when it applies.

- A subquery is a query that is part of another query, used to help in its evaluation.
- Subqueries can return a single constant (scalar value) for comparison in a `WHERE` clause.
- Subqueries can return relations that are used in `WHERE` clauses with operators like `EXISTS`, `IN`, `ALL`, and `ANY`.
- Subqueries can appear in `FROM` clauses, acting as derived relations.

In SQL, a query can be nested within another to facilitate complex data retrieval. Such a nested query is termed a subquery.

Definition: Subquery

A subquery is a SELECT-FROM-WHERE expression that is contained within another SQL query. The outer query is known as the main query. The result of the subquery is used by the main query to complete its evaluation.

Subqueries can be utilized in several distinct ways within a main query.

Scalar Subqueries

A subquery can be formulated to return a single atomic value, known as a scalar. This subquery must produce a relation with exactly one column and one row. The resulting single value can then be used in a WHERE clause for direct comparison, much like a constant. For instance, it can be compared with an attribute’s value in the main query. A run-time error occurs if the subquery returns zero tuples or more than one tuple.
Subqueries in WHERE Clause Conditions
Subqueries can return a relation (typically a single column) that is used as an operand for certain relational operators in the WHERE clause. These operators evaluate to a boolean value.
- EXISTS R: True if the relation R returned by the subquery is not empty.
- s IN R: True if the value s is equal to one of the values in the unary relation R.
- s op ALL R: True if the comparison s op v is true for every value v in the unary relation R. The op can be any standard comparison operator (e.g., =, >, <).
- s op ANY R (or s op SOME R): True if the comparison s op v is true for at least one value v in the unary relation R.
Subqueries in the FROM Clause

A subquery can be placed in the FROM clause of a main query. The result of this subquery is treated as a temporary, or derived, relation. This derived relation must be given an alias (a tuple variable) so that it can be referenced in other parts of the main query, such as the SELECT list or the WHERE clause. This allows for the construction of intermediate results that can then be joined with other relations.

gemini-2.5-pro

Version: 1
Model: gemini-2.5-pro
Created: 2025-09-20 13:03:44
Duration: 28.977590799331665
Cummulative cost: $1.69e-02

Subqueries that Produce Scalar Values

Section outline

Explains how a select-from-where expression that returns a single value (a scalar) can be used like a constant, typically for comparison in a WHERE clause.

#2

read: Section 6.3.1: Subqueries that Produce Scalar Values (section)

model: pro
rules: bullet definition math presentation short


Present the following.  Use definition formal definition when it applies.

- Explains how a select-from-where expression that returns a single value (a scalar) can be used like a constant, typically for comparison in a `WHERE` clause.

A subquery can be formulated to return a single atomic value, known as a scalar. Such a query is constrained to produce a relation with a single attribute and a single tuple.

Definition: Scalar Subquery

A scalar subquery is a SELECT statement, enclosed in parentheses, that returns a result set containing exactly one column and one row. The value in that single cell is the result of the subquery.
The primary utility of a scalar subquery is its ability to be used in expressions where a constant value is expected. It is most frequently found in the WHERE clause to provide a value for comparison.
Consider the task of finding the producer of the film ‘Star Wars’. We require two relations: Movies and MovieExec. The producerC# for ‘Star Wars’ can be obtained from the Movies table. This value can then be used to find the producer’s name in the MovieExec table.
```
SELECT name
FROM MovieExec
WHERE cert# = (SELECT producerC#
               FROM Movies
               WHERE title = 'Star Wars');
```
The subquery (SELECT producerC# ...) is evaluated first. Its result, a single certificate number, is then used by the outer query’s WHERE clause, effectively resolving the outer query to:
```
SELECT name
FROM MovieExec
WHERE cert# = 12345; -- Assuming 12345 is the result
```
A critical constraint is that a scalar subquery must return precisely one value. If the subquery returns zero tuples or more than one tuple, a run-time error will occur. The database system cannot resolve the comparison in the outer query’s WHERE clause under such conditions.

gemini-2.5-pro

Version: 1
Model: gemini-2.5-pro
Created: 2025-09-20 13:03:58
Duration: 14.000209093093872
Cummulative cost: $1.62e-02

Conditions Involving Relations

Section outline

Introduces SQL operators that can be applied to a relation (expressed as a subquery) to produce a boolean result.
Defines EXISTS R, which is true if relation R is not empty.
Defines s IN R, which is true if scalar s is equal to one of the values in the unary relation R.
Defines s > ALL R and s > ANY R for comparing a scalar s with all or at least one value in a unary relation R, applicable with all six comparison operators.

#3

read: Section 6.3.2: Conditions Involving Relations (section)

model: pro
rules: bullet definition math presentation short


Present the following.  Use definition formal definition when it applies.

- Introduces SQL operators that can be applied to a relation (expressed as a subquery) to produce a boolean result.
- Defines `EXISTS R`, which is true if relation `R` is not empty.
- Defines `s IN R`, which is true if scalar `s` is equal to one of the values in the unary relation `R`.
- Defines `s > ALL R` and `s > ANY R` for comparing a scalar `s` with all or at least one value in a unary relation `R`, applicable with all six comparison operators.

SQL provides several operators that can be applied to a relation, which must be expressed as a subquery, to produce a boolean result. These operators are fundamental for formulating complex conditions within a WHERE clause.

EXISTS

This operator tests whether a relation contains any tuples. It is used to check for the existence of rows in a subquery’s result.

Definition: EXISTS

The condition EXISTS R is true if and only if the relation R is not empty. \[ \text{EXISTS } R \iff R \neq \emptyset \] The negation, NOT EXISTS R, is true if and only if $R$ is empty.
IN

This operator determines if a given value matches any value in a set of values returned by a subquery. The subquery must return a single column.

Definition: IN

For a scalar value $s$ and a unary relation $R$, the condition s IN R is true if and only if $s$ is equal to one of the values in $R$. \[ s \text{ IN } R \iff s \in R \] The condition s NOT IN R is the negation. The expression s = ANY R is equivalent to s IN R.
ALL

This operator compares a scalar value with every value in a single-column set returned by a subquery. It is used with a standard comparison operator.

Definition: ALL

For a scalar value $s$, a unary relation $R$, and a comparison operator $\theta \in \{=, \neq, <, \leq, >, \geq\}$, the condition s θ ALL R is true if and only if the comparison $s \ \theta \ t$ is true for every value $t$ in $R$. \[ s \ \theta \text{ ALL } R \iff \forall t \in R (s \ \theta \ t) \] For instance, s > ALL R means that $s$ is greater than every value in $R$.
ANY or SOME

This operator compares a scalar value with at least one value in a single-column set returned by a subquery. It is also used with a standard comparison operator.

Definition: ANY / SOME

For a scalar value $s$, a unary relation $R$, and a comparison operator $\theta$, the condition s θ ANY R is true if and only if there exists at least one value $t$ in $R$ for which the comparison $s \ \theta \ t$ is true. SOME is a synonym for ANY. \[ s \ \theta \text{ ANY } R \iff \exists t \in R (s \ \theta \ t) \] For example, s > ANY R means that $s$ is greater than at least one value in $R$.

gemini-2.5-pro

Version: 1
Model: gemini-2.5-pro
Created: 2025-09-20 13:04:20
Duration: 21.665996074676514
Cummulative cost: $2.49e-02

Conditions Involving Tuples

Section outline

Describes the representation of tuples in SQL as a parenthesized list of scalar values, such as (123, 'foo').
Extends conditions like IN and ANY to operate on tuples, comparing a tuple t with the tuples in a relation R.

#4

read: Section 6.3.3: Conditions Involving Tuples (section)

model: pro
rules: bullet definition math presentation short


Present the following.  Use definition formal definition when it applies.

- Describes the representation of tuples in SQL as a parenthesized list of scalar values, such as `(123, 'foo')`.
- Extends conditions like `IN` and `ANY` to operate on tuples, comparing a tuple `t` with the tuples in a relation `R`.

In SQL, a tuple is represented by a parenthesized list of scalar values.

The components of such a list can be constants, such as (123, 'foo'), or attributes, such as (name, address, netWorth). A mixture of constants and attributes is also permitted.
Relational comparison operators can be generalized to operate on tuples.

If a tuple t has the same number of components as the tuples in a relation R, it is syntactically valid to form expressions such as t IN R or t <> ANY R. The comparison between the tuple t and the tuples of R proceeds component by component, based on the assumed standard order of attributes for the relation R.

Definition: Tuple Membership (IN)

Let $t = (v_1, v_2, \dots, v_n)$ be a tuple of $n$ scalar values. Let $R$ be a relation of arity $n$. The condition $t \text{ IN } R$ is true if and only if there exists a tuple $r \in R$ such that $r = (v_1, v_2, \dots, v_n)$.

Nested subqueries can leverage tuple comparisons to link relations on multiple attributes simultaneously.

Consider the query to find the producers of movies in which ‘Harrison Ford’ starred. The query links the Movies relation to the StarsIn relation using a composite key (title, year).
```
SELECT name
FROM MovieExec
WHERE cert# IN
    (SELECT producerC#
     FROM Movies
     WHERE (title, year) IN
         (SELECT movieTitle, movieYear
          FROM StarsIn
          WHERE starName = 'Harrison Ford'
         )
    );
```
In this structure, the middle subquery’s WHERE clause evaluates to true for a Movies tuple if its (title, year) pair is present in the set of title-year pairs returned by the innermost subquery.

gemini-2.5-pro

Version: 1
Model: gemini-2.5-pro
Created: 2025-09-20 13:04:36
Duration: 16.760929107666016
Cummulative cost: $1.89e-02

Correlated Subqueries

Section outline

Defines a correlated subquery as one that depends on a value from a tuple variable in an outer query, requiring the subquery to be evaluated for each tuple of the outer query.
Explains the scoping rules for names in nested queries and the necessity of using aliases (tuple variables) to distinguish between attributes from different relations, especially when a relation is used in both the outer and inner query.

#5

read: Section 6.3.4: Correlated Subqueries (section)

model: pro
rules: bullet definition math presentation short


Present the following.  Use definition formal definition when it applies.

- Defines a correlated subquery as one that depends on a value from a tuple variable in an outer query, requiring the subquery to be evaluated for each tuple of the outer query.
- Explains the scoping rules for names in nested queries and the necessity of using aliases (tuple variables) to distinguish between attributes from different relations, especially when a relation is used in both the outer and inner query.

A subquery may depend on a tuple variable that is defined in an outer query. Such a subquery is termed a “correlated subquery” because its result is correlated with the rows of the outer query.

Definition: Correlated Subquery

A correlated subquery is a nested query that uses values from the outer query in its WHERE clause. Consequently, the subquery cannot be evaluated once and for all. Instead, it must be re-evaluated for each tuple processed by the outer query.

Evaluation of Correlated Subqueries

Unlike a simple subquery that is executed once, a correlated subquery is executed repeatedly, once for each candidate tuple of the outer query. The outer query provides a value that is then used to evaluate the inner query.
Scoping Rules for Attributes

SQL employs lexical scoping rules to resolve attribute names within nested queries. An attribute reference is first resolved against the relations in the FROM clause of its own query block. If no match is found, the search proceeds to the FROM clause of the immediately enclosing query block, and so on, moving outwards through the nested structure.
Use of Aliases (Tuple Variables)

When a subquery and an outer query refer to the same relation, an alias, or tuple variable, is required to distinguish between the tuples of the two different instances of the relation. An attribute is prefixed with its tuple variable and a dot to explicitly associate it with the correct instance of the relation, thereby resolving ambiguity. For example, Old.title refers to the title attribute of the tuple Old from the outer query’s relation.

gemini-2.5-pro

Version: 1
Model: gemini-2.5-pro
Created: 2025-09-20 13:04:49
Duration: 12.08627986907959
Cummulative cost: $1.43e-02

Subqueries in FROM Clauses

Section outline

Demonstrates the use of a parenthesized subquery as a relation in a FROM clause.
Notes that a subquery used in the FROM clause must be given a tuple-variable alias.

#6

read: Section 6.3.5: Subqueries in FROM Clauses (section)

model: pro
rules: bullet definition math presentation short


Present the following.  Use definition formal definition when it applies.

- Demonstrates the use of a parenthesized subquery as a relation in a `FROM` clause.
- Notes that a subquery used in the `FROM` clause must be given a tuple-variable alias.

In SQL, a subquery can be used as a relation within the FROM clause. This allows for the creation of temporary, derived relations that can be referenced in the outer query, much like a stored table.

Definition: Subquery in FROM Clause

A subquery may appear in a FROM clause in place of a relation name. Such a subquery must be enclosed in parentheses and must be given a tuple-variable alias. The result of the subquery is treated as a relation whose tuples are represented by this alias.

A parenthesized subquery can be used as a relation in a FROM list.

The result of the subquery forms a temporary relation that exists only for the duration of the outer query’s execution. This is particularly useful for breaking down complex queries into more manageable, logical steps.
The subquery must be assigned a tuple-variable alias.

Since the relation generated by the subquery is not a stored table and has no intrinsic name, an alias is syntactically mandatory. This alias is used to refer to the tuples of the subquery’s result.
Tuples in the result of the subquery are referenced through its alias.

Consider a subquery with alias R that produces a column named A. This column would be referenced in the outer query as R.A.

Example

Let us find the producers of Harrison Ford’s movies. We can construct a subquery that first finds the certificate numbers of these producers and then join this result with the MovieExec relation.

SELECT name
FROM MovieExec, (SELECT producerC#
                 FROM Movies, StarsIn
                 WHERE title = movieTitle AND
                       year = movieYear AND
                       starName = 'Harrison Ford'
                ) AS Prod
WHERE cert# = Prod.producerC#;

The subquery in the FROM clause first joins Movies and StarsIn to find all movies starring ‘Harrison Ford’ and selects their producers’ certificate numbers (producerC#). This resulting set of certificate numbers is given the alias Prod. The outer query then joins the MovieExec relation with this temporary Prod relation on the certificate number to retrieve the names of the producers.

gemini-2.5-pro

Version: 1
Model: gemini-2.5-pro
Created: 2025-09-20 13:05:03
Duration: 13.806691884994507
Cummulative cost: $1.40e-02

SQL Join Expressions

Section outline

Introduces explicit join expressions which produce relations and can be used as subqueries.
Describes CROSS JOIN as a synonym for Cartesian product.
Explains theta-joins using the JOIN...ON syntax, where the product of two relations is filtered by a specified condition.

#7

read: Section 6.3.6: SQL Join Expressions (section)

model: pro
rules: bullet definition math presentation short


Present the following.  Use definition formal definition when it applies.

- Introduces explicit join expressions which produce relations and can be used as subqueries.
- Describes `CROSS JOIN` as a synonym for Cartesian product.
- Explains theta-joins using the `JOIN...ON` syntax, where the product of two relations is filtered by a specified condition.

SQL provides explicit join expressions to construct relations. These expressions can stand as queries by themselves or be used as subqueries within the FROM clause of a SELECT-FROM-WHERE statement. They often serve as shorthands for more elaborate queries.

CROSS JOIN
This operation is a synonym for the Cartesian product. The expression R CROSS JOIN S computes the product of relations R and S, denoted in relational algebra as $R \times S$. The resulting relation contains all possible pairings of a tuple from R and a tuple from S.
```
Movies CROSS JOIN StarsIn;
```
The schema of the result includes all attributes of both Movies and StarsIn.
JOIN ... ON
This syntax constructs a theta-join. A theta-join is a product of two relations followed by a selection based on a specified condition. The meaning of R JOIN S ON C is equivalent to applying a selection $\sigma_C$ to the product $R \times S$.
```
Movies JOIN StarsIn ON
  Movies.title = StarsIn.movieTitle AND
  Movies.year = StarsIn.movieYear;
```
The result is a relation where each tuple is formed by combining a tuple from Movies and a tuple from StarsIn that satisfy the condition C specified in the ON clause.

gemini-2.5-pro

Version: 1
Model: gemini-2.5-pro
Created: 2025-09-20 13:05:14
Duration: 10.927731037139893
Cummulative cost: $1.31e-02

Natural Joins

Section outline

Defines the NATURAL JOIN operator in SQL, which behaves like the relational algebra operator.
The join condition equates all pairs of attributes with a common name, and one of each pair is projected out of the result.

#8

read: Section 6.3.7: Natural Joins (section)

model: pro
rules: bullet definition math presentation short


Present the following.  Use definition formal definition when it applies.

- Defines the `NATURAL JOIN` operator in SQL, which behaves like the relational algebra operator.
- The join condition equates all pairs of attributes with a common name, and one of each pair is projected out of the result.

The SQL NATURAL JOIN operator provides a direct implementation of the natural join from relational algebra.

Implicit Join Condition

The join condition is implicitly established by equating all pairs of attributes from the two relations that share a common name. There are no other conditions in the join predicate.
Automatic Projection

From each pair of equated attributes, one is projected out of the final result. This ensures that the resulting relation does not contain duplicate columns for the join attributes.

Definition: Natural Join

Let $R$ and $S$ be two relations. Let the set of attribute names common to both schemas be $\{A_1, A_2, \dots, A_k\}$. The natural join of $R$ and $S$, written in SQL as R NATURAL JOIN S, is equivalent to the relational algebra expression $R \bowtie S$.

Formally, it is defined as a projection of a selection on the Cartesian product:

\[ R \bowtie S = \pi_{\text{schema}(R) \cup \text{schema}(S)} (\sigma_{R.A_1 = S.A_1 \land \dots \land R.A_k = S.A_k}(R \times S)) \]

The resulting schema consists of all attributes from $R$ and all attributes from $S$, with the common attributes appearing only once.

gemini-2.5-pro

Version: 1
Model: gemini-2.5-pro
Created: 2025-09-20 13:05:28
Duration: 13.812438249588013
Cummulative cost: $1.59e-02

Outerjoins

Section outline

Introduces outerjoins to augment the result of a join with dangling tuples (those that do not match any tuple in the other relation), padded with NULL values.
Explains FULL OUTER JOIN which pads and includes dangling tuples from both relations.
Explains LEFT OUTER JOIN and RIGHT OUTER JOIN which include dangling tuples from only the left or right relation, respectively.
Shows that outerjoins can be combined with both NATURAL JOIN and JOIN...ON conditions.

#9

read: Section 6.3.8: Outerjoins (section)

model: pro
rules: bullet definition math presentation short


Present the following.  Use definition formal definition when it applies.

- Introduces outerjoins to augment the result of a join with dangling tuples (those that do not match any tuple in the other relation), padded with `NULL` values.
- Explains `FULL OUTER JOIN` which pads and includes dangling tuples from both relations.
- Explains `LEFT OUTER JOIN` and `RIGHT OUTER JOIN` which include dangling tuples from only the left or right relation, respectively.
- Shows that outerjoins can be combined with both `NATURAL JOIN` and `JOIN...ON` conditions.

The outerjoin operator augments the result of a join by including the dangling tuples, which are those tuples in one relation that do not match any tuple in the other relation according to the join condition. These dangling tuples are padded with NULL values for the attributes of the other relation.

Definition: Outerjoin

Let $R$ and $S$ be relations. The outerjoin extends the standard join operation to retain tuples that do not satisfy the join condition.

The full outerjoin ($R \text{ NATURAL FULL OUTER JOIN } S$) is the set of tuples that are in the natural join of $R$ and $S$, plus tuples for each dangling tuple in $R$ padded with NULLs, plus tuples for each dangling tuple in $S$ padded with NULLs.
The left outerjoin ($R \text{ NATURAL LEFT OUTER JOIN } S$) is the set of tuples that are in the natural join of $R$ and $S$, plus tuples for each dangling tuple in $R$ padded with NULLs.
The right outerjoin ($R \text{ NATURAL RIGHT OUTER JOIN } S$) is the set of tuples that are in the natural join of $R$ and $S$, plus tuples for each dangling tuple in $S$ padded with NULLs.

FULL OUTER JOIN

This operation retains all tuples from both relations. Tuples that match are joined, while dangling tuples from either relation are included in the result, padded with NULL for the attributes of the other relation. For example, MovieStar NATURAL FULL OUTER JOIN MovieExec produces a relation of individuals who are stars, executives, or both.
LEFT OUTER JOIN

This operation retains all tuples from the left-hand relation. Dangling tuples from the left relation are included and padded with NULLs. Dangling tuples from the right relation are discarded. For example, Movies LEFT OUTER JOIN StarsIn ON ... will list all movies, including those for which no stars are listed in StarsIn.
RIGHT OUTER JOIN

This operation is symmetric to the LEFT OUTER JOIN. It retains all tuples from the right-hand relation, padding any dangling tuples from that relation with NULLs, while discarding dangling tuples from the left relation.
Outerjoins with JOIN ... ON

The outerjoin modifiers (FULL, LEFT, RIGHT) can be applied to theta-joins specified with an ON clause, in addition to natural joins. The logic for including and padding dangling tuples is based on the explicit condition provided in the ON clause.

gemini-2.5-pro

Version: 1
Model: gemini-2.5-pro
Created: 2025-09-20 13:05:43
Duration: 15.537811040878296
Cummulative cost: $1.90e-02