Full-Relation Operations
Full-Relation Operations
- This section studies operations that act on relations as a whole, rather than on individual tuples.
- It addresses how SQL handles relations as bags (multisets) rather than sets.
- It discusses forcing results to be sets and preventing duplicate elimination.
- It covers SQL’s support for the grouping and aggregation operator, \(\gamma\), through the
GROUP-BYandHAVINGclauses.
#1
read: Section 6.4: Full-Relation Operations (section)
model: pro
rules: bullet definition math presentation short
Present the following. Use definition formal definition when it applies.
- This section studies operations that act on relations as a whole, rather than on individual tuples.
- It addresses how SQL handles relations as bags (multisets) rather than sets.
- It discusses forcing results to be sets and preventing duplicate elimination.
- It covers SQL's support for the grouping and aggregation operator, $\gamma$, through the `GROUP-BY` and `HAVING` clauses.
This section examines operations that act upon relations as a whole, rather than on individual tuples. We will discuss the distinction between the set-based relational model and SQL’s bag-based semantics, and explore SQL’s mechanisms for grouping and aggregation.
Full-Relation Operations
These operations consider the entire collection of tuples within a relation. This contrasts with operations such as joins, which operate on pairs or small sets of tuples. The primary focus is on managing duplicates and performing aggregations over groups of tuples.
Relations as Bags in SQL
In the formal relational model, a relation is a set of tuples, meaning no duplicates are permitted. SQL, however, treats relations as bags (or multisets), where a tuple can appear multiple times. Consequently, the result of a standard
SELECTquery may contain duplicate tuples.Controlling Duplicates
SQL provides mechanisms to control the presence of duplicates in query results.
Duplicate Elimination: To force a query result to be a set, the
DISTINCTkeyword is used followingSELECT. This instructs the system to eliminate all duplicate tuples from the final result. Be advised that this operation is computationally expensive as it typically requires sorting the entire result set.Duplicate Preservation: The standard set operations
UNION,INTERSECT, andEXCEPTeliminate duplicates by default. To preserve duplicates and apply bag semantics, one must append theALLkeyword (e.g.,UNION ALL).
Grouping and Aggregation
SQL implements the grouping-and-aggregation operator, \(\gamma\), from extended relational algebra. This powerful feature allows for the partitioning of tuples into groups and the computation of summary values for each group. This is accomplished through the
GROUP BYclause and aggregate functions.
Let \(R\) be a relation. The grouping and aggregation operation is denoted as \(\gamma_{L}(R)\), where \(L\) is a list of elements that are either: 1. Grouping attributes: Attributes from \(R\). 2. Aggregated attributes: Of the form \(F(A)\), where \(F\) is an aggregation operator (e.g., SUM, AVG, COUNT) and \(A\) is an attribute of \(R\).
The operation first partitions the tuples of \(R\) into groups such that all tuples in a group have the same values for the specified grouping attributes. For each such group, a single result tuple is generated. This tuple contains the common values for the grouping attributes and, for each aggregation \(F(A)\) in \(L\), the result of applying the function \(F\) to the multiset of values of attribute \(A\) within that group.
SQL Clauses for Aggregation
GROUP BYClause: This clause follows theFROMandWHEREclauses and specifies a list of attributes. The relation resulting from theFROMandWHEREclauses is partitioned into groups based on the values of these attributes.HAVINGClause: This clause follows theGROUP BYclause and provides a condition for filtering groups. Whereas theWHEREclause filters individual tuples before grouping, theHAVINGclause filters entire groups after they have been formed, typically using an aggregate function in its condition.
- Version: 1
- Model: gemini-2.5-pro
- Created: 2025-09-20 13:18:40
- Duration: 19.050811767578125
- Cummulative cost: $2.41e-02
Eliminating Duplicates
- SQL’s relations are bags, meaning they can contain duplicate tuples, unlike the abstract set-based definition of relations.
- By default, SQL queries do not eliminate duplicate tuples from the result.
- The
SELECT DISTINCTkeyword is used to eliminate duplicate tuples and produce a set as the result. - This is analogous to the \(\delta\) operator in relational algebra.
#2
read: Section 6.4.1: Eliminating Duplicates (section)
model: pro
rules: bullet definition math presentation short
Present the following. Use definition formal definition when it applies.
- SQL's relations are bags, meaning they can contain duplicate tuples, unlike the abstract set-based definition of relations.
- By default, SQL queries do not eliminate duplicate tuples from the result.
- The `SELECT DISTINCT` keyword is used to eliminate duplicate tuples and produce a set as the result.
- This is analogous to the $\delta$ operator in relational algebra.
SQL relations differ from the abstract notion of relations presented in set theory. A relation in SQL is a bag (or multiset), which means it can contain more than one copy of any given tuple.
In formal relational algebra, a relation is a set of tuples, and by definition, a set cannot have duplicate elements. SQL deviates from this formalism, which has practical implications for query results.
The standard
SELECT-FROM-WHEREquery in SQL does not eliminate duplicates from the result set.The evaluation of a query may produce the same tuple from different source tuples. For instance, a projection that discards certain attributes can cause distinct tuples in the original relation to become identical in the result. By default, SQL will list each of these identical tuples.
To eliminate duplicate tuples from the output of a query, the
SELECTkeyword must be followed by theDISTINCTkeyword.The
SELECT DISTINCTstatement instructs the database system to produce only one copy of any given tuple in the final result, thereby converting the output bag into a set.The
DISTINCTkeyword in SQL is the analog of the duplicate-elimination operator, \(\delta\), from relational algebra.If a query Q produces a relation (bag) \(R\), then the query
SELECT DISTINCT ...produces the set \(\delta(R)\).
- Version: 1
- Model: gemini-2.5-pro
- Created: 2025-09-20 13:18:52
- Duration: 11.4144127368927
- Cummulative cost: $1.23e-02
Duplicates in Unions, Intersections, and Differences
- The set operations
UNION,INTERSECT, andEXCEPTnormally eliminate duplicates, treating relations as sets. - To preserve duplicates and perform bag operations, the keyword
ALLmust be used (e.g.,UNION ALL,INTERSECT ALL,EXCEPT ALL). - For bag intersection (
INTERSECT ALL), a tuple appears in the result the minimum number of times it appears in either input relation. - For bag difference (
EXCEPT ALL), a tuple appears in the result \(n - m\) times, where \(n\) and \(m\) are its counts in the first and second relations, respectively, provided \(n > m\).
#3
read: Section 6.4.2: Duplicates in Unions, Intersections, and Differences (section)
model: pro
rules: bullet definition math presentation short
Present the following. Use definition formal definition when it applies.
- The set operations `UNION`, `INTERSECT`, and `EXCEPT` normally eliminate duplicates, treating relations as sets.
- To preserve duplicates and perform bag operations, the keyword `ALL` must be used (e.g., `UNION ALL`, `INTERSECT ALL`, `EXCEPT ALL`).
- For bag intersection (`INTERSECT ALL`), a tuple appears in the result the minimum number of times it appears in either input relation.
- For bag difference (`EXCEPT ALL`), a tuple appears in the result $n - m$ times, where $n$ and $m$ are its counts in the first and second relations, respectively, provided $n > m$.
Unlike the SELECT statement, which preserves duplicates by default, the set-like operations in SQL behave differently.
UNION,INTERSECT, andEXCEPTBy default, these operations eliminate duplicate tuples from the result. The relations are treated as sets, and the corresponding set operation is performed.
The
ALLKeyword for Bag OperationsTo prevent the elimination of duplicates and perform operations on bags (multisets), the keyword
ALLmust be appended to the operator, resulting inUNION ALL,INTERSECT ALL, andEXCEPT ALL.Bag Intersection (
INTERSECT ALL)This operation computes the multiset intersection of two relations.
Definition: Bag IntersectionLet \(R\) and \(S\) be two relations (bags). For a given tuple \(t\), let \(count_R(t)\) and \(count_S(t)\) be the number of times \(t\) appears in \(R\) and \(S\), respectively. The number of times \(t\) appears in the result of
R INTERSECT ALL Sis \(min(count_R(t), count_S(t))\).Bag Difference (
EXCEPT ALL)This operation computes the multiset difference between two relations.
Definition: Bag DifferenceLet \(R\) and \(S\) be two relations (bags). For a given tuple \(t\), let \(count_R(t)\) and \(count_S(t)\) be the number of times \(t\) appears in \(R\) and \(S\), respectively. The number of times \(t\) appears in the result of
R EXCEPT ALL Sis \(count_R(t) - count_S(t)\), provided that this difference is positive. If \(count_R(t) \le count_S(t)\), the tuple \(t\) does not appear in the result.
- Version: 1
- Model: gemini-2.5-pro
- Created: 2025-09-20 13:19:06
- Duration: 13.569606065750122
- Cummulative cost: $1.60e-02
Grouping and Aggregation in SQL
- Introduces the grouping-and-aggregation operator \(\gamma\) from extended relational algebra.
- This operator partitions a relation’s tuples into groups based on specified attributes.
#4
read: Section 6.4.3: Grouping and Aggregation in SQL (section)
model: pro
rules: bullet definition math presentation short
Present the following. Use definition formal definition when it applies.
- Introduces the grouping-and-aggregation operator $\gamma$ from extended relational algebra.
- This operator partitions a relation's tuples into groups based on specified attributes.
SQL supports grouping and aggregation, operations analogous to the \(\gamma\) operator from extended relational algebra.
This operator enables the partitioning of tuples within a relation into distinct groups. This is a fundamental operation for summarizing data.
The partitioning of tuples is based on the values of one or more specified attributes.
All tuples that share identical values for the designated grouping attributes are collected into a single group. Subsequently, aggregation functions can be applied to each group to compute summary statistics.
- Version: 1
- Model: gemini-2.5-pro
- Created: 2025-09-20 13:19:14
- Duration: 8.260210990905762
- Cummulative cost: $9.69e-03
Aggregation Operators
- SQL provides five aggregation operators:
SUM,AVG,MIN,MAX, andCOUNT. COUNT(*)counts all tuples in a relation or group.- The
DISTINCTkeyword can be used within aggregation functions (e.g.,COUNT(DISTINCT attribute)) to operate on unique values only.
#5
read: Section 6.4.4: Aggregation Operators (section)
model: pro
rules: bullet definition math presentation short
Present the following. Use definition formal definition when it applies.
- SQL provides five aggregation operators: `SUM`, `AVG`, `MIN`, `MAX`, and `COUNT`.
- `COUNT(*)` counts all tuples in a relation or group.
- The `DISTINCT` keyword can be used within aggregation functions (e.g., `COUNT(DISTINCT attribute)`) to operate on unique values only.
An aggregation operator is a function \(f\) that takes a multiset (or bag) of values \(B\) as input and returns a single scalar value. In SQL, these operators are applied to the values of an attribute within a relation or within groups of a relation.
SQL provides five standard aggregation operators.
The operators are
SUM,AVG,MIN,MAX, andCOUNT. They are applied to a scalar-valued expression, typically a column name, within aSELECTclause. By default, these operators consider all non-NULL values in the specified column, including duplicates.The
COUNT(*)operator is a special case for counting tuples.The expression
COUNT(*)counts all tuples in the relation that results from theFROMandWHEREclauses of a query. This differs fromCOUNT(attribute), which counts the number of non-NULL values in a specific attribute.The
DISTINCTkeyword can be used to operate on unique values.To eliminate duplicates before applying the aggregation, the keyword
DISTINCTmay be used before the attribute name. For example,COUNT(DISTINCT attribute)counts the number of unique non-NULL values in the specified attribute. While syntactically valid for other operators (e.g.,SUM(DISTINCT attribute)), its application is less common.
- Version: 1
- Model: gemini-2.5-pro
- Created: 2025-09-20 13:19:26
- Duration: 12.270047187805176
- Cummulative cost: $1.46e-02
Grouping
- The
GROUP BYclause is used to group tuples based on the values of one or more attributes. - When
GROUP BYis used, theSELECTclause can only contain aggregation functions or attributes listed in theGROUP BYclause. - Describes the logical evaluation order:
FROM/WHEREclauses are evaluated first to create an intermediate relation, which is then grouped by theGROUP BYclause, and finally theSELECTlist is computed for each group.
#6
read: Section 6.4.5: Grouping (section)
model: pro
rules: bullet definition math presentation short
Present the following. Use definition formal definition when it applies.
- The `GROUP BY` clause is used to group tuples based on the values of one or more attributes.
- When `GROUP BY` is used, the `SELECT` clause can only contain aggregation functions or attributes listed in the `GROUP BY` clause.
- Describes the logical evaluation order: `FROM`/`WHERE` clauses are evaluated first to create an intermediate relation, which is then grouped by the `GROUP BY` clause, and finally the `SELECT` list is computed for each group.
The
GROUP BYclause is used to group tuples based on the values of one or more attributes.It follows the
WHEREclause and is succeeded by a list of grouping attributes. The query result will contain one row for each distinct combination of values in the grouping attributes.When
GROUP BYis used, theSELECTclause is constrained to include only aggregation functions or attributes that are also listed in theGROUP BYclause.This constraint ensures that for each group of tuples, a single, unambiguous value can be produced for each column in the result set. Attributes from the
GROUP BYclause have a single value per group by definition, and aggregate functions compute a single value from the tuples within that group.The logical evaluation of a query with a
GROUP BYclause proceeds in a specific sequence.The conceptual order of operations is as follows:
- First, the relation specified by the
FROMandWHEREclauses is computed. This intermediate relation contains all tuples that satisfy the selection condition. - Next, the tuples of this intermediate relation are partitioned into groups according to the attributes listed in the
GROUP BYclause. - Finally, the aggregations and attributes in the
SELECTclause are computed for each group to generate the final output.
- First, the relation specified by the
- Version: 1
- Model: gemini-2.5-pro
- Created: 2025-09-20 13:19:45
- Duration: 18.174846172332764
- Cummulative cost: $1.74e-02
Grouping, Aggregation, and Nulls
- The value
NULLis ignored in aggregations, except forCOUNT(*).COUNT(attribute)only counts non-NULLvalues. NULLis treated as a distinct value when forming groups; there can be a group forNULLvalues.- Aggregation over an empty set of values results in
NULL, except forCOUNT, which returns 0.
#7
read: Section 6.4.6: Grouping, Aggregation, and Nulls (section)
model: pro
rules: bullet definition math presentation short
Present the following. Use definition formal definition when it applies.
- The value `NULL` is ignored in aggregations, except for `COUNT(*)`. `COUNT(attribute)` only counts non-`NULL` values.
- `NULL` is treated as a distinct value when forming groups; there can be a group for `NULL` values.
- Aggregation over an empty set of values results in `NULL`, except for `COUNT`, which returns 0.
When performing grouping and aggregation operations in SQL, the handling of NULL values follows a specific set of rules. These rules dictate how nulls contribute to aggregate calculations and group formation.
Let \(g\) be a group of tuples formed by a GROUP BY clause, and let \(B_A\) be the bag of values from attribute \(A\) for the tuples in group \(g\).
Aggregation Functions: For any aggregation function
agg\(\in\) {SUM,AVG,MIN,MAX}, the valueNULLis eliminated from the bag \(B_A\) before the function is computed. The functionCOUNT(A)computes the cardinality of the bag \(B_A\) afterNULLvalues are removed. The functionCOUNT(*)returns the total number of tuples in the group \(g\).Group Formation: The value
NULLis treated as a distinct value when partitioning tuples into groups. All tuples that have aNULLin a grouping attribute are placed into a single group.Empty Bag Aggregation: If the bag of values for an aggregation is empty (either because the group is empty or contains only
NULLs for the aggregated attribute), theCOUNTis 0. All other aggregation functions returnNULL.
NULLvalues are ignored in aggregations.An aggregation function such as
SUM,AVG,MIN, orMAXoperates on a collection of values. AnyNULLs within this collection are discarded before the calculation is performed.COUNT(attribute)returns the number of non-NULLvalues, whereasCOUNT(*)returns the total number of tuples in the group, regardless ofNULLs.NULLis treated as a distinct value for grouping.When a
GROUP BYclause is used, all tuples for which the grouping attribute isNULLare collected into a single group. This group is then treated like any other group formed from non-NULLvalues.Aggregation over an empty set of values yields
NULL, except forCOUNT.If a group contains only
NULLvalues for the attribute being aggregated, the set of values for the aggregation is empty. For such an empty set,COUNTreturns 0. Any other aggregate function, such asSUMorAVG, returnsNULL. For example, computingSUM(B)on a group whose only B-value isNULLresults inNULL, as we are summing an empty bag of values.
- Version: 1
- Model: gemini-2.5-pro
- Created: 2025-09-20 13:20:06
- Duration: 20.965851068496704
- Cummulative cost: $2.63e-02
HAVING Clauses
- The
HAVINGclause is used to filter groups based on a condition involving aggregates. - It is applied after the
GROUP BYclause has formed the groups. - Rules for the
HAVINGclause are similar to theSELECTclause in a grouped query: any unaggregated attribute must also be a grouping attribute.
#8
read: Section 6.4.7: HAVING Clauses (section)
model: pro
rules: bullet definition math presentation short
Present the following. Use definition formal definition when it applies.
- The `HAVING` clause is used to filter groups based on a condition involving aggregates.
- It is applied after the `GROUP BY` clause has formed the groups.
- Rules for the `HAVING` clause are similar to the `SELECT` clause in a grouped query: any unaggregated attribute must also be a grouping attribute.
HAVING Clause
The HAVING clause is a component of a SQL SELECT statement that filters the results of a GROUP BY clause. Whereas the WHERE clause filters individual tuples before grouping and aggregation, the HAVING clause filters the groups themselves based on aggregate properties.
Let \(R\) be the relation produced by the FROM and WHERE clauses. Let the GROUP BY clause partition \(R\) into a set of groups \(G = \{g_1, g_2, \dots, g_k\}\). The HAVING clause contains a predicate \(P\) that is evaluated for each group \(g_i \in G\). The final result of the query is formed from the set of groups \(\{g_i \in G \mid P(g_i) \text{ is true}\}\).
Post-Grouping Filtration
The
HAVINGclause is processed after theGROUP BYclause. It acts as a filter on the groups that have been formed, eliminating those that do not satisfy the specified condition.Conditions on Aggregates
The primary function of the
HAVINGclause is to specify conditions that involve aggregate functions, such asCOUNT,SUM,AVG,MIN, orMAX. This allows for the selection of groups based on a computed property of the group as a whole.Rules for Attribute Reference
The rules governing the use of attributes within a
HAVINGclause are congruent with those for theSELECTclause when aGROUP BYis present.- An attribute may be used within an aggregate function.
- An attribute that appears unaggregated must be one of the grouping attributes listed in the
GROUP BYclause.
- Version: 1
- Model: gemini-2.5-pro
- Created: 2025-09-20 13:20:34
- Duration: 27.979530096054077
- Cummulative cost: $1.44e-02