8.3 Indexes in SQL
Indexes in SQL
#1
read: Section 8.3: Indexes in SQL (section)
model: pro
rules: bullet definition math presentation short
Write a short summary.
An index is an auxiliary data structure associated with a relation that serves to expedite the retrieval of tuples. By organizing pointers to tuples based on the values of one or more attributes, an index allows the database system to locate matching tuples without scanning the entire relation. This mechanism is particularly effective for queries that specify conditions on the indexed attributes and for performing joins.
Purpose of an Index
An index on an attribute \(A\) of a relation \(R\) is a data structure designed to efficiently find tuples in \(R\) that have a specific value for attribute \(A\). It functions as a lookup table, mapping attribute values to the storage locations of the corresponding tuples. This avoids the need for a full scan of the relation, which is inefficient for large tables.
Application in Queries
Indexes are beneficial in two primary scenarios. First, they accelerate the execution of queries with
WHEREclauses that filter tuples based on the indexed attributes. For instance, an index on theyearattribute of aMoviesrelation would allow the system to directly access only the tuples for a given year. Second, indexes can significantly improve the performance of join operations by enabling direct lookups of matching tuples between relations.Types of Indexes
An index can be created on a single attribute or on a combination of attributes, known as a multi-attribute or composite index. For a multi-attribute index on \((A, B)\), the order of attributes is significant. Such an index is most effective for queries that provide a value for attribute \(A\), or for both \(A\) and \(B\).
Declaration and Management
Although not part of the formal SQL standard, most commercial database systems support commands for managing indexes. A typical syntax is
CREATE INDEX index_name ON table_name (attribute_list). An index can be removed using a command such asDROP INDEX index_name.Performance Considerations
The decision to create an index involves a trade-off. While indexes improve the performance of data retrieval (
SELECTstatements), they introduce overhead for data modification operations (INSERT,DELETE,UPDATE), as the index structure must be updated along with the data in the relation.
Let \(R\) be a relation over a set of attributes \(S\). Let \(K \subseteq S\) be a set of attributes designated as the index key. Let \(T\) be the set of tuples in relation \(R\), and let \(loc(t)\) denote the physical storage address of a tuple \(t \in T\).
An index \(I\) on the index key \(K\) is a mapping from the domain of \(K\) to the powerset of tuple locations, defined as:
\[ I(v) = \{ loc(t) \mid t \in T \land t[K] = v \} \]
where \(v\) is a value for the attributes in \(K\). The index provides a data structure that enables the efficient computation of \(I(v)\) for any given \(v\).
- Version: 1
- Model: gemini-2.5-pro
- Created: 2025-10-15 23:40:14
- Duration: 19.4617440700531
- Cummulative cost: $2.23e-02
Motivation for Indexes
- Introduces the problem of scanning large relations to find a few tuples matching a condition.
- Provides an example query
SELECT * FROM Movies WHERE studioName = 'Disney' AND year = 1990;to illustrate the inefficiency of a full table scan. - Contrasts the naive approach of testing all tuples with the more efficient methods of using a single-attribute index on
yearor a multiattribute index on(studioName, year). - Explains that indexes can also be useful in queries involving joins, as illustrated in Example 8.9.
#2
read: Section 8.3.1: Motivation for Indexes (section)
model: pro
rules: bullet definition math presentation short
Present the following. Use definition formal definition when it applies.
- Introduces the problem of scanning large relations to find a few tuples matching a condition.
- Provides an example query `SELECT * FROM Movies WHERE studioName = 'Disney' AND year = 1990;` to illustrate the inefficiency of a full table scan.
- Contrasts the naive approach of testing all tuples with the more efficient methods of using a single-attribute index on `year` or a multiattribute index on `(studioName, year)`.
- Explains that indexes can also be useful in queries involving joins, as illustrated in Example 8.9.
When relations are very large, it becomes expensive to scan all the tuples of a relation to find those (perhaps very few) tuples that match a given condition. An index provides a data structure to make such retrievals efficient.
Let \(R\) be a relation and let \(\sigma_{C}(R)\) be a selection query where \(C\) is a selection condition. The naive implementation of this query requires examining every tuple in \(R\). The cost of this operation is proportional to the number of tuples, \(|R|\). This procedure is inefficient if the number of tuples satisfying the condition is small relative to the total number of tuples, i.e., \(|\sigma_{C}(R)| \ll |R|\).
The motivation for indexes arises from two primary scenarios: accelerating selections and accelerating joins.
Accelerating Selections with
WHEREclausesConsider the query to find all movies made by Disney in 1990.
SELECT * FROM Movies WHERE studioName = 'Disney' AND year = 1990;A naive execution plan must test the
WHEREclause condition on every tuple in theMoviesrelation. IfMoviescontains 10,000 tuples, all 10,000 are examined. An index on theyearattribute would allow the system to first retrieve only the tuples from 1990 and then test the studio condition. A multiattribute index on(studioName, year)would be even more efficient, allowing direct access to the few tuples that satisfy both conditions.Accelerating Joins
Indexes are also useful in queries that involve a join. Consider the query to find the name of the producer of ‘Star Wars’.
SELECT name FROM Movies, MovieExec WHERE title = 'Star Wars' AND producerC# = cert#;Without indexes, the system might have to examine every pair of tuples from
MoviesandMovieExec. With an index onMovies(title)and another onMovieExec(cert#), the query can be answered by looking at only two tuples. First, the index ontitleis used to find the tuple for ‘Star Wars’ inMovies. Then, itsproducerC#value is used with the index oncert#to find the corresponding tuple inMovieExec.
- Version: 1
- Model: gemini-2.5-pro
- Created: 2025-10-15 23:40:35
- Duration: 21.17075514793396
- Cummulative cost: $2.53e-02
Declaring Indexes
- Notes that index creation is not part of the SQL standard (up to SQL-99) but is common in commercial DBMS.
- Presents the typical syntax for creating a single-attribute index:
CREATE INDEX YearIndex ON Movies (year);. - Explains how a DBMS can build and use multiattribute indexes to efficiently find tuples based on values for several attributes.
- Discusses the importance of the order of attributes in a multiattribute index, as it can be used for queries on a prefix of the attributes.
#3
read: Section 8.3.2: Declaring Indexes (section)
model: pro
rules: bullet definition math presentation short
Present the following. Use definition formal definition when it applies.
- Notes that index creation is not part of the SQL standard (up to SQL-99) but is common in commercial DBMS.
- Presents the typical syntax for creating a single-attribute index: `CREATE INDEX YearIndex ON Movies (year);`.
- Explains how a DBMS can build and use multiattribute indexes to efficiently find tuples based on values for several attributes.
- Discusses the importance of the order of attributes in a multiattribute index, as it can be used for queries on a prefix of the attributes.
The creation of indexes is not a component of any SQL standard up to and including SQL-99. However, most commercial database management systems provide a means for a database designer to declare them.
The syntax for creating an index is generally consistent, although specific implementations may vary.
Definition: CREATE INDEX StatementA typical syntax for creating an index on one or more attributes of a relation is as follows:
CREATE INDEX <index_name> ON <relation_name> (<attribute_1>, <attribute_2>, ...);For a single attribute, the statement is simplified:
CREATE INDEX YearIndex ON Movies (year);A multi-attribute index can be built on a combination of attributes, which is efficient for queries that specify values for those attributes.
For instance, since
(title, year)forms a key for theMoviesrelation, an index on both attributes provides direct access to a unique tuple when both values are known.CREATE INDEX KeyIndex ON Movies (title, year);The order of attributes listed in a multi-attribute index is significant.
An index on
(A, B)can typically be used to efficiently find tuples for a given value ofA. It is generally not efficient for finding tuples based only on a value forB. Therefore, the designer must consider common query patterns when choosing the attribute order.
- Version: 1
- Model: gemini-2.5-pro
- Created: 2025-10-15 23:40:51
- Duration: 15.199302911758423
- Cummulative cost: $1.60e-02