8.3 Indexes in SQL

Indexes in SQL

#1

read: Section 8.3: Indexes in SQL (section)

model: pro
rules: bullet definition math presentation short


Write a short summary.

An index is an auxiliary data structure associated with a relation that serves to expedite the retrieval of tuples. By organizing pointers to tuples based on the values of one or more attributes, an index allows the database system to locate matching tuples without scanning the entire relation. This mechanism is particularly effective for queries that specify conditions on the indexed attributes and for performing joins.

  • Purpose of an Index

    An index on an attribute \(A\) of a relation \(R\) is a data structure designed to efficiently find tuples in \(R\) that have a specific value for attribute \(A\). It functions as a lookup table, mapping attribute values to the storage locations of the corresponding tuples. This avoids the need for a full scan of the relation, which is inefficient for large tables.

  • Application in Queries

    Indexes are beneficial in two primary scenarios. First, they accelerate the execution of queries with WHERE clauses that filter tuples based on the indexed attributes. For instance, an index on the year attribute of a Movies relation would allow the system to directly access only the tuples for a given year. Second, indexes can significantly improve the performance of join operations by enabling direct lookups of matching tuples between relations.

  • Types of Indexes

    An index can be created on a single attribute or on a combination of attributes, known as a multi-attribute or composite index. For a multi-attribute index on \((A, B)\), the order of attributes is significant. Such an index is most effective for queries that provide a value for attribute \(A\), or for both \(A\) and \(B\).

  • Declaration and Management

    Although not part of the formal SQL standard, most commercial database systems support commands for managing indexes. A typical syntax is CREATE INDEX index_name ON table_name (attribute_list). An index can be removed using a command such as DROP INDEX index_name.

  • Performance Considerations

    The decision to create an index involves a trade-off. While indexes improve the performance of data retrieval (SELECT statements), they introduce overhead for data modification operations (INSERT, DELETE, UPDATE), as the index structure must be updated along with the data in the relation.

Definition: Index

Let \(R\) be a relation over a set of attributes \(S\). Let \(K \subseteq S\) be a set of attributes designated as the index key. Let \(T\) be the set of tuples in relation \(R\), and let \(loc(t)\) denote the physical storage address of a tuple \(t \in T\).

An index \(I\) on the index key \(K\) is a mapping from the domain of \(K\) to the powerset of tuple locations, defined as:

\[ I(v) = \{ loc(t) \mid t \in T \land t[K] = v \} \]

where \(v\) is a value for the attributes in \(K\). The index provides a data structure that enables the efficient computation of \(I(v)\) for any given \(v\).

  • Version: 1
  • Model: gemini-2.5-pro
  • Created: 2025-10-15 23:40:14
  • Duration: 19.4617440700531
  • Cummulative cost: $2.23e-02

Motivation for Indexes

  • Introduces the problem of scanning large relations to find a few tuples matching a condition.
  • Provides an example query SELECT * FROM Movies WHERE studioName = 'Disney' AND year = 1990; to illustrate the inefficiency of a full table scan.
  • Contrasts the naive approach of testing all tuples with the more efficient methods of using a single-attribute index on year or a multiattribute index on (studioName, year).
  • Explains that indexes can also be useful in queries involving joins, as illustrated in Example 8.9.
#2

read: Section 8.3.1: Motivation for Indexes (section)

model: pro
rules: bullet definition math presentation short


Present the following.  Use definition formal definition when it applies.

- Introduces the problem of scanning large relations to find a few tuples matching a condition.
- Provides an example query `SELECT * FROM Movies WHERE studioName = 'Disney' AND year = 1990;` to illustrate the inefficiency of a full table scan.
- Contrasts the naive approach of testing all tuples with the more efficient methods of using a single-attribute index on `year` or a multiattribute index on `(studioName, year)`.
- Explains that indexes can also be useful in queries involving joins, as illustrated in Example 8.9.

When relations are very large, it becomes expensive to scan all the tuples of a relation to find those (perhaps very few) tuples that match a given condition. An index provides a data structure to make such retrievals efficient.

The Problem: Inefficient Full Relation Scan

Let \(R\) be a relation and let \(\sigma_{C}(R)\) be a selection query where \(C\) is a selection condition. The naive implementation of this query requires examining every tuple in \(R\). The cost of this operation is proportional to the number of tuples, \(|R|\). This procedure is inefficient if the number of tuples satisfying the condition is small relative to the total number of tuples, i.e., \(|\sigma_{C}(R)| \ll |R|\).

The motivation for indexes arises from two primary scenarios: accelerating selections and accelerating joins.

  • Accelerating Selections with WHERE clauses

    Consider the query to find all movies made by Disney in 1990.

    SELECT *
    FROM Movies
    WHERE studioName = 'Disney' AND year = 1990;

    A naive execution plan must test the WHERE clause condition on every tuple in the Movies relation. If Movies contains 10,000 tuples, all 10,000 are examined. An index on the year attribute would allow the system to first retrieve only the tuples from 1990 and then test the studio condition. A multiattribute index on (studioName, year) would be even more efficient, allowing direct access to the few tuples that satisfy both conditions.

  • Accelerating Joins

    Indexes are also useful in queries that involve a join. Consider the query to find the name of the producer of ‘Star Wars’.

    SELECT name
    FROM Movies, MovieExec
    WHERE title = 'Star Wars' AND producerC# = cert#;

    Without indexes, the system might have to examine every pair of tuples from Movies and MovieExec. With an index on Movies(title) and another on MovieExec(cert#), the query can be answered by looking at only two tuples. First, the index on title is used to find the tuple for ‘Star Wars’ in Movies. Then, its producerC# value is used with the index on cert# to find the corresponding tuple in MovieExec.

  • Version: 1
  • Model: gemini-2.5-pro
  • Created: 2025-10-15 23:40:35
  • Duration: 21.17075514793396
  • Cummulative cost: $2.53e-02

Declaring Indexes

  • Notes that index creation is not part of the SQL standard (up to SQL-99) but is common in commercial DBMS.
  • Presents the typical syntax for creating a single-attribute index: CREATE INDEX YearIndex ON Movies (year);.
  • Explains how a DBMS can build and use multiattribute indexes to efficiently find tuples based on values for several attributes.
  • Discusses the importance of the order of attributes in a multiattribute index, as it can be used for queries on a prefix of the attributes.
#3

read: Section 8.3.2: Declaring Indexes (section)

model: pro
rules: bullet definition math presentation short


Present the following.  Use definition formal definition when it applies.

- Notes that index creation is not part of the SQL standard (up to SQL-99) but is common in commercial DBMS.
- Presents the typical syntax for creating a single-attribute index: `CREATE INDEX YearIndex ON Movies (year);`.
- Explains how a DBMS can build and use multiattribute indexes to efficiently find tuples based on values for several attributes.
- Discusses the importance of the order of attributes in a multiattribute index, as it can be used for queries on a prefix of the attributes.
  • The creation of indexes is not a component of any SQL standard up to and including SQL-99. However, most commercial database management systems provide a means for a database designer to declare them.

  • The syntax for creating an index is generally consistent, although specific implementations may vary.

    Definition: CREATE INDEX Statement

    A typical syntax for creating an index on one or more attributes of a relation is as follows:

    CREATE INDEX <index_name> ON <relation_name> (<attribute_1>, <attribute_2>, ...);

    For a single attribute, the statement is simplified:

    CREATE INDEX YearIndex ON Movies (year);
  • A multi-attribute index can be built on a combination of attributes, which is efficient for queries that specify values for those attributes.

    For instance, since (title, year) forms a key for the Movies relation, an index on both attributes provides direct access to a unique tuple when both values are known.

    CREATE INDEX KeyIndex ON Movies (title, year);
  • The order of attributes listed in a multi-attribute index is significant.

    An index on (A, B) can typically be used to efficiently find tuples for a given value of A. It is generally not efficient for finding tuples based only on a value for B. Therefore, the designer must consider common query patterns when choosing the attribute order.

  • Version: 1
  • Model: gemini-2.5-pro
  • Created: 2025-10-15 23:40:51
  • Duration: 15.199302911758423
  • Cummulative cost: $1.60e-02