Defining a Relation Schema in SQL

Defining a Relation Schema in SQL

# 1
read: Section 2.3: Defining a Relation Schema in SQL (section)
model: flash
rules: bullet presentation short

Give an outline of this section.  Be very brief on what's to come.

This section provides an initial examination of SQL, the primary language for relational databases. We shall cover the fundamental aspects necessary for defining relation schemas.

  • SQL Overview

    SQL comprises two principal components: a Data-Definition sublanguage for schema declarations and a Data-Manipulation sublanguage for querying and modifying data.

  • Relations in SQL

    We identify three classes of relations: stored tables, views (computed relations), and temporary tables. Our focus here is on stored tables.

  • Defining Tables: CREATE TABLE

    The CREATE TABLE statement is used to declare a stored relation’s schema, specifying its name, attributes, and data types. Key declarations are also part of this statement.

  • Data Types

    SQL supports various primitive data types, including fixed and varying-length character strings (CHAR, VARCHAR), bit strings, boolean values, integers (INT, SHORTINT), floating-point numbers (FLOAT, REAL, DOUBLE PRECISION), and fixed-decimal numbers (DECIMAL, NUMERIC). Special types exist for dates and times.

  • Modifying Schemas: ALTER TABLE

    Existing table schemas can be altered using the ALTER TABLE statement, allowing for the addition or deletion of attributes. The DROP TABLE statement removes an entire relation.

  • Default Values

    Attributes can be assigned a DEFAULT value, which is used when an explicit value is not provided during tuple creation or when new attributes are added to existing tables.

  • Declaring Keys

    Keys, essential for maintaining data integrity, are declared using PRIMARY KEY or UNIQUE. These declarations enforce uniqueness constraints and, for PRIMARY KEY, disallow NULL values.

  • Version: 1
  • Model: gemini-2.5-flash
  • Created: 2025-08-31 02:57:15
  • Duration: 4.108094930648804
  • Cummulative cost: $1.84e-03

Relations in SQL

  • Introduction to the three kinds of relations distinguished by SQL:
    1. Stored relations (tables): Relations that exist in the database and can be modified and queried.
    1. Views: Relations defined by a computation, not stored but constructed when needed.
    1. Temporary tables: Relations constructed by the SQL processor during query execution and then discarded.
# 2
read: Section 2.3.1: Relations in SQL (section)
model: flash

Write a SQL query specific to this section.

Example the syntax constructions relevent to this section only.  Ignore the rest for later.

In SQL, the fundamental mechanism for defining a stored relation, often termed a table, involves the CREATE TABLE statement. This statement allows one to specify the name of the relation, its constituent attributes, and the data type for each attribute.

Consider, for example, the declaration of a simple table named Students:

CREATE TABLE Students (
    studentID   INT,
    firstName   CHAR(50),
    lastName    CHAR(50),
    birthDate   DATE
);

This declaration illustrates the basic syntax for establishing a table schema:

  • The keywords CREATE TABLE initiate the statement, indicating that a new table is to be defined.
  • Students is the chosen name for this particular table.
  • The parenthesized list contains the definitions for each attribute, separated by commas. Each attribute definition consists of:
    • An attribute name (e.g., studentID, firstName, lastName, birthDate).
    • A data type appropriate for the values that attribute will hold (e.g., INT for integers, CHAR(50) for a fixed-length character string of 50 characters, DATE for date values).

Upon successful execution of this statement, a new table named Students is established within the database, ready to store tuples conforming to this defined schema. Subsequent sections will elaborate on more sophisticated aspects of table declaration, such as the specification of keys and default values for attributes.

  • Version: 1
  • Model: gemini-2.5-flash
  • Created: 2025-08-31 02:55:23
  • Duration: 6.660722970962524
  • Cummulative cost: $3.09e-03

Data Types

  • Overview of primitive data types supported by SQL systems for attributes.
  • Character strings: CHAR(n) for fixed-length strings and VARCHAR(n) for varying-length strings.
  • Bit strings: BIT(n) for fixed-length and BIT VARYING(n) for varying-length bit strings.
  • Boolean: BOOLEAN type with possible values TRUE, FALSE, and UNKNOWN.
  • Integers: INT or INTEGER for typical integers, and SHORTINT for potentially smaller integers.
  • Floating-point numbers: FLOAT or REAL for typical floating-point numbers, DOUBLE PRECISION for higher precision, and DECIMAL(n, d) or NUMERIC for fixed-point real numbers.
  • Dates and times: DATE and TIME data types for representing calendar dates and times of day.
# 3
read: Section 2.3.2: Data Types (section)
model: pro

Write a SQL query specific to this section.

Example the syntax constructions relevent to this section only.  Ignore the rest for later.

In SQL, every attribute in a relation schema must be assigned a data type. The data type specifies the kind of values that may appear in the column for that attribute. SQL supports a variety of primitive data types.

The primary categories of data types are:

  • Character Strings: These are for storing text. CHAR(n) defines a fixed-length string of \(n\) characters. If a stored string is shorter than \(n\), it is typically padded with trailing blanks. VARCHAR(n) defines a variable-length string with a maximum length of \(n\) characters.

  • Bit Strings: These are analogous to character strings but store strings of bits. BIT(n) is for fixed-length bit strings of length \(n\), and BIT VARYING(n) is for variable-length bit strings up to length \(n\).

  • Boolean: The BOOLEAN type holds logical values. The possible values are TRUE, FALSE, and UNKNOWN.

  • Numeric Types: SQL provides several types for numbers.

    • INT or INTEGER is used for typical integer values.
    • FLOAT or REAL represents floating-point numbers. DOUBLE PRECISION offers higher precision.
    • DECIMAL(n, d) represents numbers with a fixed decimal point. The precision is \(n\) total digits, with \(d\) digits to the right of the decimal.
  • Date and Time Types: SQL includes types to store temporal values.

    • DATE stores a date, such as DATE '1999-12-31'.
    • TIME stores a time of day, such as TIME '15:30:00.0'.

The declaration of a table schema involves listing each attribute and its corresponding data type.

Example Declaration

The following SQL statement declares the schema for a table named Product. This statement illustrates the use of several data types.

CREATE TABLE Product (
    model       INT,
    maker       VARCHAR(50),
    prodType    CHAR(10),
    price       DECIMAL(7, 2),
    releaseDate DATE
);

This statement defines a relation Product with five attributes. The model attribute is an integer. The maker is a variable-length string of up to 50 characters. The prodType is a fixed-length string of 10 characters. The price is a decimal number with up to 7 total digits, 2 of which are after the decimal point. Finally, releaseDate is a date.

  • Version: 1
  • Model: gemini-2.5-pro
  • Created: 2025-08-31 02:59:50
  • Duration: 18.68170166015625
  • Cummulative cost: $2.27e-02

Modifying Relation Schemas

  • Discussion on how to change a table schema after its creation.
  • Deleting a relation: Using the DROP TABLE R; statement.
  • Modifying a relation: Using the ALTER TABLE statement with options like ADD to add a new attribute and DROP to delete an existing attribute.
# 5
read: Section 2.3.4: Modifying Relation Schemas (section)
model: pro
rules: short presentation

first show how SQL is used to drop tables.

then show how new attributes can be added in SQL.

then show how existing attributes can be removed from a table using SQL.

then show how one can change an existing attribute's datatype in SQL.  Discuss issues with updating a datatype when the table already has tuples in it.

After a relation is created, its schema may need to be modified. SQL provides statements for altering and deleting relation schemas. A relation and all its tuples can be completely removed from the database using the DROP TABLE statement.

DROP TABLE R;

More frequently, the schema of an existing relation is modified. The ALTER TABLE statement is used for this purpose. This statement allows for several types of modifications to the schema.

  • Adding an Attribute: A new attribute can be added to a relation. Existing tuples will have a NULL value for this new attribute, unless a different DEFAULT value is specified.

    ALTER TABLE MovieStar ADD phone CHAR(16);
  • Removing an Attribute: An existing attribute can be removed from a relation. The component corresponding to this attribute is deleted from all existing tuples.

    ALTER TABLE MovieStar DROP birthdate;
  • Modifying an Attribute’s Data Type: The data type of an existing attribute can be changed. This operation can be problematic if the table contains data. For instance, changing an attribute’s type from CHAR(20) to CHAR(10) may result in data truncation. An attempt to change a VARCHAR to an INT will fail if any existing values cannot be converted to an integer.

    -- Syntax varies across SQL implementations
    ALTER TABLE MovieStar MODIFY name VARCHAR(50);
  • Version: 1
  • Model: gemini-2.5-pro
  • Created: 2025-08-31 10:25:31
  • Duration: 18.061478853225708
  • Cummulative cost: $1.49e-02
# issues-with-modify
model: pro

Write about the potential errors the database will raise when data types are not compatible.
Keep the ::: header and footer.

The modification of a relation’s schema, as through the ALTER TABLE statement, is a nontrivial operation. While adding or removing an attribute appears straightforward, the database management system must ensure that any such change does not violate the fundamental integrity of the data stored within the relation. A principal source of conflict arises when existing data is not compatible with a proposed change in the schema, particularly concerning data types.

Consider an attempt to change the data type of an existing attribute. For instance, a relation may have an attribute year defined as type INT, containing four-digit numbers. If one were to execute a command to alter this attribute’s type to CHAR(2), the database system would face an immediate problem. The value 1999, an integer, cannot be represented as a two-character string without truncation and loss of data. A properly designed system will not permit such a destructive conversion and will reject the ALTER TABLE operation, raising an error. The operation is only permissible if all existing values in the column can be safely and unambiguously coerced into the new data type.

A similar issue occurs when adding a new attribute to a relation that already contains tuples. As noted, when an attribute such as phone of type CHAR(16) is added, existing tuples are assigned a NULL value for this new component. However, if the new attribute were declared with a NOT NULL constraint, the operation would fail on any non-empty table. The system cannot invent a valid, non-null value for the existing tuples. An error is therefore the only logical outcome, unless a DEFAULT value is explicitly provided as part of the declaration.

In summary, the database system acts as a strict guardian of the schema. Any modification that would result in a discrepancy between the data and its declared type will be disallowed. The user or programmer must ensure that schema modifications are compatible with the data already present, or the system will refuse to proceed.

  • Version: 2
  • Model: gemini-2.5-pro
  • Created: 2025-08-31 10:31:15
  • Duration: 23.061673879623413
  • Cummulative cost: $3.06e-02

Default Values

  • Explains how to specify a default value for an attribute using the DEFAULT keyword in a table declaration. This value is used when a new tuple is inserted without a value for that attribute. The default can be NULL or a constant.
# 6
read: Section 2.3.5: Default Values (section)
model: pro
rules: short presentation

- Explains how to specify a default value for an attribute using the `DEFAULT` keyword in a table declaration. This value is used when a new tuple is inserted without a value for that attribute. The default can be `NULL` or a constant.

Demonstrate using simple SQL.

When we create or modify tuples, values for all components may not be available. Instead of using the special value NULL, it is often preferable to specify a different default value. The DEFAULT keyword allows us to assign a constant value to an attribute that will be used if no other value is known or provided.

This declaration is added to the attribute definition within a CREATE TABLE or ALTER TABLE statement. For instance, we can assign a default character for an unknown gender or a default value for a newly added attribute.

-- Using DEFAULT in a table creation
CREATE TABLE MovieStar (
    name      CHAR(30),
    address   VARCHAR(255),
    gender    CHAR(1) DEFAULT '?',
    birthdate DATE DEFAULT '0000-00-00'
);

-- Using DEFAULT when adding a new column
ALTER TABLE MovieStar
ADD phone CHAR(16) DEFAULT 'unlisted';
  • Version: 1
  • Model: gemini-2.5-pro
  • Created: 2025-08-31 10:34:23
  • Duration: 12.189771890640259
  • Cummulative cost: $1.30e-02

Declaring Keys

  • Describes the two ways to declare a key in a CREATE TABLE statement: inline with the attribute or as a separate declaration.
  • Introduces the PRIMARY KEY and UNIQUE constraints to enforce keyness.
  • Explains the difference between PRIMARY KEY (which does not allow NULL values) and UNIQUE (which permits NULL values).
# 7
read: Section 2.3.6: Declaring Keys (section)
model: pro
rules: short presentation

revise: add an SQL example illustrating "UNIQUE" together with "PRIMARY KEY".

A key can be declared for a stored relation using the CREATE TABLE statement. If the key is a single attribute, the declaration can be included with the attribute’s definition. For instance, to declare name as the key for MovieStar:

CREATE TABLE MovieStar (
    name CHAR(30) PRIMARY KEY,
    address VARCHAR(255),
    gender CHAR(1),
    birthdate DATE
);

If the key consists of more than one attribute, a separate declaration is used. A relation can have at most one primary key but may have several unique keys. For example, (title, year) can be the primary key for Movies, while a catalogID is also declared as a unique key.

CREATE TABLE Movies (
    title         CHAR(100),
    year          INT,
    length        INT,
    genre         CHAR(10),
    studioName    CHAR(30),
    producerC#    INT,
    catalogID     INT UNIQUE,
    PRIMARY KEY (title, year)
);

Two keywords are available to specify that a set of attributes forms a key: PRIMARY KEY and UNIQUE. The database system enforces these constraints by rejecting any INSERT or UPDATE operation that would result in a duplicate key value.

  • PRIMARY KEY: Attributes in a primary key are not allowed to have NULL values. The combination of values for these attributes must be unique across all tuples in the relation.

  • UNIQUE: The combination of values for attributes declared UNIQUE must also be unique. However, NULL values are permitted.

  • Version: 3
  • Model: gemini-2.5-pro
  • Created: 2025-08-31 10:40:58
  • Duration: 41.77283573150635
  • Cummulative cost: $8.36e-02