Relational Database Systems |

Relational Database Systems (RDBS)

A typical and most common example of ACID database systems.
Data is modeled using relations (look like tables).
Quite old (around 1970).
Forms a basis for the SQL language.
Relational database system store entities / records / rows.

Relation – Definition

Relation – Set of tuples.
Tuple is a list of attributes (a₁, a₂, a₃, …, a_k).
An attribute has a name, value and domain.
a_n – value of the n-th attribute in the tuple, a_i in D_i.
D_n – domain of the n-th attribute (a set of values allowed for the attribute).
A_n – name of the n-th attribute.
PERSON(ID: INTEGER, AGE: AGE_INTEGER, NAME: STRING)
Do not confuse Relations with relationships!

Relation

Relations are usually written down using tables.
Relational schema – the names and headers of tables;
- definition of the table form (not data).
Attribute – a table column.
Relation element (tuple) – a table row.
Attribute name – the name of a table column.
Attribute domain – a data type of the table column.
In practice, the terms: relation, schema, tables are used interchangeably.
- They do not mean the same however!

Relation -- Table

Relational algebra

description of a data structure using algebra and logic
a simple and proven approach
allows working with the data model before anything is implemented
- saves a lot of time
built upon set operations:
- product, union, intersection, difference
basis for the SQL language

RA – Operations

Standard set operations:
- compatible attributes (columns):
  - union,
  - difference,
  - intersection,
- cartesian product.
Special operations:
- projection,
- selection / restriction,
- Θ-join (theta-join),
- natural join.

RA – Set Operations

Color	Style
white	bold
yellow	italic
cyan	underline

Color	Style
pink	capitalized
yellow	italic

R1 ∪ R2

Color	Style
white	bold
yellow	italic
cyan	underline
pink	capitalized

R1 − R2

Color	Style
white	bold
cyan	underline

RA – Set Operations

Color	Style
white	bold
yellow	italic
cyan	underline

Color	Style
pink	capitalized
yellow	italic

R1 ∩ R2

Color	Style
yellow	italic

RA – Projection

Color	Style
white	bold
yellow	italic
cyan	underline

Color	Style
pink	capitalized
yellow	italic

R1[Color]

R1[Color, Style]

Color
white
yellow
cyan

Color	Style
white	bold
yellow	italic
cyan	underline

RA – Restriction / Selection

Color	Style
white	bold
yellow	italic
cyan	underline

Color	Style
pink	capitalized
yellow	italic

R1[(Color = white) ∨ (Color = cyan)]

(R1[Color = yellow])[Style]

Color	Style
white	bold
cyan	underline

Style
italic

Color1	Style
yellow	bold
yellow	italic
cyan	underline

Color2	Size	Font
pink	17	Verdana
cyan	24	Palatino
cyan	10	Verdana
yellow	19	Monaco

ϴ Join – R1[Color1 = Color2]R2

Color1	Style	Color2	Size	Font
yellow	bold	yellow	19	Monaco
yellow	italic	yellow	19	Monaco
cyan	underline	cyan	24	Palatino
cyan	underline	cyan	10	Verdana

Key

Relation is a set of tuples – tuples must be unique.
Key is a minimal set of attributes which uniquely identify every entity (tuple) - e.g. person:
- first name + last name + date of birth (compound key);
- SSN (social security number) (simple key);
- person number (simple key).
- How good are they?
In applications it is safest to use artificial keys:
- also called dumb keys – have no meaning.
Key is the core integrity constraint.
Weak entity – has only a foreign key (e.g. person-meeting).

Key cont.

There may be multiple keys in a table:
- PERSON(ID, SSN, FIRST_NAME, LAST_NAME, PASSPORT_NO, DATE_OF_BIRTH),
- ID,
- SSN,
- PASSPORT_NO,
- FIRST_NAME + LAST_NAME + DATE_OF_BIRTH,
- but not SSN + PASSPORT_NO!
One of the keys should be marked as a primary key.
The selection of the primary key is an implementation detail.
- It should be the smallest and quickest for machines.

Foreign Key

Represents relationships between relations (entities).
The foreign key is referential integrity constraint.
An attribute of one relation (R2) on which a Foreign Key is defined must have either:
- A value of an attribute of another relation (R1) (preferably key)
- An empty value (NULL)
R1 – master / parent relation
R2 – detail / dependent relation

Foreign Key – Example

CONTACTS(ID_CONTACT, ID_PERSON, ID_CONTACT_TYPE, CONTACT)
There are foreign keys on ID_PERSON and ID_CONTACT_TYPE columns:
- FOREIGN KEY (id_contact_type) REFERENCES contact_type(id_contact_type)
- FOREIGN KEY (id_person) REFERENCES person(id_person)
A master table is person and contact_type.

SQL Language

SQL is a programming language which can be used to communicate with a (relational) database system.
SQL is based on relational algebra, but has many extensions.
SQL is most often used to:
- Query the state of the database (aka retrieve data);
- Send requests for database state change (aka modify data);
- Define the database schema (aka create tables).

SQL Requirements / Properties

Data is stored in form of tables (which should be relations).
The sender (= application) does not care about the physical data storage.
The order of anything is not guaranteed or assumed:
- columns are identified by their names,
- rows are identified by the key values.
Declarative language:
- Define What should be done, not how.
- No assignment, conditions, loops.
- The SQL interpreter generates and executes the procedure.

SQL Language

First prototype: Sequel (1974)
Old, but an actively developed language:
- ISO & ANSI standards: 1986–2011,
- 8 versions so far.
The standard deals with the interpreter, it is not really good place to learn SQL.
Real-world implementations are behind:
- SQL-92 is available almost everywhere,
- SQL-1999 is available with top vendors.
A lot of dialects and derivatives.

SQL Language – naming conventions

Database objects – tables, columns, keys.
Anything is allowed in name, but must be quoted (").
To avoid quotes (across different vendors):
- use no special characters (allowed: a-z, 0-9, _),
- use underscores for delimiters (id_person),
- use either all lower-case or all upper-case,
- keep it reasonably short (less than approx 30 characters).
Try to avoid language keywords (SQL has many of them).

General naming conventions

Think twice about each name.
The name should be as specific as possible.
- What does an item represent?
- Would a product be better?
Use no abbreviations (except for id).
- If you must, use known abbreviations.
- What does prsn_fn mean?
Avoid repetition:
- e.g. the column person_name in the table persons;
- except for columns with keys (id_person).

SQL – Introduction

Forget procedural programming, SQL is something completely different.
Using SQL, you define what should be done.
- This is surprisingly more difficult than defining how something should be done.
Nested database objects are accessed with . convention:
- schema.table.column,
- e.g. my_project.persons.id_person
- The schema rarely changes during application run, so it is omitted.
- Omitting a table name is discouraged except for very simple queries.

SQL – Data types

Each database system has different data types, but there are some common:
- character / character varying / varchar – a string limited by some length,
- text / longtext / (whatever)text – a string virtually unlimited (cell size over 1GB),
- number / numeric / decimal – a decimal number with some range (precision, scale),
- int / integer – a whole number,
- datetime / timestamp – a time value.

SQL – Syntax Conventions

Functions and keywords are written in UPPERCASE.
Italics marks placeholder:
- DELETE FROM table
[ ] – an optional part.
{ } – a set of elements.
| – an exclusive selection.

SQL – Syntax Conventions Examples

[ LEFT | RIGHT ] JOIN represents:
- LEFT JOIN
- RIGHT JOIN
- JOIN
a { = | < | > } b represents:
- a = b
- a < b
- a > b

SQL Commands

Define the structure of data (database schema) – DDL (data definition language):
- CREATE {SCHEMA | TABLE | COLUMN | INDEX }
- ALTER {SCHEMA | TABLE | COLUMN | INDEX }
- DROP {SCHEMA | TABLE | COLUMN | INDEX }
Manipulate with data – DML (data manipulation language):
- INSERT, SELECT, UPDATE, DELETE
In certain areas, there can be considerable differences between different database servers!

SQL Commands and Tables

SQL command results are:
- SELECT command returns results in a table,
- other commands return only true/false.
Tables can be:
- physical – defined in the database schema
- virtual:
  - persistent – views (external schema)
  - volatile – the result of a SELECT query

Database Views

A view is a database object which looks like a table.
A view is defined by a SELECT query.
A view is usually only for reading and does not contain the actual copy of the data.
- I.e. it is updated as the underlying tables (used in the defining query) are updated.
A view (or any SELECT query) does not have to be a relation!
Views are used:
- for same reasons as functions in procedural programming,
- to define user sections of the database schema.

Checkpoint

Why shouldn’t you display the value of a dumb (artificial) key to the end-user?
If one relation has X rows and another relation has Y rows, how many rows can a theta-join of those relations have?
And how about an intersection?
Does every relation need to have a key?
How about a table?
What is the difference between a relation and a (database) table?
What is a dot . used for in SQL?
Must every relation have a foreign key?
Is is possible to write union in the SQL language?