Notes on SQL

4 June 2021

Some notes on SQL: these are in terse/note form because good detailed material in greater depth is readily found elsewhere.

The general form for select syntax is:

SELECT *
FROM table_a AS a
JOIN table_b AS b
ON a.id = b.id
WHERE [condition]
GROUP BY [something]
HAVING [condition]
ORDER BY [something]
;

(The HAVING clause was added to SQL because the WHERE keyword cannot be used with aggregate functions.)

The general form for a SQL query is:

CREATE TABLE ...
ALTER TABLE ... ADD COLUMN ...
INSERT INTO ... VALUES ...
UPDATE ... SET ... WHERE ...
SELECT ... FROM ... WHERE ...
SELECT ... FROM ... JOIN ... ON ...
DELETE FROM ... WHERE ...

Subsets of SQL have their own TLAs (three letter acronyms)

  • DQL — Data Query Language; SELECT and related statements
  • DDL — Data Definition Language; CREATE, ALTER, DROP
  • DML — Data Manipulation Language; INSERT, UPDATE, DELETE
  • DCL — Data Control Language; GRANT, DENY, REVOKE

Join types

  • inner
  • left outer
  • right outer
  • full outer
  • cross join

Some MS SQL datatypes

  • datetime2
  • BIGINT, INT, SMALLINT, TINYINT, BIT
  • MONEY, SMALLMONEY
  • DECIMAL, NUMERIC
  • FLOAT, REAL
  • CHAR, VARCHAR
  • NCHAR, NVARCHAR

See https://www.postgresql.org/docs/11/datatype.html for PostgreSQL data types

Example data modelling diagram

Diagram

where the symbols denote cardinality

e.g. 11 customer places 0\ge 0 orders

Diagram

e.g. 11 order contains 1\ge 1 line-items

Diagram

e.g. 1\ge 1 customers use 1\ge 1 addresses

Diagram

Normal form

  1. First normal form — every table has a key, each attribute is atomic (i.e. standalone)
  2. Second normal form — every non-key attribute dependent on whole key
  3. Third normal form — every non-key attribute dependent only on key(s)

where the key is the minimal set of attributes that determines all other attributes in the table

We can use constraints to set up one-to-many relationships in a sensible way.

Consider 0 or more students enrolling in 1 or more courses:

Diagram

CREATE TABLE student (
    studentid INT NOT NULL,
    firstname TEXT NOT NULL,
    lastname TEXT NOT NULL,
    CONSTRAINT pk_student PRIMARY KEY (studentid)
);

CREATE TABLE course (
    coursecode TEXT NOT NULL,
    coursetitle TEXT NOT NULL,
    CONSTRAINT pk_course PRIMARY KEY (coursecode)
);

CREATE TABLE studentcourse (
    studentid INT NOT NULL,
    coursecode TEXT NOT NULL,
    -- combination of studentid and coursecode is the primary key for this table
    CONSTRAINT pk_studentcourse PRIMARY KEY (studentid, coursecode),
    -- the studentid must exist in the table student
    CONSTRAINT fk_studentcourse FOREIGN KEY (studentid) REFERENCES student (studentid),
    -- the coursecode must exist in the table course
    CONSTRAINT fk_coursecode FOREIGN KEY (coursecode) REFERENCES course (coursecode)
);

What are considered at different levels of data modelling are shown below

Data model Conceptual Logical Physical
Entities . .
Relationships . .
Attributes . .
Primary keys .
Foreign keys .
Alternate keys . ?
Null-ability . ?
Table names . .
Column names . .
Column types . .

Two fundamental data modelling types are:

  • Top-down: domain -> entity relationships -> attributes and rules
  • Bottom-up: attributes -> relationships and rules -> entities

Why use a relational database? Because they are said to provide:

  • Atomicity
  • Consistency
  • Isolation
  • Durability

WRT NoSQL (i.e. not relational databases), CAP theorem says you can have two of:

  • Consistency
  • Availability
  • Partition tolerance

The weakness of traditional relational databases is partition tolerance i.e. node faults; the weakness of NoSQL is consistency.