Some notes on SQL: these are in terse/note form because good detailed material in greater depth is readily found elsewhere.
The general form for select syntax is:
SELECT *
FROM table_a AS a
JOIN table_b AS b
ON a.id = b.id
WHERE [condition]
GROUP BY [something]
HAVING [condition]
ORDER BY [something]
;
(The HAVING
clause was added to SQL because the WHERE
keyword cannot be used with aggregate functions.)
The general form for a SQL query is:
CREATE TABLE ...
ALTER TABLE ... ADD COLUMN ...
INSERT INTO ... VALUES ...
UPDATE ... SET ... WHERE ...
SELECT ... FROM ... WHERE ...
SELECT ... FROM ... JOIN ... ON ...
DELETE FROM ... WHERE ...
Subsets of SQL have their own TLAs (three letter acronyms)
DQL
— Data Query Language;SELECT
and related statementsDDL
— Data Definition Language;CREATE
,ALTER
,DROP
DML
— Data Manipulation Language;INSERT
,UPDATE
,DELETE
DCL
— Data Control Language;GRANT
,DENY
,REVOKE
Join types
- inner
- left outer
- right outer
- full outer
- cross join
Some MS SQL datatypes
- datetime2
BIGINT
,INT
,SMALLINT
,TINYINT
,BIT
MONEY
,SMALLMONEY
DECIMAL
,NUMERIC
FLOAT
,REAL
CHAR
,VARCHAR
NCHAR
,NVARCHAR
See https://www.postgresql.org/docs/11/datatype.html for PostgreSQL data types
Example data modelling diagram
where the symbols denote cardinality
e.g. customer places orders
e.g. order contains line-items
e.g. customers use addresses
Normal form
- First normal form — every table has a key, each attribute is atomic (i.e. standalone)
- Second normal form — every non-key attribute dependent on whole key
- Third normal form — every non-key attribute dependent only on key(s)
where the key is the minimal set of attributes that determines all other attributes in the table
We can use constraints to set up one-to-many relationships in a sensible way.
Consider 0 or more students enrolling in 1 or more courses:
CREATE TABLE student (
studentid INT NOT NULL,
firstname TEXT NOT NULL,
lastname TEXT NOT NULL,
CONSTRAINT pk_student PRIMARY KEY (studentid)
);
CREATE TABLE course (
coursecode TEXT NOT NULL,
coursetitle TEXT NOT NULL,
CONSTRAINT pk_course PRIMARY KEY (coursecode)
);
CREATE TABLE studentcourse (
studentid INT NOT NULL,
coursecode TEXT NOT NULL,
-- combination of studentid and coursecode is the primary key for this table
CONSTRAINT pk_studentcourse PRIMARY KEY (studentid, coursecode),
-- the studentid must exist in the table student
CONSTRAINT fk_studentcourse FOREIGN KEY (studentid) REFERENCES student (studentid),
-- the coursecode must exist in the table course
CONSTRAINT fk_coursecode FOREIGN KEY (coursecode) REFERENCES course (coursecode)
);
What are considered at different levels of data modelling are shown below
Data model | Conceptual | Logical | Physical |
---|---|---|---|
Entities | ✓ | . | . |
Relationships | ✓ | . | . |
Attributes | . | ✓ | . |
Primary keys | . | ✓ | ✓ |
Foreign keys | . | ✓ | ✓ |
Alternate keys | . | ? | ✓ |
Null-ability | . | ? | ✓ |
Table names | . | . | ✓ |
Column names | . | . | ✓ |
Column types | . | . | ✓ |
Two fundamental data modelling types are:
- Top-down: domain -> entity relationships -> attributes and rules
- Bottom-up: attributes -> relationships and rules -> entities
Why use a relational database? Because they are said to provide:
- Atomicity
- Consistency
- Isolation
- Durability
WRT NoSQL (i.e. not relational databases), CAP theorem says you can have two of:
- Consistency
- Availability
- Partition tolerance
The weakness of traditional relational databases is partition tolerance i.e. node faults; the weakness of NoSQL is consistency.