Postgres and SQL Basics
The foundational knowledge stems from the NYU’s “CSCI-UA 479 Data Management and Analysis” course by Joe Versoza.
PostgreSQL
Installation package for MacOS: Follow this Tutorial
Postgres Object Hierarchy
(Photos from Brandur.org)
- Templates - base template(s) to copy database from
- Databases - multiple databases allowed inon postgres instance/”cluster”
- Schemas - Groups together a set of Relations (Any other type of named Postgres object like a table, view, index, or function - name spacing). Duplicate relation names are not allowed in the same schema, but are allowed in different schemas. (Brandur.org)
- Tables - define the data structure and store the actual data values within databases (Brandur.org)
- Views - Abstraction of table / multiple tables: merge tables and perform calculations and present data as if it were a table; typically read only
PSQL
psql is command line client of postgres databases.
Commands
\l
: list databases\dt
: list tables\dv
: list views\d
: list tables and views\dn
: list schemas\d table_name
: discribe table\du
: list users
Naming Conventions
Table names enclosed in double quotes are case-sensitive, while unquoted names are normalized to lowercase (which may not be suitable depending on your table names). To ensure consistency and best practices:
- Keep table and column names in lowercase.
- Avoid double quoting table names.
- Separate words with underscores.
- Choose descriptive names.
- Utilize underscore for foreign keys (e.g., foo_id).
- Maintain consistency in pluralization, either always using singular or plural forms.
Types
PostgreSQL offers a wide array of types, and you can even create your custom ones. These types fall into several high-level categories:
Numeric
serial
(auto incrementing, pk if no “natural pk” apparent, called artificial / surrogate)integer
- typical choice for integer (signed), 4 bytessmallint
- 2 bytes, signedbigint
- 8 bytes, signeddecimal
/numeric
- user specified precision numbersreal
/double precision
- variable-precision numbers
Strings
text
- unlimited lengthvarchar(n)
- wheren
is num of characters (character varying)
Date and Time
timestamptz
(timestamp with timezone. Stored as UTC)timestamp
(no timezone)date
time
Booleans
true
't'
- even
'yes'
,'y'
, quoted true, etc…
Other
- currency (
money
) - shapes (
circle
,polygon
) - documents (
xml
,json
/jsonb
) - networking (
inet
for ipv4 and ipv6,cidr
for ip ranges)
As we explore these types, the following techniques will be employed:
- Use
SELECT
statements to execute operations and functions without tables. - Utilize the
pg_typeof
function to identify the type of a value. - Employ the syntax
value::type
to coerce a value into a specific type.
SQL Syntax
- Whitespace, including newlines and tabs, is acceptable within a statement. Therefore, formatting your code with line breaks and indentation for readability is highly encouraged.
- Statements should be concluded with a semicolon
;
. - Comments commence with
--
. - SQL keywords can be written in either upper or lower case, although it is common practice to use uppercase for keywords.
In the context of PostgreSQL:
- When writing SQL, names of objects are lowercase.
- Therefore, in PostgreSQL, it is advisable to avoid uppercase letters in object names to eliminate the need for quoting.
- Delimit strings with single quotes
'
- use double $ as quotes
$$what's this$$
E'\t'
- prefix with E to use \ as escape character
- use double $ as quotes
Creating a Database
When a database is created, it clones the template database template1
.
template1
can be modified so you can have a customized template (for example, add objects like languages, functions,etc.).- (there’s also
template0
, which is meant to be kept as an unmodified copy oftemplate1
’s initially configuration)
To create a database based off of the template template1
1
2
CREATE DATABASE some_database_name;
--uses same encoding and collation as template1
1
2
3
4
5
-- use utf8 as encoding, "copy" template0
-- instead of template1
CREATE DATABASE some_database_name
ENCODING 'UTF8';
TEMPLATE template0;
Additional information can be found in the CREATE DATABASE documentation.
Creating a Table
CREATE TABLE <table_name> (
- comma seperated list of column names and their type seperated by space
- can specify some constraints after type, such as:
NOT NULL
UNIQUE
PRIMARY KEY
- default value specified with:
DEFAULT value_to_default_to
)
1
2
3
4
5
CREATE TABLE <table_name> (
<column1_name> <type1> <constraints_if_any>,
<column2_name> <type2> <constraints_if_any>,
...
);
Create a student table with 5 fields: netid first, last, midterm and registered
1
2
3
4
5
6
7
CREATE TABLE student(
netid varchar(20) PRIMARY KEY,
first varchar(255) NOT NULL,
last varchar(255) NOT NULL,
midterm numeric,
registered timestamptz DEFAULT NOW()
);
Additional information can be found in the CREATE TABLE documentation.
Create (INSERT)
To add a new row to a table, use an INSERT
statement
1
2
3
4
-- values in order of field names (registered
-- left out, as it has a default value)
INSERT INTO student
VALUES ('fb123', 'foo', 'bar', 90);
1
2
3
4
5
-- specify columns and matching values (does not have
-- to follow same order of columns in CREATE TABLE)
INSERT INTO student
(first, last, midterm, netid)
VALUES ('baz', 'qux', 70, 'bq789');
Additional information can be found in the CREATE INSERT documentation.
Read (SELECT)
Use a SELECT
statement to read data:
- Begin with
SELECT
, followed by a comma-separated list of columns or calculated values you wish to display. - Optionally, use the keyword
FROM tablename
to specify the table for the query. - Optionally, include
WHERE cond
to define conditions for filtering rows. - Optionally, employ
ORDER BY
to specify the sorting order. - Optionally, use
LIMIT num
to restrict the number of returned rows.
Additionally, you can combine query results using UNION
and INTERSECT
.
The select list may consist of:
- Column names
AS some_alias
for renaming column names or naming calculations*
means all columns- Operations, functions, or expressions
DISTINCT
for retrieving only unique rowsDISTINCT ON (expression)
for the first row matching the expression (ensure you order by the expression to avoid unpredictable results).”
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- give me col1, col2, and new_col only
-- new_col is a calculated field
-- this is the 'SELECT list'
SELECT col1, col2, col3 * 2 as new_col
-- from table, some_Table
FROM some_table
-- the value in col1 must be > 1
-- for the row to be returned
WHERE col1 > 1
-- sort by col2 ascending
ORDER BY col2
-- only give back, at most, 5 rows
LIMIT 5;
SELECT
retrieves rows from zero or more tables.
The processing order of the parts of a SELECT
query is as follows:
FROM
: Determines the set of all possible rows to be returned.WHERE
: Filters out rows that do not match the specified criteria.SELECT list
: Determines the actual values of the resulting rows by evaluating expressions, resolving column names, etc.DISTINCT
: Eliminates duplicate rows in the output (orDISTINCT ON
).ORDER BY
: Sorts the output rows.LIMIT
: Restricts the output rows to a specific number.
Additional information can be found in the SELECT documentation.
Operators
- arithmetic:
+
,-
,*
,/
- string concatenation:
||
('HI' || 'THERE'
) - logical operators:
AND
,OR
,NOT
- check for NULL:
IS NULL
andIS NOT NULL
- pattern matching, case sensitive and insensitive:
LIKE
,ILIKE
Functions
NOW()
(current date / time),ROUND(val)
, etc.
Additional information can be found in the Functions and Operators documentation.
SELECT examples
1
2
-- get all students
SELECT * from student;
1
2
-- get all students, just netid, first, and alias first as fn
SELECT netid, first AS fn FROM student;
1
2
-- get all students, show netid and midterm grade divided by 100
SELECT netid, midterm / 100 FROM student;
SELECT + DISTINCT
This shows the distinct rows (remove duplicate rows)
1
2
-- show the distinct first names of students
SELECT DISTINCT first FROM student;
SELECT + WHERE
Optionally, add a WHERE
clause to specify conditions (think filtering)
- conditions can use operators like
=
,<>
(not equal),>
,<
- you can also use
LIKE
andILIKE
with%
representing wildcards to match on substrings (ILIKE
is case insensitive) - use
col_name IS NULL
to check for aNULL
value - multiple conditions can be put together with
AND
,OR
,NOT
- parentheses can be added to specify precedence
SELECT + WHERE examples
1
2
-- only students with midterm > 80
SELECT * FROM student WHERE midterm > 80;
1
2
3
4
5
-- only students with between 70 and 70
SELECT * FROM student
WHERE midterm > 70
AND midterm < 90;
-- can be done with BETWEEN 71 and 89
1
2
3
4
5
6
7
-- students that have no midterm score
SELECT * FROM student WHERE midterm IS NULL;
-- get the netid and first name of students with taht have a netid
-- that has jv in it or start with Jo, case insensitive
SELECT netid, first FROM student
WHERE netid LIKE '%jv%'
OR first ILIKE 'Jo%'
Ordering
Add an ORDER BY
clause at the end of SELECT
to specify ascending ordering
1
2
3
SELECT * FROM student
WHERE midterm < 60
ORDER BY registered;
Add DESC
to order tin descending order
1
2
SELECT netid FROM student
ORDER BY registered DESC;
Seperate multiple column names to order by multiple columns
1
SELECT * FROM student ORDER BY last, first;
LIMIT
LIMIT
clause can be added at the end of a SELECT
statement (after ORDER BY
) to contrain the number of results of the query
Only show 10 results
1
2
3
4
SELECT netid, first, last
FROM student
ORDER BY netid
LIMIT 10;
DISTINCT ON
Show the highest score per student first name
1
2
3
SELECT DISTINCT ON (first) netid, first, midterm
FROM student
ORDER BY first, midterm DESC;
Update (UPDATE)
Use an UPDATE
statement to set the value of a column for a row/rows
- Begin with the keyword
UPDATE
. - Specify the name of the table to be updated.
- Follow this with the keyword
SET
. - Finally,
column_name = some value
.
1
2
-- set all students' registered field to 1/1/2024
UPDATE student SET registered = '2024-01-01';
Add WHERE
clause to UPDATE
(after SET
) to specify which rows to chage
1
2
3
4
-- only set midterm score for rows that have netid fb123
UPDATE student
SET midterm = 80
WHERE netid = 'fb123';
Additional information can be found in the UPDATE documentation.
Delete / Remove Rows (DELETE)
Use DELETE
statement to remove rows from a table
1
DELETE FROM student WHERE midterm > 90;
Add / Remove Column (ALTER)
Use ULTER TABLE
to add / remove columns
1
2
3
-- add a new column
ALTER TABLE student
ADD COLUMN final_exam_score integer;
1
2
3
-- add a new column with a default value
ALTER TABLE student
ADD COLUMN final_exam_score numeric DEFAULT 80;
1
2
-- remove a column
ALTER TABLE student DROP COLUMN final_exam_score;
ALTER TABLE
can also be used to modify columns
1
2
3
4
-- change data type of column
ALTER TABLE student
ALTER COLUMN netid
SET DATA TYPE varchar(200);
1
2
3
-- rename column
ALTER TABLE student
RENAME COLUMN midterm TO midterm_score;
Additional information can be found in the DELETE documentation.
Casting
We can use either of the two expression to cast a value from one type to another
CAST (columnname AS newType)
val::newType
1
2
3
SELECT netid,
CAST (midterm AS smallint) AS smol_mid
FROM student;
1
2
3
-- assume that midterm is integer
-- cast to numeric
SELECT * FROM student ORDER BY ROUND(midterm::numeric, 2);
Formatting (ROUND)
ROUND
rounds a numeric value to a specified number of decimal places
1
2
3
4
-- assuming midterm is now a integer
-- cast to numeric
-- so that we can round to two places
SELECT netid, ROUND(CAST(midterm AS numeric),2) FROM student;
Removing tables / Databases (DROP)
Use the DROP
command to remove databases or table
DROP TABLE table_name;
DROP DATABASE database_name;
Usage notes:
- Ensure you connect to another database if you intend to drop the currently connected database.
- Use
IF EXISTS
to prevent errors if the table being dropped does not exist.DROP TABLE IF EXISTS table_name;