Joins
The foundational knowledge stems from the NYU’s “CSCI-UA 479 Data Management and Analysis” course by Joe Versoza.
Joins
Joins are a relational algebra operation executed in SQL, involving the merging of columns from one or more tables.
Generate Table
1
2
3
4
5
6
CREATE TABLE genre (
genre_id integer,
name varchar(20),
description text,
PRIMARY KEY (genre_id)
);
Genre Data
1
2
3
4
5
6
7
8
9
genre_id | name | description
----------+-----------------+-------------------------------------------
1 | Comedy | LOL
2 | Drama | Such feels.
3 | Fantasy | Swords. And maybe fairies too.
4 | Horror | Ghosts, goblins, and other spooky things.
5 | Science Fiction | Robots and stuff.
6 | Super Hero | Mostly capes.
7 | Thriller | Close. Your. Eyes.
Movie Table
1
2
3
4
5
6
7
8
CREATE TABLE movie (
movie_id serial,
title varchar(50) NOT NULL,
year smallint NOT NULL,
runtime smallint NOT NULL,
genre_id integer REFERENCES genre (genre_id),
PRIMARY KEY (movie_id)
);
Movie Data
1
2
3
4
5
6
7
8
9
10
11
12
13
movie_id | title | year | runtime | genre_id
----------+----------------------------------------+------+---------+----------
1 | Alphaville | 1965 | 99 |
2 | La Montaña Sagrada (The Holy Mountain) | 1973 | 99 |
3 | Dune | 1984 | 136 | 5
4 | Point Break | 1991 | 122 | 7
5 | Strange Days | 1995 | 145 | 5
6 | 2046 | 2004 | 122 | 2
7 | Hellboy | 2004 | 122 | 6
8 | Los Abrazos Rotos (Broken Embraces) | 2009 | 128 | 7
9 | Blade Runner 2049 | 2017 | 163 | 5
10 | Wonder Woman | 2017 | 141 | 6
11 | Shape of Water | 2018 | 123 | 3
CROSS JOIN
Cartesian product
- all rows from one table combined with each row from another table
- results in n * m rows
1
SELECT * FROM movie CROSS JOIN genre;
Result
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
movie_id | title | year | runtime | genre_id | genre_id | name | description
----------+----------------------------------------+------+---------+----------+----------+-----------------+-------------------------------------------
1 | Alphaville | 1965 | 99 | | 1 | Comedy | LOL
2 | La Montaña Sagrada (The Holy Mountain) | 1973 | 99 | | 1 | Comedy | LOL
3 | Dune | 1984 | 136 | 5 | 1 | Comedy | LOL
4 | Point Break | 1991 | 122 | 7 | 1 | Comedy | LOL
5 | Strange Days | 1995 | 145 | 5 | 1 | Comedy | LOL
6 | 2046 | 2004 | 122 | 2 | 1 | Comedy | LOL
7 | Hellboy | 2004 | 122 | 6 | 1 | Comedy | LOL
8 | Los Abrazos Rotos (Broken Embraces) | 2009 | 128 | 7 | 1 | Comedy | LOL
9 | Blade Runner 2049 | 2017 | 163 | 5 | 1 | Comedy | LOL
10 | Wonder Woman | 2017 | 141 | 6 | 1 | Comedy | LOL
11 | Shape of Water | 2018 | 123 | 3 | 1 | Comedy | LOL
1 | Alphaville | 1965 | 99 | | 2 | Drama | Such feels.
2 | La Montaña Sagrada (The Holy Mountain) | 1973 | 99 | | 2 | Drama | Such feels.
3 | Dune | 1984 | 136 | 5 | 2 | Drama | Such feels.
4 | Point Break | 1991 | 122 | 7 | 2 | Drama | Such feels.
5 | Strange Days | 1995 | 145 | 5 | 2 | Drama | Such feels.
6 | 2046 | 2004 | 122 | 2 | 2 | Drama | Such feels.
7 | Hellboy | 2004 | 122 | 6 | 2 | Drama | Such feels.
8 | Los Abrazos Rotos (Broken Embraces) | 2009 | 128 | 7 | 2 | Drama | Such feels.
9 | Blade Runner 2049 | 2017 | 163 | 5 | 2 | Drama | Such feels.
10 | Wonder Woman | 2017 | 141 | 6 | 2 | Drama | Such feels.
11 | Shape of Water | 2018 | 123 | 3 | 2 | Drama | Such feels.
1 | Alphaville | 1965 | 99 | | 3 | Fantasy | Swords. And maybe fairies too.
...
INNER JOIN
Combines rows from one table and another based on matching column values
- condition for matching column is the join predicate
- commonly used!
1
2
3
SELECT title, name
FROM movie
INNER JOIN genre on movie.genre_id = genre.genre_id;
Result
1
2
3
4
5
6
7
8
9
10
11
title | name
-------------------------------------+-----------------
Dune | Science Fiction
Point Break | Thriller
Strange Days | Science Fiction
2046 | Drama
Hellboy | Super Hero
Los Abrazos Rotos (Broken Embraces) | Thriller
Blade Runner 2049 | Science Fiction
Wonder Woman | Super Hero
Shape of Water | Fantasy
This can also be done with WHERE
clause (implicit join)
1
2
3
SELECT title, name
FROM movie, genre
WHERE movie.genre_id = genre.genre_id;
What happened to the movie
rows that had a null
genre_id
?
- they weren’t included in the results!
- that implies that when a column that can have a
null
value is used in the join predicate, some rows may be omitted from the query result
An INNER JOIN
where the join predicate involves equality can be written with a USING
clause
1
2
3
SELECT title, name
FROM movie
INNER JOIN genre USING (genre_id);
This is very risky, but a NATURAL JOIN
performs an inner join implicitly on matching column names
1
SELECT * FROM movie NATURAL JOIN genre;
OUTER JOINS
Same as inner, but include everything in the first (left) table: LEFT OUTER JOIN
1
SELECT * FROM movie LEFT OUTER JOIN genre ON movie.genre_id = genre.genre_id;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
movie_id | title | year | runtime | genre_id | genre_id | name | description
----------+----------------------------------------+------+---------+----------+----------+-----------------+--------------------------------
1 | Alphaville | 1965 | 99 | | | |
2 | La Montaña Sagrada (The Holy Mountain) | 1973 | 99 | | | |
3 | Dune | 1984 | 136 | 5 | 5 | Science Fiction | Robots and stuff.
4 | Point Break | 1991 | 122 | 7 | 7 | Thriller | Close. Your. Eyes.
5 | Strange Days | 1995 | 145 | 5 | 5 | Science Fiction | Robots and stuff.
6 | 2046 | 2004 | 122 | 2 | 2 | Drama | Such feels.
7 | Hellboy | 2004 | 122 | 6 | 6 | Super Hero | Mostly capes.
8 | Los Abrazos Rotos (Broken Embraces) | 2009 | 128 | 7 | 7 | Thriller | Close. Your. Eyes.
9 | Blade Runner 2049 | 2017 | 163 | 5 | 5 | Science Fiction | Robots and stuff.
10 | Wonder Woman | 2017 | 141 | 6 | 6 | Super Hero | Mostly capes.
11 | Shape of Water | 2018 | 123 | 3 | 3 | Fantasy | Swords. And maybe fairies too.
(11 rows)
Same as above, but include everything in second (right) table: RIGHT OUTER JOIN
1
SELECT * FROM movie RIGHT OUTER JOIN genre ON movie.genre_id = genre.genre_id;
1
2
3
4
5
6
7
8
9
10
11
12
13
movie_id | title | year | runtime | genre_id | genre_id | name | description
----------+-------------------------------------+------+---------+----------+----------+-----------------+-------------------------------------------
3 | Dune | 1984 | 136 | 5 | 5 | Science Fiction | Robots and stuff.
4 | Point Break | 1991 | 122 | 7 | 7 | Thriller | Close. Your. Eyes.
5 | Strange Days | 1995 | 145 | 5 | 5 | Science Fiction | Robots and stuff.
6 | 2046 | 2004 | 122 | 2 | 2 | Drama | Such feels.
7 | Hellboy | 2004 | 122 | 6 | 6 | Super Hero | Mostly capes.
8 | Los Abrazos Rotos (Broken Embraces) | 2009 | 128 | 7 | 7 | Thriller | Close. Your. Eyes.
9 | Blade Runner 2049 | 2017 | 163 | 5 | 5 | Science Fiction | Robots and stuff.
10 | Wonder Woman | 2017 | 141 | 6 | 6 | Super Hero | Mostly capes.
11 | Shape of Water | 2018 | 123 | 3 | 3 | Fantasy | Swords. And maybe fairies too.
| | | | | 4 | Horror | Ghosts, goblins, and other spooky things.
| | | | | 1 | Comedy | LOL
Both! FULL OUTER JOIN
1
SELECT * FROM movie FULL OUTER JOIN genre ON movie.genre_id = genre.genre_id;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
movie_id | title | year | runtime | genre_id | genre_id | name | description
----------+----------------------------------------+------+---------+----------+----------+-----------------+-------------------------------------------
1 | Alphaville | 1965 | 99 | | | |
2 | La Montaña Sagrada (The Holy Mountain) | 1973 | 99 | | | |
3 | Dune | 1984 | 136 | 5 | 5 | Science Fiction | Robots and stuff.
4 | Point Break | 1991 | 122 | 7 | 7 | Thriller | Close. Your. Eyes.
5 | Strange Days | 1995 | 145 | 5 | 5 | Science Fiction | Robots and stuff.
6 | 2046 | 2004 | 122 | 2 | 2 | Drama | Such feels.
7 | Hellboy | 2004 | 122 | 6 | 6 | Super Hero | Mostly capes.
8 | Los Abrazos Rotos (Broken Embraces) | 2009 | 128 | 7 | 7 | Thriller | Close. Your. Eyes.
9 | Blade Runner 2049 | 2017 | 163 | 5 | 5 | Science Fiction | Robots and stuff.
10 | Wonder Woman | 2017 | 141 | 6 | 6 | Super Hero | Mostly capes.
11 | Shape of Water | 2018 | 123 | 3 | 3 | Fantasy | Swords. And maybe fairies too.
| | | | | 4 | Horror | Ghosts, goblins, and other spooky things.
| | | | | 1 | Comedy | LOL
(13 rows)
SELF JOINS
Which movies were made in the same year, shown as pairs?
This is a tough one to answer, as we’d have to compare the table to itself to see if years are equal.
- use the table to join on itself
- do this with an inner join
- the inner join should be on year…
- so that we only get rows with years that match each other
Self Join Examples
1
2
3
4
SELECT a.title, b.title, a.year, b.year
FROM movie as a
INNER JOIN movie as b on a.year = b.year
ORDER BY a.year, a.title, b.title;
1
2
3
4
5
6
7
8
title | title | year | year
---------------------------------+----------------------------------------+------+------
Alphaville | Alphaville | 1965 | 1965
.
.
Strange Days | Strange Days | 1995 | 1995
2046 | 2046 | 2004 | 2004
2046 | Hellboy | 2004 | 2004
Unfortunately, our initial attempt didn’t yield the desired results. Matching on the ‘year’ criterion led to including movies that matched themselves in both tables.
What could we do to fix this?
- let’s try filtering the rows with a
WHERE
clause - this clause should only show rows that don’t have the same id
- so that we don’t the same title in both title columns
WHERE a.movie_id <> b.movie_id
1
2
3
4
SELECT a.movie_id, a.title, b.movie_id, b.title, a.year
FROM movie as a INNER JOIN movie as b ON a.year = b.year
WHERE a.movie_id != b.movie_id
ORDER BY a.year, a.movie_id, b.movie_id;
1
2
3
4
5
6
movie_id | title | movie_id | title | year
----------+-------------------+----------+-------------------+------
6 | 2046 | 7 | Hellboy | 2004
7 | Hellboy | 6 | 2046 | 2004
9 | Blade Runner 2049 | 10 | Wonder Woman | 2017
10 | Wonder Woman | 9 | Blade Runner 2049 | 2017
The previous attempt got rid rows with the same title in both a
and b
, but now we have twice as many rows since the duplicate rows have the titles switching columns. What now?
- because we have pairs of movies, we can use the ordering of
movie_id
to get rid of duplicate, but swapped title, rows - so, now we can filter by saying: only give me rows where one movie_id is less than the other (or greater than… just not both)
1
2
3
4
SELECT a.movie_id, a.title, b.movie_id, b.title, a.year
FROM movie as a INNER JOIN movie as b ON a.year = b.year
WHERE a.movie_id < b.movie_id
ORDER BY a.year, a.movie_id, b.movie_id;
1
2
3
4
5
movie_id | title | movie_id | title | year
----------+-------------------+----------+--------------+------
6 | 2046 | 7 | Hellboy | 2004
9 | Blade Runner 2049 | 10 | Wonder Woman | 2017
(2 rows)
Table Relationships
The possible cardinalities of rows in related tables are:
- one-to-many
- many-to-many
- one-to-one
One To Many
If you have a foreign key in your table, it will be the many side in a one-to-many relationship.
- rows in
movie
can reference the samegenre_id
- (both Dune and a Quiet Place have 57 listed as the related genre id)
- … meaning one
genre
can have manymovie
s
One To One
Making sure that there’s a one-to-one (or more exactly one to… zero or one) is a bit tricky
- if we put a unique constraint on the foreign key, that means a value for genre id can only occur once in
movie
- this means there can be either no associated profile or only associated profile for a one-to-one relationship.
- if you want exactly one-to-one, you can put an fk in each table and use deferred constraints in a transaction to insert rows in each table
Many To Many
What if we aimed for a scenario where movies could have multiple genres, and genres could encompass multiple movies?
For instance, ‘Dune’ could be classified as both Sci-Fi and Drama, while Sci-Fi is a genre shared by both ‘Dune’ and ‘A Quiet Place.’ One potential approach might involve having a ‘movie’ table with a ‘genre_id’ column, and a ‘genre’ table with a ‘movie_id’ column.
however, upon closer examination, this doesn’t seem to be the optimal solution! Why?
Because this approach would necessitate placing duplicate rows in one of the tables.
For many-to-many, create a third table that houses ids of both other tables (so only ids are duplicated)
- now, Dune has 2 genres and Drama has 2 movies
- why not just set an fk in each table? that would lead to duplicate rows in either table!
- sometimes this is called a join table; other names include association and junction table
- this join table can have additional columns, but minimally, the pk of both other tables