Post

Joins

The foundational knowledge stems from the NYU’s “CSCI-UA 479 Data Management and Analysis” course by Joe Versoza.

Joins

Joins are a relational algebra operation executed in SQL, involving the merging of columns from one or more tables.

Generate Table

1
2
3
4
5
6
CREATE TABLE genre (
	genre_id integer,
	name varchar(20),
	description text,
	PRIMARY KEY (genre_id)
);

Genre Data

1
2
3
4
5
6
7
8
9
 genre_id |      name       |                description
----------+-----------------+-------------------------------------------
        1 | Comedy          | LOL
        2 | Drama           | Such feels.
        3 | Fantasy         | Swords. And maybe fairies too.
        4 | Horror          | Ghosts, goblins, and other spooky things.
        5 | Science Fiction | Robots and stuff.
        6 | Super Hero      | Mostly capes.
        7 | Thriller        | Close. Your. Eyes.

Movie Table

1
2
3
4
5
6
7
8
CREATE TABLE movie (
	movie_id serial,
	title varchar(50) NOT NULL,
	year smallint NOT NULL,
	runtime smallint NOT NULL, 
	genre_id integer REFERENCES genre (genre_id),
	PRIMARY KEY (movie_id)
);

Movie Data

1
2
3
4
5
6
7
8
9
10
11
12
13
movie_id |                 title                  | year | runtime | genre_id
----------+----------------------------------------+------+---------+----------
        1 | Alphaville                             | 1965 |      99 |
        2 | La Montaña Sagrada (The Holy Mountain) | 1973 |      99 |
        3 | Dune                                   | 1984 |     136 |        5
        4 | Point Break                            | 1991 |     122 |        7
        5 | Strange Days                           | 1995 |     145 |        5
        6 | 2046                                   | 2004 |     122 |        2
        7 | Hellboy                                | 2004 |     122 |        6
        8 | Los Abrazos Rotos (Broken Embraces)    | 2009 |     128 |        7
        9 | Blade Runner 2049                      | 2017 |     163 |        5
       10 | Wonder Woman                           | 2017 |     141 |        6
       11 | Shape of Water                         | 2018 |     123 |        3

CROSS JOIN

Cartesian product

  • all rows from one table combined with each row from another table
  • results in n * m rows
1
SELECT * FROM movie CROSS JOIN genre;

Result

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
 movie_id |                 title                  | year | runtime | genre_id | genre_id |      name       |                description                
----------+----------------------------------------+------+---------+----------+----------+-----------------+-------------------------------------------
        1 | Alphaville                             | 1965 |      99 |          |        1 | Comedy          | LOL
        2 | La Montaña Sagrada (The Holy Mountain) | 1973 |      99 |          |        1 | Comedy          | LOL
        3 | Dune                                   | 1984 |     136 |        5 |        1 | Comedy          | LOL
        4 | Point Break                            | 1991 |     122 |        7 |        1 | Comedy          | LOL
        5 | Strange Days                           | 1995 |     145 |        5 |        1 | Comedy          | LOL
        6 | 2046                                   | 2004 |     122 |        2 |        1 | Comedy          | LOL
        7 | Hellboy                                | 2004 |     122 |        6 |        1 | Comedy          | LOL
        8 | Los Abrazos Rotos (Broken Embraces)    | 2009 |     128 |        7 |        1 | Comedy          | LOL
        9 | Blade Runner 2049                      | 2017 |     163 |        5 |        1 | Comedy          | LOL
       10 | Wonder Woman                           | 2017 |     141 |        6 |        1 | Comedy          | LOL
       11 | Shape of Water                         | 2018 |     123 |        3 |        1 | Comedy          | LOL
        1 | Alphaville                             | 1965 |      99 |          |        2 | Drama           | Such feels.
        2 | La Montaña Sagrada (The Holy Mountain) | 1973 |      99 |          |        2 | Drama           | Such feels.
        3 | Dune                                   | 1984 |     136 |        5 |        2 | Drama           | Such feels.
        4 | Point Break                            | 1991 |     122 |        7 |        2 | Drama           | Such feels.
        5 | Strange Days                           | 1995 |     145 |        5 |        2 | Drama           | Such feels.
        6 | 2046                                   | 2004 |     122 |        2 |        2 | Drama           | Such feels.
        7 | Hellboy                                | 2004 |     122 |        6 |        2 | Drama           | Such feels.
        8 | Los Abrazos Rotos (Broken Embraces)    | 2009 |     128 |        7 |        2 | Drama           | Such feels.
        9 | Blade Runner 2049                      | 2017 |     163 |        5 |        2 | Drama           | Such feels.
       10 | Wonder Woman                           | 2017 |     141 |        6 |        2 | Drama           | Such feels.
       11 | Shape of Water                         | 2018 |     123 |        3 |        2 | Drama           | Such feels.
        1 | Alphaville                             | 1965 |      99 |          |        3 | Fantasy         | Swords. And maybe fairies too.
        ...

INNER JOIN

Combines rows from one table and another based on matching column values

  • condition for matching column is the join predicate
  • commonly used!
1
2
3
SELECT title, name 
	FROM movie 
	INNER JOIN genre on movie.genre_id = genre.genre_id;

Result

1
2
3
4
5
6
7
8
9
10
11
                title                |      name       
-------------------------------------+-----------------
 Dune                                | Science Fiction
 Point Break                         | Thriller
 Strange Days                        | Science Fiction
 2046                                | Drama
 Hellboy                             | Super Hero
 Los Abrazos Rotos (Broken Embraces) | Thriller
 Blade Runner 2049                   | Science Fiction
 Wonder Woman                        | Super Hero
 Shape of Water                      | Fantasy

This can also be done with WHERE clause (implicit join)

1
2
3
SELECT title, name 
	FROM movie, genre 
	WHERE movie.genre_id = genre.genre_id;

What happened to the movie rows that had a null genre_id?

  • they weren’t included in the results!
  • that implies that when a column that can have a null value is used in the join predicate, some rows may be omitted from the query result

An INNER JOIN where the join predicate involves equality can be written with a USING clause

1
2
3
SELECT title, name 
	FROM movie 
	INNER JOIN genre USING (genre_id);

This is very risky, but a NATURAL JOIN performs an inner join implicitly on matching column names

1
SELECT * FROM movie NATURAL JOIN genre;

OUTER JOINS

Same as inner, but include everything in the first (left) table: LEFT OUTER JOIN

1
SELECT * FROM movie LEFT OUTER JOIN genre ON movie.genre_id = genre.genre_id;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
movie_id |                 title                  | year | runtime | genre_id | genre_id |      name       |          description           
----------+----------------------------------------+------+---------+----------+----------+-----------------+--------------------------------
        1 | Alphaville                             | 1965 |      99 |          |          |                 | 
        2 | La Montaña Sagrada (The Holy Mountain) | 1973 |      99 |          |          |                 | 
        3 | Dune                                   | 1984 |     136 |        5 |        5 | Science Fiction | Robots and stuff.
        4 | Point Break                            | 1991 |     122 |        7 |        7 | Thriller        | Close. Your. Eyes.
        5 | Strange Days                           | 1995 |     145 |        5 |        5 | Science Fiction | Robots and stuff.
        6 | 2046                                   | 2004 |     122 |        2 |        2 | Drama           | Such feels.
        7 | Hellboy                                | 2004 |     122 |        6 |        6 | Super Hero      | Mostly capes.
        8 | Los Abrazos Rotos (Broken Embraces)    | 2009 |     128 |        7 |        7 | Thriller        | Close. Your. Eyes.
        9 | Blade Runner 2049                      | 2017 |     163 |        5 |        5 | Science Fiction | Robots and stuff.
       10 | Wonder Woman                           | 2017 |     141 |        6 |        6 | Super Hero      | Mostly capes.
       11 | Shape of Water                         | 2018 |     123 |        3 |        3 | Fantasy         | Swords. And maybe fairies too.
(11 rows)

Same as above, but include everything in second (right) table: RIGHT OUTER JOIN

1
SELECT * FROM movie RIGHT OUTER JOIN genre ON movie.genre_id = genre.genre_id;
1
2
3
4
5
6
7
8
9
10
11
12
13
 movie_id |                title                | year | runtime | genre_id | genre_id |      name       |                description                
----------+-------------------------------------+------+---------+----------+----------+-----------------+-------------------------------------------
        3 | Dune                                | 1984 |     136 |        5 |        5 | Science Fiction | Robots and stuff.
        4 | Point Break                         | 1991 |     122 |        7 |        7 | Thriller        | Close. Your. Eyes.
        5 | Strange Days                        | 1995 |     145 |        5 |        5 | Science Fiction | Robots and stuff.
        6 | 2046                                | 2004 |     122 |        2 |        2 | Drama           | Such feels.
        7 | Hellboy                             | 2004 |     122 |        6 |        6 | Super Hero      | Mostly capes.
        8 | Los Abrazos Rotos (Broken Embraces) | 2009 |     128 |        7 |        7 | Thriller        | Close. Your. Eyes.
        9 | Blade Runner 2049                   | 2017 |     163 |        5 |        5 | Science Fiction | Robots and stuff.
       10 | Wonder Woman                        | 2017 |     141 |        6 |        6 | Super Hero      | Mostly capes.
       11 | Shape of Water                      | 2018 |     123 |        3 |        3 | Fantasy         | Swords. And maybe fairies too.
          |                                     |      |         |          |        4 | Horror          | Ghosts, goblins, and other spooky things.
          |                                     |      |         |          |        1 | Comedy          | LOL

Both! FULL OUTER JOIN

1
SELECT * FROM movie FULL OUTER JOIN genre ON movie.genre_id = genre.genre_id;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
 movie_id |                 title                  | year | runtime | genre_id | genre_id |      name       |                description                
----------+----------------------------------------+------+---------+----------+----------+-----------------+-------------------------------------------
        1 | Alphaville                             | 1965 |      99 |          |          |                 | 
        2 | La Montaña Sagrada (The Holy Mountain) | 1973 |      99 |          |          |                 | 
        3 | Dune                                   | 1984 |     136 |        5 |        5 | Science Fiction | Robots and stuff.
        4 | Point Break                            | 1991 |     122 |        7 |        7 | Thriller        | Close. Your. Eyes.
        5 | Strange Days                           | 1995 |     145 |        5 |        5 | Science Fiction | Robots and stuff.
        6 | 2046                                   | 2004 |     122 |        2 |        2 | Drama           | Such feels.
        7 | Hellboy                                | 2004 |     122 |        6 |        6 | Super Hero      | Mostly capes.
        8 | Los Abrazos Rotos (Broken Embraces)    | 2009 |     128 |        7 |        7 | Thriller        | Close. Your. Eyes.
        9 | Blade Runner 2049                      | 2017 |     163 |        5 |        5 | Science Fiction | Robots and stuff.
       10 | Wonder Woman                           | 2017 |     141 |        6 |        6 | Super Hero      | Mostly capes.
       11 | Shape of Water                         | 2018 |     123 |        3 |        3 | Fantasy         | Swords. And maybe fairies too.
          |                                        |      |         |          |        4 | Horror          | Ghosts, goblins, and other spooky things.
          |                                        |      |         |          |        1 | Comedy          | LOL
(13 rows)

SELF JOINS

Which movies were made in the same year, shown as pairs?

This is a tough one to answer, as we’d have to compare the table to itself to see if years are equal.

  1. use the table to join on itself
  2. do this with an inner join
  3. the inner join should be on year…
  4. so that we only get rows with years that match each other

Self Join Examples

1
2
3
4
SELECT a.title, b.title, a.year, b.year 
FROM movie as a 
INNER JOIN movie as b on a.year = b.year 
ORDER BY a.year, a.title, b.title;
1
2
3
4
5
6
7
8
title           | title          | year | year
---------------------------------+----------------------------------------+------+------
 Alphaville     | Alphaville     | 1965 | 1965
 .
 .
 Strange Days   | Strange Days   | 1995 | 1995
 2046           | 2046           | 2004 | 2004
 2046           | Hellboy        | 2004 | 2004

Unfortunately, our initial attempt didn’t yield the desired results. Matching on the ‘year’ criterion led to including movies that matched themselves in both tables.

What could we do to fix this?

  • let’s try filtering the rows with a WHERE clause
  • this clause should only show rows that don’t have the same id
  • so that we don’t the same title in both title columns
  • WHERE a.movie_id <> b.movie_id
1
2
3
4
SELECT a.movie_id, a.title, b.movie_id, b.title, a.year
FROM movie as a INNER JOIN movie as b ON a.year = b.year
WHERE a.movie_id != b.movie_id
ORDER BY a.year, a.movie_id, b.movie_id;
1
2
3
4
5
6
 movie_id |       title       | movie_id |       title       | year
----------+-------------------+----------+-------------------+------
        6 | 2046              |        7 | Hellboy           | 2004
        7 | Hellboy           |        6 | 2046              | 2004
        9 | Blade Runner 2049 |       10 | Wonder Woman      | 2017
       10 | Wonder Woman      |        9 | Blade Runner 2049 | 2017

The previous attempt got rid rows with the same title in both a and b, but now we have twice as many rows since the duplicate rows have the titles switching columns. What now?

  • because we have pairs of movies, we can use the ordering of movie_id to get rid of duplicate, but swapped title, rows
  • so, now we can filter by saying: only give me rows where one movie_id is less than the other (or greater than… just not both)
1
2
3
4
SELECT a.movie_id, a.title, b.movie_id, b.title, a.year
FROM movie as a INNER JOIN movie as b ON a.year = b.year
WHERE a.movie_id < b.movie_id
ORDER BY a.year, a.movie_id, b.movie_id;
1
2
3
4
5
 movie_id |       title       | movie_id |    title     | year 
----------+-------------------+----------+--------------+------
        6 | 2046              |        7 | Hellboy      | 2004
        9 | Blade Runner 2049 |       10 | Wonder Woman | 2017
(2 rows)

Table Relationships

The possible cardinalities of rows in related tables are:

  1. one-to-many
  2. many-to-many
  3. one-to-one

One To Many

If you have a foreign key in your table, it will be the many side in a one-to-many relationship.

1

  • rows in movie can reference the same genre_id
  • (both Dune and a Quiet Place have 57 listed as the related genre id)
  • … meaning one genre can have many movies

One To One

Making sure that there’s a one-to-one (or more exactly one to… zero or one) is a bit tricky

2

  • if we put a unique constraint on the foreign key, that means a value for genre id can only occur once in movie
  • this means there can be either no associated profile or only associated profile for a one-to-one relationship.
  • if you want exactly one-to-one, you can put an fk in each table and use deferred constraints in a transaction to insert rows in each table

Many To Many

What if we aimed for a scenario where movies could have multiple genres, and genres could encompass multiple movies?

For instance, ‘Dune’ could be classified as both Sci-Fi and Drama, while Sci-Fi is a genre shared by both ‘Dune’ and ‘A Quiet Place.’ One potential approach might involve having a ‘movie’ table with a ‘genre_id’ column, and a ‘genre’ table with a ‘movie_id’ column.

however, upon closer examination, this doesn’t seem to be the optimal solution! Why?

Because this approach would necessitate placing duplicate rows in one of the tables.

3

For many-to-many, create a third table that houses ids of both other tables (so only ids are duplicated)

4

  • now, Dune has 2 genres and Drama has 2 movies
  • why not just set an fk in each table? that would lead to duplicate rows in either table!
  • sometimes this is called a join table; other names include association and junction table
  • this join table can have additional columns, but minimally, the pk of both other tables