
Import Data

The material here is based on NYU’s “CSCI-UA 479 Data Management and Analysis” course by Joe Versoza.

Importing Data

We can import data into a table by running a .sql script or by importing files.

The typical workflow for imports is:

  1. Create a table based on the data that you’ll import
  2. Potentially clean the data so that the import works well
  3. Import a file with a COPY statement, or generate INSERT statements in a .sql file
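Step 3 can be sketched in Python as a minimal generator of INSERT statements from CSV text; the table and column names here are assumptions matching the song example in this post, and single quotes are escaped by doubling them, which is standard SQL:

```python
# Sketch: turn CSV text (with a header row) into a .sql-style string of
# INSERT statements, one per data row.
import csv
import io

def rows_to_inserts(csv_text, table, columns):
    """Generate one INSERT statement per CSV data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    statements = []
    for row in reader:
        # Double single quotes — the standard SQL escape inside a
        # string literal.
        values = ", ".join(
            "'" + row[col].replace("'", "''") + "'" for col in columns
        )
        statements.append(
            f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({values});"
        )
    return "\n".join(statements)

csv_text = "title,artist,danceability\nHeartbeats,The Knife,0.9\n"
print(rows_to_inserts(csv_text, "song", ["title", "artist", "danceability"]))
```

The resulting string can be written to a .sql file and run with either of the two methods described below.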

Running SQL Scripts

Two ways to run .sql scripts:

  1. in the psql client, use the \i command:
    • \i /path/to/stuff-to-import.sql
  2. When starting psql, a file that contains sql commands can be redirected to the client so that statements within it are run:
    • psql someDatabaseName < /path/to/stuff-to-import.sql

In both cases, the .sql file can contain any number of valid sql commands.

Example: importing data by running a SQL script

DROP TABLE IF EXISTS song;
CREATE TABLE song (
	id serial PRIMARY KEY,
	title varchar(100),
	artist varchar(100),
	danceability numeric
);

INSERT INTO song (title, artist, danceability)
	VALUES
		('Heartbeats', 'Jose Gonzalez', 0.01),
		('Heartbeats', 'The Knife', 0.9),
		('Lucid Dreams', 'Juice WRLD', 0.9);

From the shell, before running psql: psql dbname < song.sql

While in psql: \i song.sql

COPY from csv

COPY can be used to import data from a csv (this is not standard SQL)

COPY table_name
	FROM filename
	options

options can be replaced by some combination of additional options that control how the file should be imported

To import only specific columns, use parentheses after the table name.

COPY table_name (col1, col2, col3)
	FROM filename
	options

COPY options

  • format of the file: text, csv or binary
  • DELIMITER AS 'some_char' - specify the delimiter (default for csv is a comma)
  • NULL AS 'null_string' - determines what string to treat as null (default for csv is an empty string)
  • HEADER - if present, specifies that a header row is included in the file
  • ENCODING 'encoding_name' - specify the encoding of the file being imported (for example, ENCODING 'LATIN1'); if not specified, the client encoding is used (psql auto-detects this from your locale settings, likely 'UTF8')
  • QUOTE AS 'quote_character' - specify the quote character (default for csv is a double quote)
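The cleaning step of the workflow can make a file line up with these options before importing. A minimal Python sketch, where the rows, the tab delimiter, and the 'N/A' null string are assumptions matching the examples in this post:

```python
# Sketch: write a tab-delimited file whose conventions match
# COPY ... csv DELIMITER AS E'\t' NULL AS 'N/A'.
import csv

rows = [
    {"title": "Heartbeats", "artist": "The Knife", "danceability": 0.9},
    {"title": "Happy Birthday", "artist": None, "danceability": 0.9},
]

def write_for_copy(path, rows, null_string="N/A"):
    fieldnames = ["title", "artist", "danceability"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        for row in rows:
            # COPY treats null_string as SQL NULL, so substitute it for
            # missing values before writing.
            writer.writerow({k: null_string if row[k] is None else row[k]
                             for k in fieldnames})

write_for_copy("/tmp/songs.tsv", rows)
```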

COPY example

Assume that this is the csv file to import:

title,artist,danceability
Heartbeats,Jose Gonzalez,0.01
Heartbeats,The Knife,0.9
Lucid Dreams,Juice WRLD,0.9
Happy Birthday,N/A,0.9

and that a song table exists with appropriate column types:

id serial PRIMARY KEY,
title varchar(100),
artist varchar(100),
danceability numeric
-- 4 columns, but only 3 in csv
COPY song (title, artist, danceability)
	FROM '/tmp/songs.csv'
	csv HEADER NULL AS 'N/A';

Now assume a tab-delimited file that contains all of the columns needed:

COPY student
	FROM '/tmp/student.txt'
	csv HEADER DELIMITER AS E'\t';

Additional information can be found in the COPY documentation.

Import JSON

  • the target column of the import should be of type jsonb (see the docs)
  • jsonb allows a column to store JSON data that can be keyed/indexed into using the arrow syntax -> (col->'field_name')
  • the format of the data file being imported should be one JSON object per line
CREATE TABLE foo (
	id serial PRIMARY KEY,
	data jsonb
);
{"name": "bar", "a": 12, "b": 14}
{"name": "qux", "a": 100, "b": 101}
{"name": "corge", "a": 98, "b": 99}
COPY foo (data) FROM '/tmp/data.json';
SELECT data->'a' FROM foo;
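If the source data arrives as a single JSON array rather than one object per line, it can be reshaped first. A minimal Python sketch, with illustrative records and file path:

```python
# Sketch: reshape a list of records into the one-JSON-object-per-line
# format that COPY expects for a jsonb column.
import json

records = [
    {"name": "bar", "a": 12, "b": 14},
    {"name": "qux", "a": 100, "b": 101},
]

def to_json_lines(records):
    # json.dumps never emits literal newlines, so each object stays on
    # one line as required; backslashes inside strings would still need
    # doubling to survive COPY's text-format escaping.
    return "\n".join(json.dumps(r) for r in records)

with open("/tmp/data.json", "w") as f:
    f.write(to_json_lines(records) + "\n")
```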

Types

When accessing a value with colName->'key', the returned type is still jsonb.

Alternatively, ->> can be used to return text.
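As a quick sketch against the foo table above (the commented results assume its sample rows):

```sql
SELECT data->'a' FROM foo;                  -- jsonb value, e.g. 12
SELECT data->>'a' FROM foo;                 -- text value, e.g. '12'
SELECT (data->>'a')::numeric + 1 FROM foo;  -- cast the text to do arithmetic
```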