Import Data
This material is based on NYU's "CSCI-UA 479 Data Management and Analysis" course by Joe Versoza.
Importing Data
We can import data into a table by running a .sql script or by importing files.
The typical workflow for imports is:
- create a table based on the data that you'll import
- potentially clean the data so that the import works well
- import a file with a COPY query or generate INSERT statements in a .sql file
Running SQL Scripts
Two ways to run .sql scripts:
- in the psql client, use the \i command: \i /path/to/stuff-to-import.sql
- when starting psql, a file that contains SQL commands can be redirected to the client so that the statements within it are run: psql someDatabaseName < /path/to/stuff-to-import.sql
In both cases, the .sql file can contain any number of valid SQL commands.
Example: importing data by running a .sql script
DROP TABLE IF EXISTS song;
CREATE TABLE song (
    id serial PRIMARY KEY,
    title varchar(100),
    artist varchar(100),
    danceability numeric
);
INSERT INTO song (title, artist, danceability)
VALUES
    ('Heartbeats', 'Jose Gonzalez', 0.01),
    ('Heartbeats', 'The Knife', 0.9),
    ('Lucid Dreams', 'Juice WRLD', 0.9);
Before running psql: psql dbname < songs.sql
While in psql: \i songs.sql
COPY from csv
COPY can be used to import data from a csv file (this is not standard SQL):
COPY table_name
FROM filename
options
options can be replaced by some combination of additional options that control how the file should be imported.
To import only specific columns, use parentheses after the table name.
COPY table_name (col1, col2, col3)
FROM filename
options
COPY options
- format of file: text, csv or binary
- DELIMITER AS 'some_char' - specify the delimiter (default is comma for csv)
- NULL AS 'null_string' - determines what string to treat as null (default for csv is empty string)
- HEADER - presence specifies that a header row is included in the file
- ENCODING 'ENCODING_NAME' - specify the encoding of the file being imported (e.g. ENCODING 'LATIN1'); if not specified, the client encoding is used (psql will auto detect based on your locale settings, likely 'UTF8')
- QUOTE AS 'quote_character' - specify the quote character
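The examples below use this legacy single-keyword option syntax; since PostgreSQL 9.0, COPY also accepts a parenthesized option list. A minimal sketch of the same kind of import with the newer syntax, using the /tmp/songs.csv file from the example below:
COPY song (title, artist, danceability)
FROM '/tmp/songs.csv'
WITH (FORMAT csv, HEADER, DELIMITER ',', NULL 'N/A');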
COPY example
Assume this csv file exists at /tmp/songs.csv:
title,artist,danceability
Heartbeats,Jose Gonzalez,0.01
Heartbeats,The Knife,0.9
Lucid Dreams,Juice WRLD,0.9
Happy Birthday,N/A,0.9
and a table exists with appropriate types:
CREATE TABLE song (
    id serial PRIMARY KEY,
    title varchar(100),
    artist varchar(100),
    danceability numeric
);
-- 4 columns in the table, but only 3 in the csv
COPY song (title, artist, danceability)
FROM '/tmp/songs.csv'
csv HEADER NULL AS 'N/A';
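After this import, rows whose csv value was 'N/A' should have NULL in the artist column; a hypothetical verification query:
SELECT title, artist FROM song WHERE artist IS NULL;
-- expected: one row (Happy Birthday), since its artist was 'N/A' in the csv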
Assume that a tab-delimited file exists that contains all the columns needed:
COPY student
FROM '/tmp/student.txt'
csv HEADER DELIMITER AS E'\t';
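Note that COPY ... FROM 'filename' reads the file on the database server and typically requires elevated privileges; when the file lives on the client machine, psql's \copy meta-command accepts the same options but reads the file client-side. A sketch using the same file:
\copy student FROM '/tmp/student.txt' csv HEADER DELIMITER AS E'\t'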
Additional information can be found in the COPY documentation.
Import JSON
- the target column of the import should be type jsonb (see the docs); jsonb allows a column to store JSON data that can be keyed/indexed into using arrow syntax -> (col->'field_name')
- the format of the data file being brought in should be one JSON object per line
CREATE TABLE foo (
    id serial PRIMARY KEY,
    data jsonb
);
{"name": "bar", "a": 12, "b": 14}
{"name": "qux", "a": 100, "b": 101}
{"name": "corge", "a": 98, "b": 99}
COPY foo (data) FROM '/tmp/data.json';
SELECT data->'a' FROM foo;
Types
When accessing a value with colName->'key', the returned type is still jsonb.
Alternatively, ->> can be used to return text.