Writing data with SQL queries
In addition to uploading files, you can also write to Seafowl using standard
INSERT
, UPDATE
, DELETE
queries.
curl -XPOST -H "Authorization: Bearer some_password" http://localhost:8080/q -d@<<EOF
{"query": "INSERT INTO new_table (col_1, col_2) VALUES (1, 'one')"}
EOF
Seafowl also supports CREATE TABLE AS
statements. These can be useful to
transform existing Seafowl tables:
CREATE TABLE weather_by_month AS
SELECT date_trunc(timestamp, 'month') AS month,
AVG(temp_c_10 * 0.1) AS avg_temp
FROM cambridge_weather GROUP BY 1 ORDER BY 1 ASC
...or "freeze" external tables into Seafowl tables:
CREATE EXTERNAL TABLE data
STORED AS PARQUET
LOCATION 'https://parqueth-sample.s3.us-west-1.amazonaws.com/mainnet/transactions/dt=2021-07-01/blocks-0012738509-0012739509.parquet';
CREATE TABLE parqueth_sample AS SELECT * FROM staging.data;
Seafowl doesn't support primary keys, foreign keys or unique constraints. If you insert the same row multiple times, it's not going to get deduplicated or raise an error.
For more information on DML statements, see the reference.
Best practices
Batching INSERT
s
Every INSERT
in Seafowl makes at least one new
partition. This means that it's more efficient to write
to Seafowl using INSERT
statements with multiple rows:
INSERT INTO some_table (col_1, col_2) VALUES
(1, 'one'),
(2, 'two'),
(3, 'three')
Insertion order
It's best to insert data in the order of the key that you're going to be filtering on the most. This can make Seafowl utilize partition pruning to speed up queries by scanning through less data.